public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 1/9] Document signaling for min, max and ltgt operations
  2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
  2019-08-22 13:47 ` [PATCH v2 3/9] Introduce can_vector_compare_p function Ilya Leoshkevich
  2019-08-22 13:47 ` [PATCH v2 2/9] hash_traits: split pointer_hash_mark from pointer_hash Ilya Leoshkevich
@ 2019-08-22 13:47 ` Ilya Leoshkevich
  2019-08-22 13:48 ` [PATCH v2 4/9] S/390: Do not use signaling vector comparisons on z13 Ilya Leoshkevich
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-22 13:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, segher, Ilya Leoshkevich

Currently it's not clear whether min, max and ltgt should raise floating
point exceptions when dealing with qNaNs.

Right now a lot of code assumes that LTGT is signaling: in particular,
with -fno-finite-math-only, which is the default, it's generated for
the signaling ((x < y) || (x > y)).

The behavior of MIN/MAX is (intentionally?) left unspecified, according
to https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=263751
("Unconditionally use MAX_EXPR/MIN_EXPR for MAX/MIN intrinsics").

So document the status quo.

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/91323
	* doc/generic.texi (LTGT_EXPR): Restore the original wording
	regarding floating point exceptions.
	(MIN_EXPR, MAX_EXPR): Document.
	* doc/md.texi (smin, smax): Add a clause regarding floating
	point exceptions.
	* doc/rtl.texi (smin, smax): Add a clause regarding floating
	point exceptions.
---
 gcc/doc/generic.texi | 16 +++++++++++++---
 gcc/doc/md.texi      |  3 ++-
 gcc/doc/rtl.texi     |  3 ++-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 8901d5f357e..d5ae20bd461 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1331,6 +1331,8 @@ the byte offset of the field, but should not be used directly; call
 @tindex UNGE_EXPR
 @tindex UNEQ_EXPR
 @tindex LTGT_EXPR
+@tindex MIN_EXPR
+@tindex MAX_EXPR
 @tindex MODIFY_EXPR
 @tindex INIT_EXPR
 @tindex COMPOUND_EXPR
@@ -1602,13 +1604,21 @@ These operations take two floating point operands and determine whether
 the operands are unordered or are less than, less than or equal to,
 greater than, greater than or equal to, or equal respectively.  For
 example, @code{UNLT_EXPR} returns true if either operand is an IEEE
-NaN or the first operand is less than the second.  With the possible
-exception of @code{LTGT_EXPR}, all of these operations are guaranteed
-not to generate a floating point exception.  The result
+NaN or the first operand is less than the second.  Only @code{LTGT_EXPR}
+is expected to raise an invalid floating-point-operation trap when the
+outcome is unordered.  All other operations are guaranteed not to raise
+a floating point exception.  The result
 type of these expressions will always be of integral or boolean type.
 These operations return the result type's zero value for false,
 and the result type's one value for true.
 
+@item MIN_EXPR
+@itemx MAX_EXPR
+These nodes represent minimum and maximum operations.  When used with
+floating point, if both operands are zeros, or if either operand is
+@code{NaN}, then it is unspecified which of the two operands is returned
+as the result and whether or not a floating point exception is raised.
+
 @item MODIFY_EXPR
 These nodes represent assignment.  The left-hand side is the first
 operand; the right-hand side is the second operand.  The left-hand side
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 7751984bf5f..74f8ec84974 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5353,7 +5353,8 @@ in the rtl as
 @item @samp{smin@var{m}3}, @samp{smax@var{m}3}
 Signed minimum and maximum operations.  When used with floating point,
 if both operands are zeros, or if either operand is @code{NaN}, then
-it is unspecified which of the two operands is returned as the result.
+it is unspecified which of the two operands is returned as the result
+and whether or not a floating point exception is raised.
 
 @cindex @code{fmin@var{m}3} instruction pattern
 @cindex @code{fmax@var{m}3} instruction pattern
diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 0814b66a486..e0628da893d 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -2596,7 +2596,8 @@ Represents the smaller (for @code{smin}) or larger (for @code{smax}) of
 @var{x} and @var{y}, interpreted as signed values in mode @var{m}.
 When used with floating point, if both operands are zeros, or if either
 operand is @code{NaN}, then it is unspecified which of the two operands
-is returned as the result.
+is returned as the result and whether or not a floating point exception
+is raised.
 
 @findex umin
 @findex umax
-- 
2.21.0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
@ 2019-08-22 13:47 Ilya Leoshkevich
  2019-08-22 13:47 ` [PATCH v2 3/9] Introduce can_vector_compare_p function Ilya Leoshkevich
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-22 13:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, segher, Ilya Leoshkevich

Bootstrap and regtest running on x86_64-redhat-linux and
s390x-redhat-linux.

This patch series adds signaling FP comparison support (both scalar and
vector) to the s390 backend.

Patch 1 documents the current behavior of MIN, MAX and LTGT operations
with respect to signaling.

Patches 2-4 make it possible to query supported vcond rtxes and make
use of that for z13.

Patches 5-7 are preparation cleanups.

Patch 8 is the actual implementation.

Patch 9 contains new tests that make sure autovectorized comparisons use
the proper instructions.

Ilya Leoshkevich (9):
  Document signaling for min, max and ltgt operations
  hash_traits: split pointer_hash_mark from pointer_hash
  Introduce can_vector_compare_p function
  S/390: Do not use signaling vector comparisons on z13
  S/390: Implement vcond expander for V1TI,V1TF
  S/390: Remove code duplication in vec_unordered<mode>
  S/390: Remove code duplication in vec_* comparison expanders
  S/390: Use signaling FP comparison instructions
  S/390: Test signaling FP comparison instructions

v1->v2:
Improve wording in documentation commit message.
Replace hook with optabs query.
Add signaling eq test.

 gcc/Makefile.in                               |   2 +-
 gcc/config/s390/2827.md                       |  14 +-
 gcc/config/s390/2964.md                       |  13 +-
 gcc/config/s390/3906.md                       |  17 +-
 gcc/config/s390/8561.md                       |  19 +-
 gcc/config/s390/s390-builtins.def             |  16 +-
 gcc/config/s390/s390-modes.def                |   8 +
 gcc/config/s390/s390.c                        |  38 ++-
 gcc/config/s390/s390.md                       |  14 +
 gcc/config/s390/vector.md                     | 283 ++++++++++++------
 gcc/cp/decl2.c                                |  14 +-
 gcc/doc/generic.texi                          |  16 +-
 gcc/doc/md.texi                               |   3 +-
 gcc/doc/rtl.texi                              |   3 +-
 gcc/hash-traits.h                             |  74 +++--
 gcc/ipa-prop.c                                |  47 +--
 gcc/optabs-tree.c                             |  11 +-
 gcc/optabs.c                                  |  79 +++++
 gcc/optabs.h                                  |  15 +
 gcc/testsuite/gcc.target/s390/s390.exp        |   8 +
 .../gcc.target/s390/vector/vec-scalar-cmp-1.c |   8 +-
 .../s390/zvector/autovec-double-quiet-eq.c    |   8 +
 .../s390/zvector/autovec-double-quiet-ge.c    |   8 +
 .../s390/zvector/autovec-double-quiet-gt.c    |   8 +
 .../s390/zvector/autovec-double-quiet-le.c    |   8 +
 .../s390/zvector/autovec-double-quiet-lt.c    |   8 +
 .../zvector/autovec-double-quiet-ordered.c    |  10 +
 .../s390/zvector/autovec-double-quiet-uneq.c  |  10 +
 .../zvector/autovec-double-quiet-unordered.c  |  11 +
 .../autovec-double-signaling-eq-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-eq-z13.c |   9 +
 .../zvector/autovec-double-signaling-eq.c     |  11 +
 .../autovec-double-signaling-ge-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-ge-z13.c |   9 +
 .../zvector/autovec-double-signaling-ge.c     |   8 +
 .../autovec-double-signaling-gt-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-gt-z13.c |   9 +
 .../zvector/autovec-double-signaling-gt.c     |   8 +
 .../autovec-double-signaling-le-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-le-z13.c |   9 +
 .../zvector/autovec-double-signaling-le.c     |   8 +
 .../autovec-double-signaling-lt-z13-finite.c  |  10 +
 .../zvector/autovec-double-signaling-lt-z13.c |   9 +
 .../zvector/autovec-double-signaling-lt.c     |   8 +
 ...autovec-double-signaling-ltgt-z13-finite.c |   9 +
 .../autovec-double-signaling-ltgt-z13.c       |   9 +
 .../zvector/autovec-double-signaling-ltgt.c   |   9 +
 .../s390/zvector/autovec-double-smax-z13.F90  |  11 +
 .../s390/zvector/autovec-double-smax.F90      |   8 +
 .../s390/zvector/autovec-double-smin-z13.F90  |  11 +
 .../s390/zvector/autovec-double-smin.F90      |   8 +
 .../s390/zvector/autovec-float-quiet-eq.c     |   8 +
 .../s390/zvector/autovec-float-quiet-ge.c     |   8 +
 .../s390/zvector/autovec-float-quiet-gt.c     |   8 +
 .../s390/zvector/autovec-float-quiet-le.c     |   8 +
 .../s390/zvector/autovec-float-quiet-lt.c     |   8 +
 .../zvector/autovec-float-quiet-ordered.c     |  10 +
 .../s390/zvector/autovec-float-quiet-uneq.c   |  10 +
 .../zvector/autovec-float-quiet-unordered.c   |  11 +
 .../s390/zvector/autovec-float-signaling-eq.c |  11 +
 .../s390/zvector/autovec-float-signaling-ge.c |   8 +
 .../s390/zvector/autovec-float-signaling-gt.c |   8 +
 .../s390/zvector/autovec-float-signaling-le.c |   8 +
 .../s390/zvector/autovec-float-signaling-lt.c |   8 +
 .../zvector/autovec-float-signaling-ltgt.c    |   9 +
 .../gcc.target/s390/zvector/autovec-fortran.h |   7 +
 .../autovec-long-double-signaling-ge.c        |   8 +
 .../autovec-long-double-signaling-gt.c        |   8 +
 .../autovec-long-double-signaling-le.c        |   8 +
 .../autovec-long-double-signaling-lt.c        |   8 +
 .../gcc.target/s390/zvector/autovec.h         |  41 +++
 71 files changed, 946 insertions(+), 233 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-eq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-ge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-gt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-le.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-lt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-ordered.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-unordered.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-smax-z13.F90
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-smax.F90
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-smin-z13.F90
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-smin.F90
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-eq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-ge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-gt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-le.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-lt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-ordered.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-unordered.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-gt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-le.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-lt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ltgt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-fortran.h
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-ge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-gt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-le.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-lt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec.h

-- 
2.21.0


* [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
@ 2019-08-22 13:47 ` Ilya Leoshkevich
  2019-08-23 11:08   ` Richard Sandiford
  2019-08-22 13:47 ` [PATCH v2 2/9] hash_traits: split pointer_hash_mark from pointer_hash Ilya Leoshkevich
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-22 13:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, segher, Ilya Leoshkevich

z13 supports only non-signaling vector comparisons.  This means we
cannot vectorize LT, LE, GT, GE and LTGT when compiling for z13.
However, we cannot express this restriction today: the code only checks
whether the vcond$a$b optab exists, which carries no information about
the comparison operation.

Introduce a function that checks whether the back-end supports vector
comparisons with individual rtx codes by matching the vcond expander's
third argument with a fake comparison with the corresponding rtx code.

gcc/ChangeLog:

2019-08-21  Ilya Leoshkevich  <iii@linux.ibm.com>

	* Makefile.in (GTFILES): Add optabs.c.
	* optabs-tree.c (expand_vec_cond_expr_p): Use
	can_vector_compare_p.
	* optabs.c (binop_key): Binary operation cache key.
	(binop_hasher): Binary operation cache hasher.
	(cached_binops): Binary operation cache.
	(get_cached_binop): New function that returns a cached binary
	operation or creates a new one.
	(can_vector_compare_p): New function.
	* optabs.h (enum can_vector_compare_purpose): New enum.  Not
	really needed today, but can be used to extend the support to
	e.g. vec_cmp if the need arises.
	(can_vector_compare_p): New function.
---
 gcc/Makefile.in   |  2 +-
 gcc/optabs-tree.c | 11 +++++--
 gcc/optabs.c      | 79 +++++++++++++++++++++++++++++++++++++++++++++++
 gcc/optabs.h      | 15 +++++++++
 4 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 597dc01328b..d2207da5657 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2541,7 +2541,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/function.c $(srcdir)/except.c \
   $(srcdir)/ggc-tests.c \
   $(srcdir)/gcse.c $(srcdir)/godump.c \
-  $(srcdir)/lists.c $(srcdir)/optabs-libfuncs.c \
+  $(srcdir)/lists.c $(srcdir)/optabs.c $(srcdir)/optabs-libfuncs.c \
   $(srcdir)/profile.c $(srcdir)/mcf.c \
   $(srcdir)/reg-stack.c $(srcdir)/cfgrtl.c \
   $(srcdir)/stor-layout.c \
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 8157798cc71..e68bb39c021 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -23,7 +23,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "target.h"
 #include "insn-codes.h"
+#include "rtl.h"
 #include "tree.h"
+#include "memmodel.h"
+#include "optabs.h"
 #include "optabs-tree.h"
 #include "stor-layout.h"
 
@@ -347,8 +350,12 @@ expand_vec_cond_expr_p (tree value_type, tree cmp_op_type, enum tree_code code)
       || maybe_ne (GET_MODE_NUNITS (value_mode), GET_MODE_NUNITS (cmp_op_mode)))
     return false;
 
-  if (get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
-		       TYPE_UNSIGNED (cmp_op_type)) == CODE_FOR_nothing
+  bool unsigned_p = TYPE_UNSIGNED (cmp_op_type);
+  if (((get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
+			 unsigned_p) == CODE_FOR_nothing)
+       || !can_vector_compare_p (get_rtx_code (code, unsigned_p),
+				 TYPE_MODE (value_type),
+				 TYPE_MODE (cmp_op_type), cvcp_vcond))
       && ((code != EQ_EXPR && code != NE_EXPR)
 	  || get_vcond_eq_icode (TYPE_MODE (value_type),
 				 TYPE_MODE (cmp_op_type)) == CODE_FOR_nothing))
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 9e54dda6e7f..07b4d824822 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "recog.h"
 #include "diagnostic-core.h"
 #include "rtx-vector-builder.h"
+#include "hash-table.h"
 
 /* Include insn-config.h before expr.h so that HAVE_conditional_move
    is properly defined.  */
@@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
   return 0;
 }
 
+/* can_vector_compare_p presents fake rtx binary operations to the back-end
+   in order to determine its capabilities.  In order to avoid creating fake
+   operations on each call, values from previous calls are cached in a global
+   cached_binops hash_table.  It contains rtxes, which can be looked up using
+   binop_keys.  */
+
+struct binop_key {
+  enum rtx_code code;        /* Operation code.  */
+  machine_mode value_mode;   /* Result mode.     */
+  machine_mode cmp_op_mode;  /* Operand mode.    */
+};
+
+struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
+  typedef rtx value_type;
+  typedef binop_key compare_type;
+
+  static hashval_t
+  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
+  {
+    inchash::hash hstate (0);
+    hstate.add_int (code);
+    hstate.add_int (value_mode);
+    hstate.add_int (cmp_op_mode);
+    return hstate.end ();
+  }
+
+  static hashval_t
+  hash (const rtx &ref)
+  {
+    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
+  }
+
+  static bool
+  equal (const rtx &ref1, const binop_key &ref2)
+  {
+    return (GET_CODE (ref1) == ref2.code)
+	   && (GET_MODE (ref1) == ref2.value_mode)
+	   && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
+  }
+};
+
+static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
+
+static rtx
+get_cached_binop (enum rtx_code code, machine_mode value_mode,
+		  machine_mode cmp_op_mode)
+{
+  if (!cached_binops)
+    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
+  binop_key key = { code, value_mode, cmp_op_mode };
+  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
+  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
+  if (!*slot)
+    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
+			    gen_reg_rtx (cmp_op_mode));
+  return *slot;
+}
+
+bool
+can_vector_compare_p (enum rtx_code code, machine_mode value_mode,
+		      machine_mode cmp_op_mode,
+		      enum can_vector_compare_purpose purpose)
+{
+  enum insn_code icode;
+  bool unsigned_p = (code == LTU || code == LEU || code == GTU || code == GEU);
+  rtx test = get_cached_binop (code, value_mode, cmp_op_mode);
+
+  if (purpose == cvcp_vcond
+      && (icode = get_vcond_icode (value_mode, cmp_op_mode, unsigned_p))
+	 != CODE_FOR_nothing
+      && insn_operand_matches (icode, 3, test))
+    return true;
+
+  return false;
+}
+
 /* This function is called when we are going to emit a compare instruction that
    compares the values found in X and Y, using the rtl operator COMPARISON.
 
@@ -7481,3 +7558,5 @@ expand_jump_insn (enum insn_code icode, unsigned int nops,
   if (!maybe_expand_jump_insn (icode, nops, ops))
     gcc_unreachable ();
 }
+
+#include "gt-optabs.h"
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 897bb5d4443..2b2338a67af 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -242,6 +242,21 @@ enum can_compare_purpose
    (without splitting it into pieces).  */
 extern int can_compare_p (enum rtx_code, machine_mode,
 			  enum can_compare_purpose);
+
+/* The various uses that a vector comparison can have; used by
+   can_vector_compare_p.  So far only vcond is defined, vec_cmp is a possible
+   future extension.  */
+enum can_vector_compare_purpose
+{
+  cvcp_vcond
+};
+
+/* Return whether the back-end can emit vector comparison insn(s) using a
+   given CODE, with operands in CMP_OP_MODE, producing a result in
+   VALUE_MODE, in order to achieve a PURPOSE.  */
+extern bool can_vector_compare_p (enum rtx_code, machine_mode, machine_mode,
+				  enum can_vector_compare_purpose);
+
 extern rtx prepare_operand (enum insn_code, rtx, int, machine_mode,
 			    machine_mode, int);
 /* Emit a pair of rtl insns to compare two rtx's and to jump
-- 
2.21.0


* [PATCH v2 2/9] hash_traits: split pointer_hash_mark from pointer_hash
  2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
  2019-08-22 13:47 ` [PATCH v2 3/9] Introduce can_vector_compare_p function Ilya Leoshkevich
@ 2019-08-22 13:47 ` Ilya Leoshkevich
  2019-08-22 13:47 ` [PATCH v2 1/9] Document signaling for min, max and ltgt operations Ilya Leoshkevich
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-22 13:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, segher, Ilya Leoshkevich

The next patch, which introduces the can_vector_compare_p function, needs
to store rtxes in a hash table and look them up using a special key type.
Currently pointer_hash requires value_type to be the same as compare_type,
so it is not usable here, and one would have to implement mark_deleted,
mark_empty, is_deleted and is_empty manually.

Split pointer_hash_mark out of pointer_hash in order to support such use
cases.  Also make use of it in the existing code where possible.

gcc/ChangeLog:

2019-08-22  Ilya Leoshkevich  <iii@linux.ibm.com>

	* hash-traits.h (struct pointer_hash_mark): New trait.
	(Pointer>::mark_deleted): Move from pointer_hash.
	(Pointer>::mark_empty): Likewise.
	(Pointer>::is_deleted): Likewise.
	(Pointer>::is_empty): Likewise.
	(struct pointer_hash): Inherit from pointer_hash_mark.
	(Type>::mark_deleted): Move to pointer_hash_mark.
	(Type>::mark_empty): Likewise.
	(Type>::is_deleted): Likewise.
	(Type>::is_empty): Likewise.
	* ipa-prop.c (struct ipa_bit_ggc_hash_traits): Use
	pointer_hash_mark.
	(struct ipa_vr_ggc_hash_traits): Likewise.

gcc/cp/ChangeLog:

2019-08-22  Ilya Leoshkevich  <iii@linux.ibm.com>

	* decl2.c (struct mangled_decl_hash): Use pointer_hash_mark.
---
 gcc/cp/decl2.c    | 14 +--------
 gcc/hash-traits.h | 74 ++++++++++++++++++++++++++---------------------
 gcc/ipa-prop.c    | 47 ++++--------------------------
 3 files changed, 47 insertions(+), 88 deletions(-)

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index a32108f9d16..36a10f491fa 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -105,7 +105,7 @@ static GTY(()) vec<tree, va_gc> *mangling_aliases;
 /* hash traits for declarations.  Hashes single decls via
    DECL_ASSEMBLER_NAME_RAW.  */
 
-struct mangled_decl_hash : ggc_remove <tree>
+struct mangled_decl_hash : pointer_hash_mark <tree>, ggc_remove <tree>
 {
   typedef tree value_type; /* A DECL.  */
   typedef tree compare_type; /* An identifier.  */
@@ -119,18 +119,6 @@ struct mangled_decl_hash : ggc_remove <tree>
     tree name = DECL_ASSEMBLER_NAME_RAW (existing);
     return candidate == name;
   }
-
-  static inline void mark_empty (value_type &p) {p = NULL_TREE;}
-  static inline bool is_empty (value_type p) {return !p;}
-
-  static bool is_deleted (value_type e)
-  {
-    return e == reinterpret_cast <value_type> (1);
-  }
-  static void mark_deleted (value_type &e)
-  {
-    e = reinterpret_cast <value_type> (1);
-  }
 };
 
 /* A hash table of decls keyed by mangled name.  Used to figure out if
diff --git a/gcc/hash-traits.h b/gcc/hash-traits.h
index 2d17e2c982a..e5c9e88d99f 100644
--- a/gcc/hash-traits.h
+++ b/gcc/hash-traits.h
@@ -136,12 +136,52 @@ int_hash <Type, Empty, Deleted>::is_empty (Type x)
   return x == Empty;
 }
 
+/* Base class for pointer hashers that want to implement marking in a generic
+   way.  */
+
+template <typename Pointer>
+struct pointer_hash_mark
+{
+  static inline void mark_deleted (Pointer &);
+  static inline void mark_empty (Pointer &);
+  static inline bool is_deleted (Pointer);
+  static inline bool is_empty (Pointer);
+};
+
+template <typename Pointer>
+inline void
+pointer_hash_mark <Pointer>::mark_deleted (Pointer &e)
+{
+  e = reinterpret_cast<Pointer> (1);
+}
+
+template <typename Pointer>
+inline void
+pointer_hash_mark <Pointer>::mark_empty (Pointer &e)
+{
+  e = NULL;
+}
+
+template <typename Pointer>
+inline bool
+pointer_hash_mark <Pointer>::is_deleted (Pointer e)
+{
+  return e == reinterpret_cast<Pointer> (1);
+}
+
+template <typename Pointer>
+inline bool
+pointer_hash_mark <Pointer>::is_empty (Pointer e)
+{
+  return e == NULL;
+}
+
 /* Pointer hasher based on pointer equality.  Other types of pointer hash
    can inherit this and override the hash and equal functions with some
    other form of equality (such as string equality).  */
 
 template <typename Type>
-struct pointer_hash
+struct pointer_hash : pointer_hash_mark<Type *>
 {
   typedef Type *value_type;
   typedef Type *compare_type;
@@ -149,10 +189,6 @@ struct pointer_hash
   static inline hashval_t hash (const value_type &);
   static inline bool equal (const value_type &existing,
 			    const compare_type &candidate);
-  static inline void mark_deleted (Type *&);
-  static inline void mark_empty (Type *&);
-  static inline bool is_deleted (Type *);
-  static inline bool is_empty (Type *);
 };
 
 template <typename Type>
@@ -172,34 +208,6 @@ pointer_hash <Type>::equal (const value_type &existing,
   return existing == candidate;
 }
 
-template <typename Type>
-inline void
-pointer_hash <Type>::mark_deleted (Type *&e)
-{
-  e = reinterpret_cast<Type *> (1);
-}
-
-template <typename Type>
-inline void
-pointer_hash <Type>::mark_empty (Type *&e)
-{
-  e = NULL;
-}
-
-template <typename Type>
-inline bool
-pointer_hash <Type>::is_deleted (Type *e)
-{
-  return e == reinterpret_cast<Type *> (1);
-}
-
-template <typename Type>
-inline bool
-pointer_hash <Type>::is_empty (Type *e)
-{
-  return e == NULL;
-}
-
 /* Hasher for "const char *" strings, using string rather than pointer
    equality.  */
 
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 1a0e12e6c0c..6fde7500cf4 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -64,7 +64,8 @@ ipa_edge_args_sum_t *ipa_edge_args_sum;
 
 /* Traits for a hash table for reusing already existing ipa_bits. */
 
-struct ipa_bit_ggc_hash_traits : public ggc_cache_remove <ipa_bits *>
+struct ipa_bit_ggc_hash_traits : pointer_hash_mark <ipa_bits *>,
+                                 ggc_cache_remove <ipa_bits *>
 {
   typedef ipa_bits *value_type;
   typedef ipa_bits *compare_type;
@@ -79,26 +80,6 @@ struct ipa_bit_ggc_hash_traits : public ggc_cache_remove <ipa_bits *>
     {
       return a->value == b->value && a->mask == b->mask;
     }
-  static void
-  mark_empty (ipa_bits *&p)
-    {
-      p = NULL;
-    }
-  static bool
-  is_empty (const ipa_bits *p)
-    {
-      return p == NULL;
-    }
-  static bool
-  is_deleted (const ipa_bits *p)
-    {
-      return p == reinterpret_cast<const ipa_bits *> (1);
-    }
-  static void
-  mark_deleted (ipa_bits *&p)
-    {
-      p = reinterpret_cast<ipa_bits *> (1);
-    }
 };
 
 /* Hash table for avoid repeated allocations of equal ipa_bits.  */
@@ -107,7 +88,9 @@ static GTY ((cache)) hash_table<ipa_bit_ggc_hash_traits> *ipa_bits_hash_table;
 /* Traits for a hash table for reusing value_ranges used for IPA.  Note that
    the equiv bitmap is not hashed and is expected to be NULL.  */
 
-struct ipa_vr_ggc_hash_traits : public ggc_cache_remove <value_range_base *>
+struct ipa_vr_ggc_hash_traits : pointer_hash_mark <value_range_base *>,
+                                ggc_cache_remove <value_range_base *>
+
 {
   typedef value_range_base *value_type;
   typedef value_range_base *compare_type;
@@ -124,26 +107,6 @@ struct ipa_vr_ggc_hash_traits : public ggc_cache_remove <value_range_base *>
     {
       return a->equal_p (*b);
     }
-  static void
-  mark_empty (value_range_base *&p)
-    {
-      p = NULL;
-    }
-  static bool
-  is_empty (const value_range_base *p)
-    {
-      return p == NULL;
-    }
-  static bool
-  is_deleted (const value_range_base *p)
-    {
-      return p == reinterpret_cast<const value_range_base *> (1);
-    }
-  static void
-  mark_deleted (value_range_base *&p)
-    {
-      p = reinterpret_cast<value_range_base *> (1);
-    }
 };
 
 /* Hash table for avoid repeated allocations of equal value_ranges.  */
-- 
2.21.0


* [PATCH v2 4/9] S/390: Do not use signaling vector comparisons on z13
  2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
                   ` (2 preceding siblings ...)
  2019-08-22 13:47 ` [PATCH v2 1/9] Document signaling for min, max and ltgt operations Ilya Leoshkevich
@ 2019-08-22 13:48 ` Ilya Leoshkevich
  2019-08-22 13:48 ` [PATCH v2 5/9] S/390: Implement vcond expander for V1TI,V1TF Ilya Leoshkevich
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-22 13:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, segher, Ilya Leoshkevich

z13 supports only non-signaling vector comparisons.  This means we
cannot vectorize LT, LE, GT, GE and LTGT when compiling for z13.  Notify
the middle-end about this by using a more restrictive operator predicate
in vcond<V_HW:mode><V_HW2:mode>.

gcc/ChangeLog:

2019-08-21  Ilya Leoshkevich  <iii@linux.ibm.com>

	* config/s390/vector.md (vcond_comparison_operator): New
	predicate.
	(vcond<V_HW:mode><V_HW2:mode>): Use vcond_comparison_operator.
---
 gcc/config/s390/vector.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 0702e1de835..d7a266c5605 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -614,10 +614,30 @@
   operands[2] = GEN_INT (GET_MODE_NUNITS (<MODE>mode) - 1);
 })
 
+(define_predicate "vcond_comparison_operator"
+  (match_operand 0 "comparison_operator")
+{
+  if (!HONOR_NANS (GET_MODE (XEXP (op, 0)))
+      && !HONOR_NANS (GET_MODE (XEXP (op, 1))))
+    return true;
+  switch (GET_CODE (op))
+    {
+    case LE:
+    case LT:
+    case GE:
+    case GT:
+    case LTGT:
+      /* Signaling vector comparisons are supported only on z14+.  */
+      return TARGET_Z14;
+    default:
+      return true;
+    }
+})
+
 (define_expand "vcond<V_HW:mode><V_HW2:mode>"
   [(set (match_operand:V_HW 0 "register_operand" "")
 	(if_then_else:V_HW
-	 (match_operator 3 "comparison_operator"
+	 (match_operator 3 "vcond_comparison_operator"
 			 [(match_operand:V_HW2 4 "register_operand" "")
 			  (match_operand:V_HW2 5 "nonmemory_operand" "")])
 	 (match_operand:V_HW 1 "nonmemory_operand" "")
-- 
2.21.0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 5/9] S/390: Implement vcond expander for V1TI,V1TF
  2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
                   ` (3 preceding siblings ...)
  2019-08-22 13:48 ` [PATCH v2 4/9] S/390: Do not use signaling vector comparisons on z13 Ilya Leoshkevich
@ 2019-08-22 13:48 ` Ilya Leoshkevich
  2019-08-22 13:54 ` [PATCH v2 6/9] S/390: Remove code duplication in vec_unordered<mode> Ilya Leoshkevich
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-22 13:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, segher, Ilya Leoshkevich

Currently GCC does not emit wf{c,k}* instructions when comparing long
double values.  The middle-end actually adds them in the first place,
but the veclower pass then replaces them with floating point register
pair operations, because the corresponding expander is missing.

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  <iii@linux.ibm.com>

	* config/s390/vector.md (vcondv1tiv1tf): New variant of
	vcond$a$b expander.
---
 gcc/config/s390/vector.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index d7a266c5605..ca5ec0dd3b0 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -649,6 +649,21 @@
   DONE;
 })
 
+(define_expand "vcondv1tiv1tf"
+  [(set (match_operand:V1TI 0 "register_operand" "")
+	(if_then_else:V1TI
+	 (match_operator 3 "vcond_comparison_operator"
+			 [(match_operand:V1TF 4 "register_operand" "")
+			  (match_operand:V1TF 5 "nonmemory_operand" "")])
+	 (match_operand:V1TI 1 "nonmemory_operand" "")
+	 (match_operand:V1TI 2 "nonmemory_operand" "")))]
+  "TARGET_VXE"
+{
+  s390_expand_vcond (operands[0], operands[1], operands[2],
+		     GET_CODE (operands[3]), operands[4], operands[5]);
+  DONE;
+})
+
 (define_expand "vcondu<V_HW:mode><V_HW2:mode>"
   [(set (match_operand:V_HW 0 "register_operand" "")
 	(if_then_else:V_HW
-- 
2.21.0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 6/9] S/390: Remove code duplication in vec_unordered<mode>
  2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
                   ` (4 preceding siblings ...)
  2019-08-22 13:48 ` [PATCH v2 5/9] S/390: Implement vcond expander for V1TI,V1TF Ilya Leoshkevich
@ 2019-08-22 13:54 ` Ilya Leoshkevich
  2019-08-22 14:13 ` [PATCH v2 7/9] S/390: Remove code duplication in vec_* comparison expanders Ilya Leoshkevich
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-22 13:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, segher, Ilya Leoshkevich

vec_unordered<mode> is vec_ordered<mode> plus a negation at the end.
Reuse the vec_ordered<mode> logic.

gcc/ChangeLog:

2019-08-13  Ilya Leoshkevich  <iii@linux.ibm.com>

	* config/s390/vector.md (vec_unordered<mode>): Call
	gen_vec_ordered<mode>.
---
 gcc/config/s390/vector.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index ca5ec0dd3b0..1b66b8be61f 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1581,15 +1581,15 @@
 
 ; UNORDERED (a, b): !ORDERED (a, b)
 (define_expand "vec_unordered<mode>"
-  [(set (match_operand:<tointvec>          0 "register_operand" "=v")
-	(ge:<tointvec> (match_operand:VFT 1 "register_operand"  "v")
-		 (match_operand:VFT 2 "register_operand"  "v")))
-   (set (match_dup 3) (gt:<tointvec> (match_dup 2) (match_dup 1)))
-   (set (match_dup 0) (ior:<tointvec> (match_dup 0) (match_dup 3)))
-   (set (match_dup 0) (not:<tointvec> (match_dup 0)))]
+  [(match_operand:<tointvec> 0 "register_operand" "=v")
+   (match_operand:VFT        1 "register_operand" "v")
+   (match_operand:VFT        2 "register_operand" "v")]
   "TARGET_VX"
 {
-  operands[3] = gen_reg_rtx (<tointvec>mode);
+  emit_insn (gen_vec_ordered<mode> (operands[0], operands[1], operands[2]));
+  emit_insn (gen_rtx_SET (operands[0],
+	     gen_rtx_NOT (<tointvec>mode, operands[0])));
+  DONE;
 })
 
 (define_expand "vec_unordered"
-- 
2.21.0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 7/9] S/390: Remove code duplication in vec_* comparison expanders
  2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
                   ` (5 preceding siblings ...)
  2019-08-22 13:54 ` [PATCH v2 6/9] S/390: Remove code duplication in vec_unordered<mode> Ilya Leoshkevich
@ 2019-08-22 14:13 ` Ilya Leoshkevich
  2019-08-22 14:17 ` [PATCH v2 8/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-22 14:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, segher, Ilya Leoshkevich

vector.md uses a lot of near-identical expanders that dispatch to other
expanders based on operand types.  Since the following patch would
require even more of these, avoid copy-pasting the code by generating
these expanders with an iterator.

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  <iii@linux.ibm.com>

	* config/s390/s390.c (s390_expand_vec_compare): Use
	gen_vec_cmpordered and gen_vec_cmpunordered.
	* config/s390/vector.md (vec_cmpuneq, vec_cmpltgt, vec_ordered,
	vec_unordered): Delete.
	(vec_ordered<mode>): Rename to vec_cmpordered<mode>.
	(vec_unordered<mode>): Rename to vec_cmpunordered<mode>.
	(vec_cmp<code>): Generic dispatcher.
---
 gcc/config/s390/s390.c    |  4 +--
 gcc/config/s390/vector.md | 67 +++++++--------------------------------
 2 files changed, 13 insertions(+), 58 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index fa17d7d5d08..f9817f6edaf 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6523,10 +6523,10 @@ s390_expand_vec_compare (rtx target, enum rtx_code cond,
 	  emit_insn (gen_vec_cmpltgt (target, cmp_op1, cmp_op2));
 	  return;
 	case ORDERED:
-	  emit_insn (gen_vec_ordered (target, cmp_op1, cmp_op2));
+	  emit_insn (gen_vec_cmpordered (target, cmp_op1, cmp_op2));
 	  return;
 	case UNORDERED:
-	  emit_insn (gen_vec_unordered (target, cmp_op1, cmp_op2));
+	  emit_insn (gen_vec_cmpunordered (target, cmp_op1, cmp_op2));
 	  return;
 	default: break;
 	}
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 1b66b8be61f..a093ae5c565 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1507,22 +1507,6 @@
   operands[3] = gen_reg_rtx (<tointvec>mode);
 })
 
-(define_expand "vec_cmpuneq"
-  [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
-  "TARGET_VX"
-{
-  if (GET_MODE (operands[1]) == V4SFmode)
-    emit_insn (gen_vec_cmpuneqv4sf (operands[0], operands[1], operands[2]));
-  else if (GET_MODE (operands[1]) == V2DFmode)
-    emit_insn (gen_vec_cmpuneqv2df (operands[0], operands[1], operands[2]));
-  else
-    gcc_unreachable ();
-
-  DONE;
-})
-
 ; LTGT a <> b -> a > b | b > a
 (define_expand "vec_cmpltgt<mode>"
   [(set (match_operand:<tointvec>         0 "register_operand" "=v")
@@ -1535,24 +1519,8 @@
   operands[3] = gen_reg_rtx (<tointvec>mode);
 })
 
-(define_expand "vec_cmpltgt"
-  [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
-  "TARGET_VX"
-{
-  if (GET_MODE (operands[1]) == V4SFmode)
-    emit_insn (gen_vec_cmpltgtv4sf (operands[0], operands[1], operands[2]));
-  else if (GET_MODE (operands[1]) == V2DFmode)
-    emit_insn (gen_vec_cmpltgtv2df (operands[0], operands[1], operands[2]));
-  else
-    gcc_unreachable ();
-
-  DONE;
-})
-
 ; ORDERED (a, b): a >= b | b > a
-(define_expand "vec_ordered<mode>"
+(define_expand "vec_cmpordered<mode>"
   [(set (match_operand:<tointvec>          0 "register_operand" "=v")
 	(ge:<tointvec> (match_operand:VFT 1 "register_operand"  "v")
 		 (match_operand:VFT 2 "register_operand"  "v")))
@@ -1563,45 +1531,32 @@
   operands[3] = gen_reg_rtx (<tointvec>mode);
 })
 
-(define_expand "vec_ordered"
-  [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
-  "TARGET_VX"
-{
-  if (GET_MODE (operands[1]) == V4SFmode)
-    emit_insn (gen_vec_orderedv4sf (operands[0], operands[1], operands[2]));
-  else if (GET_MODE (operands[1]) == V2DFmode)
-    emit_insn (gen_vec_orderedv2df (operands[0], operands[1], operands[2]));
-  else
-    gcc_unreachable ();
-
-  DONE;
-})
-
 ; UNORDERED (a, b): !ORDERED (a, b)
-(define_expand "vec_unordered<mode>"
+(define_expand "vec_cmpunordered<mode>"
   [(match_operand:<tointvec> 0 "register_operand" "=v")
    (match_operand:VFT        1 "register_operand" "v")
    (match_operand:VFT        2 "register_operand" "v")]
   "TARGET_VX"
 {
-  emit_insn (gen_vec_ordered<mode> (operands[0], operands[1], operands[2]));
+  emit_insn (gen_vec_cmpordered<mode> (operands[0], operands[1], operands[2]));
   emit_insn (gen_rtx_SET (operands[0],
 	     gen_rtx_NOT (<tointvec>mode, operands[0])));
   DONE;
 })
 
-(define_expand "vec_unordered"
+(define_code_iterator VEC_CODE_WITH_COMPLEX_EXPAND
+  [uneq ltgt ordered unordered])
+
+(define_expand "vec_cmp<code>"
   [(match_operand 0 "register_operand" "")
-   (match_operand 1 "register_operand" "")
-   (match_operand 2 "register_operand" "")]
+   (VEC_CODE_WITH_COMPLEX_EXPAND (match_operand 1 "register_operand" "")
+				 (match_operand 2 "register_operand" ""))]
   "TARGET_VX"
 {
   if (GET_MODE (operands[1]) == V4SFmode)
-    emit_insn (gen_vec_unorderedv4sf (operands[0], operands[1], operands[2]));
+    emit_insn (gen_vec_cmp<code>v4sf (operands[0], operands[1], operands[2]));
   else if (GET_MODE (operands[1]) == V2DFmode)
-    emit_insn (gen_vec_unorderedv2df (operands[0], operands[1], operands[2]));
+    emit_insn (gen_vec_cmp<code>v2df (operands[0], operands[1], operands[2]));
   else
     gcc_unreachable ();
 
-- 
2.21.0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 8/9] S/390: Use signaling FP comparison instructions
  2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
                   ` (6 preceding siblings ...)
  2019-08-22 14:13 ` [PATCH v2 7/9] S/390: Remove code duplication in vec_* comparison expanders Ilya Leoshkevich
@ 2019-08-22 14:17 ` Ilya Leoshkevich
  2019-08-22 14:26 ` [PATCH v2 9/9] S/390: Test " Ilya Leoshkevich
  2019-08-29 16:08 ` [PATCH v2 0/9] S/390: Use " Ilya Leoshkevich
  9 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-22 14:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, segher, Ilya Leoshkevich

dg-torture.exp=inf-compare-1.c is failing because the (qNaN > +Inf)
comparison is compiled to a CDB instruction, which does not signal an
invalid operation exception.  A KDB instruction should have been used
instead.

This patch introduces a new CCmode and a new pattern in order to
generate signaling instructions in this and similar cases.

gcc/ChangeLog:

2019-08-09  Ilya Leoshkevich  <iii@linux.ibm.com>

	* config/s390/2827.md: Add new opcodes.
	* config/s390/2964.md: Likewise.
	* config/s390/3906.md: Likewise.
	* config/s390/8561.md: Likewise.
	* config/s390/s390-builtins.def (s390_vfchesb): Use
	the new vec_cmpgev4sf_quiet_nocc.
	(s390_vfchedb): Use the new vec_cmpgev2df_quiet_nocc.
	(s390_vfchsb): Use the new vec_cmpgtv4sf_quiet_nocc.
	(s390_vfchdb): Use the new vec_cmpgtv2df_quiet_nocc.
	(vec_cmplev4sf): Use the new vec_cmplev4sf_quiet_nocc.
	(vec_cmplev2df): Use the new vec_cmplev2df_quiet_nocc.
	(vec_cmpltv4sf): Use the new vec_cmpltv4sf_quiet_nocc.
	(vec_cmpltv2df): Use the new vec_cmpltv2df_quiet_nocc.
	* config/s390/s390-modes.def (CCSFPS): New mode.
	* config/s390/s390.c (s390_match_ccmode_set): Support CCSFPS.
	(s390_select_ccmode): Return CCSFPS for LT, LE, GT, GE and LTGT.
	(s390_branch_condition_mask): Reuse CCS for CCSFPS.
	(s390_expand_vec_compare): Use non-signaling patterns where
	necessary.
	(s390_reverse_condition): Support CCSFPS.
	* config/s390/s390.md (*cmp<mode>_ccsfps): New pattern.
	* config/s390/vector.md: (VFCMP_HW_OP): Remove.
	(asm_fcmp_op): Likewise.
	(*smaxv2df3_vx): Use pattern for quiet comparison.
	(*sminv2df3_vx): Likewise.
	(*vec_cmp<VFCMP_HW_OP:code><mode>_nocc): Remove.
	(*vec_cmpeq<mode>_quiet_nocc): New pattern.
	(vec_cmpgt<mode>_quiet_nocc): Likewise.
	(vec_cmplt<mode>_quiet_nocc): New expander.
	(vec_cmpge<mode>_quiet_nocc): New pattern.
	(vec_cmple<mode>_quiet_nocc): New expander.
	(*vec_cmpeq<mode>_signaling_nocc): New pattern.
	(*vec_cmpgt<mode>_signaling_nocc): Likewise.
	(*vec_cmpgt<mode>_signaling_finite_nocc): Likewise.
	(*vec_cmpge<mode>_signaling_nocc): Likewise.
	(*vec_cmpge<mode>_signaling_finite_nocc): Likewise.
	(vec_cmpungt<mode>): New expander.
	(vec_cmpunge<mode>): Likewise.
	(vec_cmpuneq<mode>): Use quiet patterns.
	(vec_cmpltgt<mode>): Allow only on z14+.
	(vec_cmpordered<mode>): Use quiet patterns.
	(vec_cmpunordered<mode>): Likewise.
	(VEC_CODE_WITH_COMPLEX_EXPAND): Add ungt and unge.

gcc/testsuite/ChangeLog:

2019-08-09  Ilya Leoshkevich  <iii@linux.ibm.com>

	* gcc.target/s390/vector/vec-scalar-cmp-1.c: Adjust
	expectations.
---
 gcc/config/s390/2827.md                       |  14 +-
 gcc/config/s390/2964.md                       |  13 +-
 gcc/config/s390/3906.md                       |  17 +-
 gcc/config/s390/8561.md                       |  19 +-
 gcc/config/s390/s390-builtins.def             |  16 +-
 gcc/config/s390/s390-modes.def                |   8 +
 gcc/config/s390/s390.c                        |  34 ++--
 gcc/config/s390/s390.md                       |  14 ++
 gcc/config/s390/vector.md                     | 171 +++++++++++++++---
 .../gcc.target/s390/vector/vec-scalar-cmp-1.c |   8 +-
 10 files changed, 240 insertions(+), 74 deletions(-)

diff --git a/gcc/config/s390/2827.md b/gcc/config/s390/2827.md
index 3f63f82284d..aafe8e27339 100644
--- a/gcc/config/s390/2827.md
+++ b/gcc/config/s390/2827.md
@@ -44,7 +44,7 @@
 
 (define_insn_reservation "zEC12_normal_fp" 8
   (and (eq_attr "cpu" "zEC12")
-       (eq_attr "mnemonic" "lnebr,sdbr,sebr,clfxtr,adbr,aebr,celfbr,clfebr,lpebr,msebr,lndbr,clfdbr,cebr,maebr,ltebr,clfdtr,cdlgbr,cxlftr,lpdbr,cdfbr,lcebr,clfxbr,msdbr,cdbr,madbr,meebr,clgxbr,clgdtr,ledbr,cegbr,cdlftr,cdlgtr,mdbr,clgebr,ltdbr,cdlfbr,cdgbr,clgxtr,lcdbr,celgbr,clgdbr,ldebr,cefbr,fidtr,fixtr,madb,msdb,mseb,fiebra,fidbra,aeb,mdb,seb,cdb,tcdb,sdb,adb,tceb,maeb,ceb,meeb,ldeb")) "nothing")
+       (eq_attr "mnemonic" "lnebr,sdbr,sebr,clfxtr,adbr,aebr,celfbr,clfebr,lpebr,msebr,lndbr,clfdbr,cebr,maebr,ltebr,clfdtr,cdlgbr,cxlftr,lpdbr,cdfbr,lcebr,clfxbr,msdbr,cdbr,madbr,meebr,clgxbr,clgdtr,ledbr,cegbr,cdlftr,cdlgtr,mdbr,clgebr,ltdbr,cdlfbr,cdgbr,clgxtr,lcdbr,celgbr,clgdbr,ldebr,cefbr,fidtr,fixtr,madb,msdb,mseb,fiebra,fidbra,aeb,mdb,seb,cdb,tcdb,sdb,adb,tceb,maeb,ceb,meeb,ldeb,keb,kebr,kdb,kdbr")) "nothing")
 
 (define_insn_reservation "zEC12_cgdbr" 2
   (and (eq_attr "cpu" "zEC12")
@@ -426,6 +426,10 @@
   (and (eq_attr "cpu" "zEC12")
        (eq_attr "mnemonic" "cxbr")) "nothing")
 
+(define_insn_reservation "zEC12_kxbr" 18
+  (and (eq_attr "cpu" "zEC12")
+       (eq_attr "mnemonic" "kxbr")) "nothing")
+
 (define_insn_reservation "zEC12_ddbr" 36
   (and (eq_attr "cpu" "zEC12")
        (eq_attr "mnemonic" "ddbr")) "nothing")
@@ -578,10 +582,18 @@
   (and (eq_attr "cpu" "zEC12")
        (eq_attr "mnemonic" "cdtr")) "nothing")
 
+(define_insn_reservation "zEC12_kdtr" 11
+  (and (eq_attr "cpu" "zEC12")
+       (eq_attr "mnemonic" "kdtr")) "nothing")
+
 (define_insn_reservation "zEC12_cxtr" 14
   (and (eq_attr "cpu" "zEC12")
        (eq_attr "mnemonic" "cxtr")) "nothing")
 
+(define_insn_reservation "zEC12_kxtr" 14
+  (and (eq_attr "cpu" "zEC12")
+       (eq_attr "mnemonic" "kxtr")) "nothing")
+
 (define_insn_reservation "zEC12_slbg" 3
   (and (eq_attr "cpu" "zEC12")
        (eq_attr "mnemonic" "slbg")) "nothing")
diff --git a/gcc/config/s390/2964.md b/gcc/config/s390/2964.md
index a7897bcf58a..4396e3ba1c0 100644
--- a/gcc/config/s390/2964.md
+++ b/gcc/config/s390/2964.md
@@ -69,7 +69,7 @@ ng,ni,niy,ntstg,ny,o,og,oi,oiy,oy,s,sar,sdb,seb,sfpc,sg,sgf,sh,shy,sl,\
 slb,slbg,slg,slgf,sly,sqdb,sqeb,st,stc,stcy,std,stdy,ste,stey,stg,stgrl,\
 sth,sthrl,sthy,stoc,stocg,strl,strv,strvg,strvh,sty,sy,tabort,tm,tmy,vl,\
 vlbb,vleb,vlef,vleg,vleh,vll,vllezb,vllezf,vllezg,vllezh,vllezlf,vlrepb,\
-vlrepf,vlrepg,vlreph,vst,vstl,x,xg,xi,xiy,xy")
+vlrepf,vlrepg,vlreph,vst,vstl,x,xg,xi,xiy,xy,kdb")
  (const_int 1)] (const_int 0)))
 
 (define_attr "z13_unit_vfu" ""
@@ -109,7 +109,8 @@ vuplhh,vuplhw,vupllb,vupllf,vupllh,vx,vzero,wcdgb,wcdlgb,wcgdb,wclgdb,wfadb,\
 wfasb,wfaxb,wfcdb,wfcedb,wfcesb,wfcexbs,wfchdb,wfchedb,wfchesb,wfchexb,\
 wfchexbs,wfchsb,wfchxb,wfchxbs,wfcsb,wfisb,wfixb,wflcdb,wflcsb,wflcxb,wflld,\
 wflndb,wflnsb,wflnxb,wflpdb,wflpsb,wflpxb,wfmadb,wfmasb,wfmaxb,wfmdb,wfmsb,\
-wfmsdb,wfmssb,wfmsxb,wfmxb,wfsdb,wfssb,wfsxb,wldeb,wledb")
+wfmsdb,wfmssb,wfmsxb,wfmxb,wfsdb,wfssb,wfsxb,wldeb,wledb,kebr,kdb,kdbr,kxbr,\
+kdtr,kxtr,wfkdb,wfksb")
  (const_int 1)] (const_int 0)))
 
 (define_attr "z13_cracked" ""
@@ -131,7 +132,7 @@ stmg,stmy,tbegin,tbeginc")
 cxtr,dlgr,dlr,dr,dsgfr,dsgr,dxbr,dxtr,fixbr,fixbra,fixtr,flogr,lcxbr,lnxbr,\
 lpxbr,ltxbr,ltxtr,lxdb,lxdbr,lxdtr,lxeb,lxebr,m,madb,maeb,maebr,mfy,ml,mlg,\
 mlgr,mlr,mr,msdb,mseb,msebr,mvc,mxbr,mxtr,oc,sfpc,slb,slbg,slbgr,slbr,\
-sqxbr,sxbr,sxtr,tabort,tcxb,tdcxt,tend,xc")
+sqxbr,sxbr,sxtr,tabort,tcxb,tdcxt,tend,xc,kxbr,kxtr")
  (const_int 1)] (const_int 0)))
 
 (define_attr "z13_endgroup" ""
@@ -198,7 +199,7 @@ vchlhs,vfcedbs,vfcesbs,vfchdbs,vfchedbs,vfchesbs,vfchsbs,vfeeb,vfeef,vfeeh,\
 vfeneb,vfenef,vfeneh,vfenezb,vfenezf,vfenezh,vftcidb,vftcisb,vistrb,vistrf,\
 vistrh,vllezb,vllezf,vllezg,vllezh,vllezlf,vlrepb,vlrepf,vlrepg,vlreph,vlvgp,\
 vpklsfs,vpklsgs,vpklshs,vpksfs,vpksgs,vpkshs,vslb,vsrab,vsrlb,wfcdb,wfcexbs,\
-wfchexbs,wfchxbs,wfcsb")) "nothing")
+wfchexbs,wfchxbs,wfcsb,kebr,kdb,kdbr,wfkdb,wfksb")) "nothing")
 
 (define_insn_reservation "z13_3" 3
   (and (eq_attr "cpu" "z13")
@@ -232,7 +233,7 @@ wfmdb,wfmsb,wfmsdb,wfmssb,wfmsxb,wfmxb,wfsdb,wfssb,wfsxb,wldeb,wledb")) "nothing
   (and (eq_attr "cpu" "z13")
 (eq_attr "mnemonic" "adtr,cdtr,fidtr,ldetr,msg,msgr,sdtr,tdcdt,tdcet,\
 vcdgb,vcdlgb,vcgdb,vclgdb,vfadb,vfasb,vfidb,vfisb,vfmadb,vfmasb,vfmdb,vfmsb,\
-vfmsdb,vfmssb,vfsdb,vfssb,vldeb,vledb")) "nothing")
+vfmsdb,vfmssb,vfsdb,vfssb,vldeb,vledb,kdtr")) "nothing")
 
 (define_insn_reservation "z13_8" 8
   (and (eq_attr "cpu" "z13")
@@ -254,7 +255,7 @@ celgbr,flogr,m,madb,maeb,maebr,mfy,ml,mlr,mr,msdb,mseb,msebr")) "nothing")
 (define_insn_reservation "z13_12" 12
   (and (eq_attr "cpu" "z13")
 (eq_attr "mnemonic" "cfdbr,cfebr,cgdbr,cgebr,clfdbr,clfebr,clgdbr,\
-clgebr,cxbr,cxtr,mlg,mlgr,tcxb,tdcxt")) "nothing")
+clgebr,cxbr,cxtr,mlg,mlgr,tcxb,tdcxt,kxbr,kxtr")) "nothing")
 
 (define_insn_reservation "z13_13" 13
   (and (eq_attr "cpu" "z13")
diff --git a/gcc/config/s390/3906.md b/gcc/config/s390/3906.md
index 8cb4565ee22..1212d8b61f1 100644
--- a/gcc/config/s390/3906.md
+++ b/gcc/config/s390/3906.md
@@ -71,7 +71,7 @@ sgh,sh,shy,sl,slb,slbg,slg,slgf,sly,sqdb,sqeb,st,stc,stcy,std,stdy,ste,\
 stey,stg,stgrl,sth,sthrl,sthy,stoc,stocg,strl,strv,strvg,strvh,sty,sy,\
 tabort,tm,tmy,vl,vlbb,vleb,vlef,vleg,vleh,vll,vllezb,vllezf,vllezg,vllezh,\
 vllezlf,vlrepb,vlrepf,vlrepg,vlreph,vlrl,vlrlr,vst,vstl,vstrl,vstrlr,x,xg,xi,\
-xiy,xy")
+xiy,xy,kdb")
  (const_int 1)] (const_int 0)))
 
 (define_attr "z14_unit_vfu" ""
@@ -113,7 +113,8 @@ wfadb,wfasb,wfaxb,wfcdb,wfcedb,wfcesb,wfcexbs,wfchdb,wfchedb,wfchesb,\
 wfchexb,wfchexbs,wfchsb,wfchxb,wfchxbs,wfcsb,wfisb,wfixb,wflcdb,wflcsb,wflcxb,\
 wflld,wflndb,wflnsb,wflnxb,wflpdb,wflpsb,wflpxb,wfmadb,wfmasb,wfmaxb,\
 wfmaxxb,wfmdb,wfminxb,wfmsb,wfmsdb,wfmssb,wfmsxb,wfmxb,wfnmaxb,wfnmsxb,wfsdb,\
-wfssb,wfsxb,wldeb,wledb")
+wfssb,wfsxb,wldeb,wledb,kebr,kdb,kdbr,kxbr,kdtr,kxtr,wfkdb,wfksb,vfkesb,\
+vfkedb,vfkhsb,vfkhdb,wfkhxb,vfkhesb,vfkhedb,wfkhexb")
  (const_int 1)] (const_int 0)))
 
 (define_attr "z14_cracked" ""
@@ -135,7 +136,7 @@ stmg,stmy,tbegin,tbeginc")
 cxtr,dlgr,dlr,dr,dsgfr,dsgr,dxbr,dxtr,fixbr,fixbra,fixtr,flogr,lcxbr,lnxbr,\
 lpxbr,ltxbr,ltxtr,lxdb,lxdbr,lxdtr,lxeb,lxebr,m,madb,maeb,maebr,mfy,mg,mgrk,\
 ml,mlg,mlgr,mlr,mr,msdb,mseb,msebr,mvc,mxbr,mxtr,oc,ppa,sfpc,slb,slbg,\
-slbgr,slbr,sqxbr,sxbr,sxtr,tabort,tcxb,tdcxt,tend,xc")
+slbgr,slbr,sqxbr,sxbr,sxtr,tabort,tcxb,tdcxt,tend,xc,kxbr,kxtr")
  (const_int 1)] (const_int 0)))
 
 (define_attr "z14_endgroup" ""
@@ -192,7 +193,8 @@ vrepig,vrepih,vsb,vsbiq,vscbib,vscbif,vscbig,vscbih,vscbiq,vsegb,vsegf,vsegh,\
 vsel,vsf,vsg,vsh,vsl,vslb,vsldb,vsq,vsra,vsrab,vsrl,vsrlb,vuphb,vuphf,\
 vuphh,vuplb,vuplf,vuplhb,vuplhf,vuplhh,vuplhw,vupllb,vupllf,vupllh,vx,vzero,\
 wfcedb,wfcesb,wfchdb,wfchedb,wfchesb,wfchexb,wfchsb,wfchxb,wflcdb,wflcsb,\
-wflcxb,wflndb,wflnsb,wflnxb,wflpdb,wflpsb,wflpxb,wfmaxxb,wfminxb,xi,xiy")) "nothing")
+wflcxb,wflndb,wflnsb,wflnxb,wflpdb,wflpsb,wflpxb,wfmaxxb,wfminxb,xi,xiy,\
+vfkesb,vfkedb,vfkhsb,vfkhdb,wfkhxb,vfkhesb,vfkhedb,wfkhexb")) "nothing")
 
 (define_insn_reservation "z14_2" 2
   (and (eq_attr "cpu" "z14")
@@ -204,7 +206,7 @@ vchlhs,vfcedbs,vfcesbs,vfchdbs,vfchedbs,vfchesbs,vfchsbs,vfeeb,vfeef,vfeeh,\
 vfeneb,vfenef,vfeneh,vfenezb,vfenezf,vfenezh,vftcidb,vftcisb,vistrb,vistrf,\
 vistrh,vlgvf,vlgvg,vlgvh,vllezb,vllezf,vllezg,vllezh,vllezlf,vlrepb,vlrepf,\
 vlrepg,vlreph,vlrl,vlvgp,vpklsfs,vpklsgs,vpklshs,vpksfs,vpksgs,vpkshs,wfcdb,\
-wfcexbs,wfchexbs,wfchxbs,wfcsb")) "nothing")
+wfcexbs,wfchexbs,wfchxbs,wfcsb,kebr,kdb,kdbr,wfkdb,wfksb")) "nothing")
 
 (define_insn_reservation "z14_3" 3
   (and (eq_attr "cpu" "z14")
@@ -238,7 +240,8 @@ wfmasb,wfmdb,wfmsb,wfmsdb,wfmssb,wfsdb,wfssb,wldeb,wledb")) "nothing")
 (define_insn_reservation "z14_7" 7
   (and (eq_attr "cpu" "z14")
 (eq_attr "mnemonic" "adtr,cdtr,fidtr,ldetr,msgrkc,sdtr,tdcdt,tdcet,\
-vfasb,vfisb,vfmasb,vfmsb,vfmssb,vfnmssb,vfssb,vgef,vgeg,wflld")) "nothing")
+vfasb,vfisb,vfmasb,vfmsb,vfmssb,vfnmssb,vfssb,vgef,vgeg,wflld,kdtr"))
+"nothing")
 
 (define_insn_reservation "z14_8" 8
   (and (eq_attr "cpu" "z14")
@@ -261,7 +264,7 @@ celgbr,madb,maeb,maebr,msdb,mseb,msebr,vscef,vsceg")) "nothing")
 (define_insn_reservation "z14_12" 12
   (and (eq_attr "cpu" "z14")
 (eq_attr "mnemonic" "cfdbr,cfebr,cgdbr,cgebr,clfdbr,clfebr,clgdbr,\
-clgebr,cxbr,cxtr,tcxb,tdcxt")) "nothing")
+clgebr,cxbr,cxtr,tcxb,tdcxt,kxbr,kxtr")) "nothing")
 
 (define_insn_reservation "z14_13" 13
   (and (eq_attr "cpu" "z14")
diff --git a/gcc/config/s390/8561.md b/gcc/config/s390/8561.md
index e5a345f4dba..91a92b6bc5c 100644
--- a/gcc/config/s390/8561.md
+++ b/gcc/config/s390/8561.md
@@ -70,7 +70,7 @@ sar,sdb,seb,sfpc,sg,sgf,sgh,sh,shy,sl,slb,slbg,slg,slgf,sly,sqdb,sqeb,st,\
 stc,stcy,std,stdy,ste,stey,stg,stgrl,sth,sthrl,sthy,stoc,stocg,strl,strv,\
 strvg,strvh,sty,sy,tabort,tm,tmy,vl,vlbb,vleb,vlef,vleg,vleh,vll,vllezb,\
 vllezf,vllezg,vllezh,vllezlf,vlrepb,vlrepf,vlrepg,vlreph,vlrl,vlrlr,vst,\
-vstef,vsteg,vstl,vstrl,vstrlr,x,xg,xi,xiy,xy")
+vstef,vsteg,vstl,vstrl,vstrlr,x,xg,xi,xiy,xy,keb,kdb")
  (const_int 1)] (const_int 0)))
 
 (define_attr "arch13_unit_vfu" ""
@@ -112,7 +112,9 @@ vupllf,vupllh,vx,vzero,wfadb,wfasb,wfaxb,wfcdb,wfcedb,wfcesb,wfcexb,wfcexbs,\
 wfchdb,wfchedb,wfchesb,wfchexb,wfchexbs,wfchsb,wfchxb,wfchxbs,wfcsb,wfidb,\
 wfisb,wfixb,wflcdb,wflcsb,wflcxb,wflld,wflndb,wflnsb,wflnxb,wflpdb,wflpsb,\
 wflpxb,wfmadb,wfmasb,wfmaxb,wfmaxxb,wfmdb,wfminxb,wfmsb,wfmsdb,wfmssb,wfmsxb,\
-wfmxb,wfnmaxb,wfnmsxb,wfsdb,wfssb,wfsxb,wldeb,wledb")
+wfmxb,wfnmaxb,wfnmsxb,wfsdb,wfssb,wfsxb,wldeb,wledb,keb,kebr,kdb,kdbr,kxbr,\
+kdtr,kxtr,wfkdb,wfksb,vfkesb,vfkedb,wfkexb,vfkhsb,vfkhdb,wfkhxb,vfkhesb,\
+vfkhedb,wfkhexb")
  (const_int 1)] (const_int 0)))
 
 (define_attr "arch13_cracked" ""
@@ -134,7 +136,7 @@ stam,stm,stmg,stmy,tbegin,tbeginc")
 cxtr,dlgr,dlr,dr,dsgfr,dsgr,dxbr,dxtr,fixbr,fixbra,fixtr,flogr,lcxbr,lnxbr,\
 lpxbr,ltxbr,ltxtr,lxdb,lxdbr,lxdtr,lxeb,lxebr,m,madb,maeb,maebr,mfy,mg,mgrk,\
 ml,mlg,mlgr,mlr,mr,msdb,mseb,msebr,mvc,mxbr,mxtr,nc,oc,ppa,sfpc,slb,slbg,\
-slbgr,slbr,sqxbr,sxbr,sxtr,tabort,tcxb,tdcxt,tend,xc")
+slbgr,slbr,sqxbr,sxbr,sxtr,tabort,tcxb,tdcxt,tend,xc,kxbr,kxtr")
  (const_int 1)] (const_int 0)))
 
 (define_attr "arch13_endgroup" ""
@@ -194,7 +196,8 @@ vsel,vsf,vsg,vsh,vsl,vslb,vsldb,vsq,vsra,vsrab,vsrl,vsrlb,vuphb,vuphf,\
 vuphh,vuplb,vuplf,vuplhb,vuplhf,vuplhh,vuplhw,vupllb,vupllf,vupllh,vx,vzero,\
 wfcedb,wfcesb,wfcexb,wfchdb,wfchedb,wfchesb,wfchexb,wfchsb,wfchxb,wflcdb,\
 wflcsb,wflcxb,wflndb,wflnsb,wflnxb,wflpdb,wflpsb,wflpxb,wfmaxxb,wfminxb,xi,\
-xiy")) "nothing")
+xiy,vfkesb,vfkedb,wfkexb,vfkhsb,vfkhdb,wfkhxb,vfkhesb,vfkhedb,wfkhexb"))
+"nothing")
 
 (define_insn_reservation "arch13_2" 2
   (and (eq_attr "cpu" "arch13")
@@ -206,7 +209,8 @@ vchlhs,vfcedbs,vfcesbs,vfchdbs,vfchedbs,vfchesbs,vfchsbs,vfeeb,vfeef,vfeeh,\
 vfeneb,vfenef,vfeneh,vfenezb,vfenezf,vfenezh,vftcidb,vftcisb,vistrb,vistrf,\
 vistrh,vlgvb,vlgvf,vlgvg,vlgvh,vllezb,vllezf,vllezg,vllezh,vllezlf,vlrepb,\
 vlrepf,vlrepg,vlreph,vlrl,vlvgp,vpklsfs,vpklsgs,vpklshs,vpksfs,vpksgs,vpkshs,\
-wfcdb,wfcexbs,wfchexbs,wfchxbs,wfcsb")) "nothing")
+wfcdb,wfcexbs,wfchexbs,wfchxbs,wfcsb,keb,kebr,kdb,kdbr,wfkdb,wfksb"))
+"nothing")
 
 (define_insn_reservation "arch13_3" 3
   (and (eq_attr "cpu" "arch13")
@@ -240,7 +244,7 @@ wfmasb,wfmdb,wfmsb,wfmsdb,wfmssb,wfsdb,wfssb,wldeb,wledb")) "nothing")
 (define_insn_reservation "arch13_7" 7
   (and (eq_attr "cpu" "arch13")
 (eq_attr "mnemonic" "adtr,cdtr,fidtr,ldetr,ltdtr,msgrkc,sdtr,tdcdt,\
-tdcet,vgef,vgeg")) "nothing")
+tdcet,vgef,vgeg,kdtr")) "nothing")
 
 (define_insn_reservation "arch13_8" 8
   (and (eq_attr "cpu" "arch13")
@@ -263,7 +267,8 @@ clgebr,mg,mgrk,mlg,mlgr")) "nothing")
 
 (define_insn_reservation "arch13_12" 12
   (and (eq_attr "cpu" "arch13")
-(eq_attr "mnemonic" "cxbr,cxftr,cxlftr,cxtr,tcxb,tdcxt")) "nothing")
+(eq_attr "mnemonic" "cxbr,cxftr,cxlftr,cxtr,tcxb,tdcxt,kxbr,kxtr"))
+"nothing")
 
 (define_insn_reservation "arch13_13" 13
   (and (eq_attr "cpu" "arch13")
diff --git a/gcc/config/s390/s390-builtins.def b/gcc/config/s390/s390-builtins.def
index cfc69651b0d..013cac0206a 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -1495,8 +1495,8 @@ B_DEF      (vec_cmpgev4si,              vec_cmpgev4si,      0,
 B_DEF      (vec_cmpgeuv4si,             vec_cmpgeuv4si,     0,                  B_INT | B_VX,       0,                  BT_FN_V4SI_UV4SI_UV4SI)
 B_DEF      (vec_cmpgev2di,              vec_cmpgev2di,      0,                  B_INT | B_VX,       0,                  BT_FN_V2DI_UV2DI_UV2DI)
 B_DEF      (vec_cmpgeuv2di,             vec_cmpgeuv2di,     0,                  B_INT | B_VX,       0,                  BT_FN_V2DI_UV2DI_UV2DI)
-B_DEF      (s390_vfchesb,               vec_cmpgev4sf,      0,                  B_VXE,              0,                  BT_FN_V4SI_V4SF_V4SF)
-B_DEF      (s390_vfchedb,               vec_cmpgev2df,      0,                  B_VX,               0,                  BT_FN_V2DI_V2DF_V2DF)
+B_DEF      (s390_vfchesb,               vec_cmpgev4sf_quiet_nocc,0,             B_VXE,              0,                  BT_FN_V4SI_V4SF_V4SF)
+B_DEF      (s390_vfchedb,               vec_cmpgev2df_quiet_nocc,0,             B_VX,               0,                  BT_FN_V2DI_V2DF_V2DF)
 
 OB_DEF     (s390_vec_cmpgt,             s390_vec_cmpgt_s8,  s390_vec_cmpgt_dbl, B_VX,               BT_FN_OV4SI_OV4SI_OV4SI)
 OB_DEF_VAR (s390_vec_cmpgt_s8,          s390_vchb,          0,                  0,                  BT_OV_BV16QI_V16QI_V16QI)
@@ -1518,8 +1518,8 @@ B_DEF      (s390_vchf,                  vec_cmpgtv4si,      0,
 B_DEF      (s390_vchlf,                 vec_cmpgtuv4si,     0,                  B_VX,               0,                  BT_FN_V4SI_UV4SI_UV4SI)
 B_DEF      (s390_vchg,                  vec_cmpgtv2di,      0,                  B_VX,               0,                  BT_FN_V2DI_V2DI_V2DI)
 B_DEF      (s390_vchlg,                 vec_cmpgtuv2di,     0,                  B_VX,               0,                  BT_FN_V2DI_UV2DI_UV2DI)
-B_DEF      (s390_vfchsb,                vec_cmpgtv4sf,      0,                  B_VXE,              0,                  BT_FN_V4SI_V4SF_V4SF)
-B_DEF      (s390_vfchdb,                vec_cmpgtv2df,      0,                  B_VX,               0,                  BT_FN_V2DI_V2DF_V2DF)
+B_DEF      (s390_vfchsb,                vec_cmpgtv4sf_quiet_nocc,0,             B_VXE,              0,                  BT_FN_V4SI_V4SF_V4SF)
+B_DEF      (s390_vfchdb,                vec_cmpgtv2df_quiet_nocc,0,             B_VX,               0,                  BT_FN_V2DI_V2DF_V2DF)
 
 OB_DEF     (s390_vec_cmple,             s390_vec_cmple_s8,  s390_vec_cmple_dbl, B_VX,               BT_FN_OV4SI_OV4SI_OV4SI)
 OB_DEF_VAR (s390_vec_cmple_s8,          vec_cmplev16qi,     0,                  0,                  BT_OV_BV16QI_V16QI_V16QI)
@@ -1541,8 +1541,8 @@ B_DEF      (vec_cmplev4si,              vec_cmplev4si,      0,
 B_DEF      (vec_cmpleuv4si,             vec_cmpleuv4si,     0,                  B_INT | B_VX,       0,                  BT_FN_V4SI_UV4SI_UV4SI)
 B_DEF      (vec_cmplev2di,              vec_cmplev2di,      0,                  B_INT | B_VX,       0,                  BT_FN_V2DI_UV2DI_UV2DI)
 B_DEF      (vec_cmpleuv2di,             vec_cmpleuv2di,     0,                  B_INT | B_VX,       0,                  BT_FN_V2DI_UV2DI_UV2DI)
-B_DEF      (vec_cmplev4sf,              vec_cmplev4sf,      0,                  B_INT | B_VXE,      0,                  BT_FN_V4SI_V4SF_V4SF)
-B_DEF      (vec_cmplev2df,              vec_cmplev2df,      0,                  B_INT | B_VX,       0,                  BT_FN_V2DI_V2DF_V2DF)
+B_DEF      (vec_cmplev4sf,              vec_cmplev4sf_quiet_nocc,0,             B_INT | B_VXE,      0,                  BT_FN_V4SI_V4SF_V4SF)
+B_DEF      (vec_cmplev2df,              vec_cmplev2df_quiet_nocc,0,             B_INT | B_VX,       0,                  BT_FN_V2DI_V2DF_V2DF)
 
 OB_DEF     (s390_vec_cmplt,             s390_vec_cmplt_s8,  s390_vec_cmplt_dbl, B_VX,               BT_FN_OV4SI_OV4SI_OV4SI)
 OB_DEF_VAR (s390_vec_cmplt_s8,          vec_cmpltv16qi,     0,                  0,                  BT_OV_BV16QI_V16QI_V16QI)
@@ -1564,8 +1564,8 @@ B_DEF      (vec_cmpltv4si,              vec_cmpltv4si,      0,
 B_DEF      (vec_cmpltuv4si,             vec_cmpltuv4si,     0,                  B_INT | B_VX,       0,                  BT_FN_V4SI_UV4SI_UV4SI)
 B_DEF      (vec_cmpltv2di,              vec_cmpltv2di,      0,                  B_INT | B_VX,       0,                  BT_FN_V2DI_UV2DI_UV2DI)
 B_DEF      (vec_cmpltuv2di,             vec_cmpltuv2di,     0,                  B_INT | B_VX,       0,                  BT_FN_V2DI_UV2DI_UV2DI)
-B_DEF      (vec_cmpltv4sf,              vec_cmpltv4sf,      0,                  B_INT | B_VXE,      0,                  BT_FN_V4SI_V4SF_V4SF)
-B_DEF      (vec_cmpltv2df,              vec_cmpltv2df,      0,                  B_INT | B_VX,       0,                  BT_FN_V2DI_V2DF_V2DF)
+B_DEF      (vec_cmpltv4sf,              vec_cmpltv4sf_quiet_nocc,0,             B_INT | B_VXE,      0,                  BT_FN_V4SI_V4SF_V4SF)
+B_DEF      (vec_cmpltv2df,              vec_cmpltv2df_quiet_nocc,0,             B_INT | B_VX,       0,                  BT_FN_V2DI_V2DF_V2DF)
 
 OB_DEF     (s390_vec_cntlz,             s390_vec_cntlz_s8,  s390_vec_cntlz_u64, B_VX,               BT_FN_OV4SI_OV4SI)
 OB_DEF_VAR (s390_vec_cntlz_s8,          s390_vclzb,         0,                  0,                  BT_OV_UV16QI_V16QI)
diff --git a/gcc/config/s390/s390-modes.def b/gcc/config/s390/s390-modes.def
index 7b7c1141449..2d9cd9b5945 100644
--- a/gcc/config/s390/s390-modes.def
+++ b/gcc/config/s390/s390-modes.def
@@ -52,6 +52,8 @@ CCS:  EQ          LT           GT          UNORDERED  (LTGFR, LTGR, LTR, ICM/Y,
                                                        ADB/R, AEB/R, SDB/R, SEB/R,
                                                        SRAG, SRA, SRDA)
 CCSR: EQ          GT           LT          UNORDERED  (CGF/R, CH/Y)
+CCSFPS: EQ        GT           LT          UNORDERED  (KEB/R, KDB/R, KXBR, KDTR,
+						       KXTR, WFK)
 
 Condition codes resulting from add with overflow
 
@@ -140,6 +142,11 @@ around. The following both modes can be considered as CCS and CCU modes with
 exchanged operands.
 
 
+CCSFPS
+
+This mode is used for signaling rtxes: LT, LE, GT, GE and LTGT.
+
+
 CCL1, CCL2
 
 These modes represent the result of overflow checks.
@@ -226,6 +233,7 @@ CC_MODE (CCU);
 CC_MODE (CCUR);
 CC_MODE (CCS);
 CC_MODE (CCSR);
+CC_MODE (CCSFPS);
 CC_MODE (CCT);
 CC_MODE (CCT1);
 CC_MODE (CCT2);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index f9817f6edaf..9fe1131bd98 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1376,6 +1376,7 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode)
     case E_CCZ1mode:
     case E_CCSmode:
     case E_CCSRmode:
+    case E_CCSFPSmode:
     case E_CCUmode:
     case E_CCURmode:
     case E_CCOmode:
@@ -1559,6 +1560,12 @@ s390_select_ccmode (enum rtx_code code, rtx op0, rtx op1)
 	    else
 	      return CCAPmode;
 	  }
+
+	/* Fall through.  */
+      case LTGT:
+	if (HONOR_NANS (op0) || HONOR_NANS (op1))
+	  return CCSFPSmode;
+
 	/* Fall through.  */
       case UNORDERED:
       case ORDERED:
@@ -1567,7 +1574,6 @@ s390_select_ccmode (enum rtx_code code, rtx op0, rtx op1)
       case UNLT:
       case UNGE:
       case UNGT:
-      case LTGT:
 	if ((GET_CODE (op0) == SIGN_EXTEND || GET_CODE (op0) == ZERO_EXTEND)
 	    && GET_CODE (op1) != CONST_INT)
 	  return CCSRmode;
@@ -2082,6 +2088,7 @@ s390_branch_condition_mask (rtx code)
       break;
 
     case E_CCSmode:
+    case E_CCSFPSmode:
       switch (GET_CODE (code))
 	{
 	case EQ:	return CC0;
@@ -6504,18 +6511,23 @@ s390_expand_vec_compare (rtx target, enum rtx_code cond,
 	{
 	  /* NE a != b -> !(a == b) */
 	case NE:   cond = EQ; neg_p = true;                break;
-	  /* UNGT a u> b -> !(b >= a) */
-	case UNGT: cond = GE; neg_p = true; swap_p = true; break;
-	  /* UNGE a u>= b -> !(b > a) */
-	case UNGE: cond = GT; neg_p = true; swap_p = true; break;
-	  /* LE: a <= b -> b >= a */
+	case UNGT:
+	  emit_insn (gen_vec_cmpungt (target, cmp_op1, cmp_op2));
+	  return;
+	case UNGE:
+	  emit_insn (gen_vec_cmpunge (target, cmp_op1, cmp_op2));
+	  return;
 	case LE:   cond = GE;               swap_p = true; break;
-	  /* UNLE: a u<= b -> !(a > b) */
-	case UNLE: cond = GT; neg_p = true;                break;
+	  /* UNLE: (a u<= b) -> (b u>= a).  */
+	case UNLE:
+	  emit_insn (gen_vec_cmpunge (target, cmp_op2, cmp_op1));
+	  return;
 	  /* LT: a < b -> b > a */
 	case LT:   cond = GT;               swap_p = true; break;
-	  /* UNLT: a u< b -> !(a >= b) */
-	case UNLT: cond = GE; neg_p = true;                break;
+	  /* UNLT: (a u< b) -> (b u> a).  */
+	case UNLT:
+	  emit_insn (gen_vec_cmpungt (target, cmp_op2, cmp_op1));
+	  return;
 	case UNEQ:
 	  emit_insn (gen_vec_cmpuneq (target, cmp_op1, cmp_op2));
 	  return;
@@ -6678,7 +6690,7 @@ s390_reverse_condition (machine_mode mode, enum rtx_code code)
 {
   /* Reversal of FP compares takes care -- an ordered compare
      becomes an unordered compare and vice versa.  */
-  if (mode == CCVFALLmode || mode == CCVFANYmode)
+  if (mode == CCVFALLmode || mode == CCVFANYmode || mode == CCSFPSmode)
     return reverse_condition_maybe_unordered (code);
   else if (mode == CCVIALLmode || mode == CCVIANYmode)
     return reverse_condition (code);
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index e4516f6c378..bf3e051dbae 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -1424,6 +1424,20 @@
    (set_attr "cpu_facility" "*,*,vx,vxe")
    (set_attr "enabled" "*,<DSF>,<DF>,<SF>")])
 
+(define_insn "*cmp<mode>_ccsfps"
+  [(set (reg CC_REGNUM)
+	(compare (match_operand:FP 0 "register_operand" "f,f,v,v")
+		 (match_operand:FP 1 "general_operand"  "f,R,v,v")))]
+  "s390_match_ccmode (insn, CCSFPSmode) && TARGET_HARD_FLOAT"
+  "@
+   k<xde><bt>r\t%0,%1
+   k<xde>b\t%0,%1
+   wfkdb\t%0,%1
+   wfksb\t%0,%1"
+  [(set_attr "op_type" "RRE,RXE,VRR,VRR")
+   (set_attr "cpu_facility" "*,*,vx,vxe")
+   (set_attr "enabled" "*,<DSF>,<DF>,<SF>")])
+
 ; Compare and Branch instructions
 
 ; cij, cgij, crj, cgrj, cfi, cgfi, cr, cgr
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index a093ae5c565..7f33d34e726 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -168,10 +168,6 @@
 (define_mode_attr vec_halfnumelts
   [(V4SF "V2SF") (V4SI "V2SI")])
 
-; The comparisons not setting CC iterate over the rtx code.
-(define_code_iterator VFCMP_HW_OP [eq gt ge])
-(define_code_attr asm_fcmp_op [(eq "e") (gt "h") (ge "he")])
-
 
 
 ; Comparison operators on int and fp compares which are directly
@@ -1392,7 +1388,8 @@
   "#"
   "&& 1"
   [(set (match_dup 3)
-	(gt:V2DI (match_dup 1) (match_dup 2)))
+	(not:V2DI
+	 (unge:V2DI (match_dup 2) (match_dup 1))))
    (set (match_dup 0)
 	(if_then_else:V2DF
 	 (eq (match_dup 3) (match_dup 4))
@@ -1427,7 +1424,8 @@
   "#"
   "&& 1"
   [(set (match_dup 3)
-	(gt:V2DI (match_dup 1) (match_dup 2)))
+	(not:V2DI
+	 (unge:V2DI (match_dup 2) (match_dup 1))))
    (set (match_dup 0)
 	(if_then_else:V2DF
 	 (eq (match_dup 3) (match_dup 4))
@@ -1481,27 +1479,134 @@
 ;; Floating point compares
 ;;
 
-; EQ, GT, GE
-; vfcesb, vfcedb, wfcexb, vfchsb, vfchdb, wfchxb, vfchesb, vfchedb, wfchexb
-(define_insn "*vec_cmp<VFCMP_HW_OP:code><mode>_nocc"
-  [(set (match_operand:<tointvec>                  0 "register_operand" "=v")
-	(VFCMP_HW_OP:<tointvec> (match_operand:VFT 1 "register_operand"  "v")
-			     (match_operand:VFT 2 "register_operand"  "v")))]
-   "TARGET_VX"
-   "<vw>fc<VFCMP_HW_OP:asm_fcmp_op><sdx>b\t%v0,%v1,%v2"
+; vfcesb, vfcedb, wfcexb: non-signaling "==" comparison (a == b)
+(define_insn "*vec_cmpeq<mode>_quiet_nocc"
+  [(set (match_operand:<tointvec>         0 "register_operand" "=v")
+	(eq:<tointvec> (match_operand:VFT 1 "register_operand" "v")
+		       (match_operand:VFT 2 "register_operand" "v")))]
+  "TARGET_VX"
+  "<vw>fce<sdx>b\t%v0,%v1,%v2"
+  [(set_attr "op_type" "VRR")])
+
+; vfchsb, vfchdb, wfchxb: non-signaling > comparison (!(b u>= a))
+(define_insn "vec_cmpgt<mode>_quiet_nocc"
+  [(set (match_operand:<tointvec>            0 "register_operand" "=v")
+	(not:<tointvec>
+	 (unge:<tointvec> (match_operand:VFT 2 "register_operand" "v")
+			  (match_operand:VFT 1 "register_operand" "v"))))]
+  "TARGET_VX"
+  "<vw>fch<sdx>b\t%v0,%v1,%v2"
+  [(set_attr "op_type" "VRR")])
+
+(define_expand "vec_cmplt<mode>_quiet_nocc"
+  [(set (match_operand:<tointvec>            0 "register_operand" "=v")
+	(not:<tointvec>
+	 (unge:<tointvec> (match_operand:VFT 1 "register_operand" "v")
+			  (match_operand:VFT 2 "register_operand" "v"))))]
+  "TARGET_VX")
+
+; vfchesb, vfchedb, wfchexb: non-signaling >= comparison (!(a u< b))
+(define_insn "vec_cmpge<mode>_quiet_nocc"
+  [(set (match_operand:<tointvec>            0 "register_operand" "=v")
+	(not:<tointvec>
+	 (unlt:<tointvec> (match_operand:VFT 1 "register_operand" "v")
+			  (match_operand:VFT 2 "register_operand" "v"))))]
+  "TARGET_VX"
+  "<vw>fche<sdx>b\t%v0,%v1,%v2"
+  [(set_attr "op_type" "VRR")])
+
+(define_expand "vec_cmple<mode>_quiet_nocc"
+  [(set (match_operand:<tointvec>            0 "register_operand" "=v")
+	(not:<tointvec>
+	 (unlt:<tointvec> (match_operand:VFT 2 "register_operand" "v")
+			  (match_operand:VFT 1 "register_operand" "v"))))]
+  "TARGET_VX")
+
+; vfkesb, vfkedb, wfkexb: signaling == comparison ((a >= b) & (b >= a))
+(define_insn "*vec_cmpeq<mode>_signaling_nocc"
+  [(set (match_operand:<tointvec>          0 "register_operand" "=v")
+	(and:<tointvec>
+	 (ge:<tointvec> (match_operand:VFT 1 "register_operand" "v")
+			(match_operand:VFT 2 "register_operand" "v"))
+	 (ge:<tointvec> (match_dup         2)
+			(match_dup         1))))]
+  "TARGET_VXE"
+  "<vw>fke<sdx>b\t%v0,%v1,%v2"
+  [(set_attr "op_type" "VRR")])
+
+; vfkhsb, vfkhdb, wfkhxb: signaling > comparison (a > b)
+(define_insn "*vec_cmpgt<mode>_signaling_nocc"
+  [(set (match_operand:<tointvec>         0 "register_operand" "=v")
+	(gt:<tointvec> (match_operand:VFT 1 "register_operand" "v")
+		       (match_operand:VFT 2 "register_operand" "v")))]
+  "TARGET_VXE"
+  "<vw>fkh<sdx>b\t%v0,%v1,%v2"
+  [(set_attr "op_type" "VRR")])
+
+(define_insn "*vec_cmpgt<mode>_signaling_finite_nocc"
+  [(set (match_operand:<tointvec>         0 "register_operand" "=v")
+	(gt:<tointvec> (match_operand:VFT 1 "register_operand" "v")
+		       (match_operand:VFT 2 "register_operand" "v")))]
+  "TARGET_VX && !TARGET_VXE && flag_finite_math_only"
+  "<vw>fch<sdx>b\t%v0,%v1,%v2"
+  [(set_attr "op_type" "VRR")])
+
+; vfkhesb, vfkhedb, wfkhexb: signaling >= comparison (a >= b)
+(define_insn "*vec_cmpge<mode>_signaling_nocc"
+  [(set (match_operand:<tointvec>         0 "register_operand" "=v")
+	(ge:<tointvec> (match_operand:VFT 1 "register_operand" "v")
+		       (match_operand:VFT 2 "register_operand" "v")))]
+  "TARGET_VXE"
+  "<vw>fkhe<sdx>b\t%v0,%v1,%v2"
+  [(set_attr "op_type" "VRR")])
+
+(define_insn "*vec_cmpge<mode>_signaling_finite_nocc"
+  [(set (match_operand:<tointvec>         0 "register_operand" "=v")
+	(ge:<tointvec> (match_operand:VFT 1 "register_operand" "v")
+		       (match_operand:VFT 2 "register_operand" "v")))]
+  "TARGET_VX && !TARGET_VXE && flag_finite_math_only"
+  "<vw>fche<sdx>b\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
 
 ; Expanders for not directly supported comparisons
+; Signaling comparisons must be expressed via signaling rtxes only,
+; and quiet comparisons must be expressed via quiet rtxes only.
+
+; UNGT a u> b -> !!(b u< a)
+(define_expand "vec_cmpungt<mode>"
+  [(set (match_operand:<tointvec>            0 "register_operand" "=v")
+	(not:<tointvec>
+	 (unlt:<tointvec> (match_operand:VFT 2 "register_operand" "v")
+			  (match_operand:VFT 1 "register_operand" "v"))))
+   (set (match_dup                           0)
+	(not:<tointvec> (match_dup           0)))]
+  "TARGET_VX")
 
-; UNEQ a u== b -> !(a > b | b > a)
+; UNGE a u>= b -> !!(a u>= b)
+(define_expand "vec_cmpunge<mode>"
+  [(set (match_operand:<tointvec>            0 "register_operand" "=v")
+	(not:<tointvec>
+	 (unge:<tointvec> (match_operand:VFT 1 "register_operand" "v")
+			  (match_operand:VFT 2 "register_operand" "v"))))
+   (set (match_dup                           0)
+	(not:<tointvec> (match_dup           0)))]
+  "TARGET_VX")
+
+; UNEQ a u== b -> !(!(a u>= b) | !(b u>= a))
 (define_expand "vec_cmpuneq<mode>"
-  [(set (match_operand:<tointvec>         0 "register_operand" "=v")
-	(gt:<tointvec> (match_operand:VFT 1 "register_operand"  "v")
-		    (match_operand:VFT 2 "register_operand"  "v")))
-   (set (match_dup 3)
-	(gt:<tointvec> (match_dup 2) (match_dup 1)))
-   (set (match_dup 0) (ior:<tointvec> (match_dup 0) (match_dup 3)))
-   (set (match_dup 0) (not:<tointvec> (match_dup 0)))]
+  [(set (match_operand:<tointvec>            0 "register_operand" "=v")
+	(not:<tointvec>
+	 (unge:<tointvec> (match_operand:VFT 1 "register_operand"  "v")
+		          (match_operand:VFT 2 "register_operand"  "v"))))
+   (set (match_dup                           3)
+	(not:<tointvec>
+	 (unge:<tointvec> (match_dup         2)
+	                  (match_dup         1))))
+   (set (match_dup                           0)
+	(ior:<tointvec> (match_dup           0)
+			(match_dup           3)))
+   (set (match_dup                           0)
+	(not:<tointvec> (match_dup           0)))]
   "TARGET_VX"
 {
   operands[3] = gen_reg_rtx (<tointvec>mode);
@@ -1514,18 +1619,24 @@
 		    (match_operand:VFT 2 "register_operand"  "v")))
    (set (match_dup 3) (gt:<tointvec> (match_dup 2) (match_dup 1)))
    (set (match_dup 0) (ior:<tointvec> (match_dup 0) (match_dup 3)))]
-  "TARGET_VX"
+  "TARGET_VXE"
 {
   operands[3] = gen_reg_rtx (<tointvec>mode);
 })
 
-; ORDERED (a, b): a >= b | b > a
+; ORDERED (a, b): !(a u< b) | !(a u>= b)
 (define_expand "vec_cmpordered<mode>"
-  [(set (match_operand:<tointvec>          0 "register_operand" "=v")
-	(ge:<tointvec> (match_operand:VFT 1 "register_operand"  "v")
-		 (match_operand:VFT 2 "register_operand"  "v")))
-   (set (match_dup 3) (gt:<tointvec> (match_dup 2) (match_dup 1)))
-   (set (match_dup 0) (ior:<tointvec> (match_dup 0) (match_dup 3)))]
+  [(set (match_operand:<tointvec>            0 "register_operand" "=v")
+	(not:<tointvec>
+	 (unlt:<tointvec> (match_operand:VFT 1 "register_operand" "v")
+		          (match_operand:VFT 2 "register_operand" "v"))))
+   (set (match_dup                           3)
+	(not:<tointvec>
+	 (unge:<tointvec> (match_dup         1)
+			  (match_dup         2))))
+   (set (match_dup                           0)
+	(ior:<tointvec> (match_dup           0)
+			(match_dup           3)))]
   "TARGET_VX"
 {
   operands[3] = gen_reg_rtx (<tointvec>mode);
@@ -1545,7 +1656,7 @@
 })
 
 (define_code_iterator VEC_CODE_WITH_COMPLEX_EXPAND
-  [uneq ltgt ordered unordered])
+  [ungt unge uneq ltgt ordered unordered])
 
 (define_expand "vec_cmp<code>"
   [(match_operand 0 "register_operand" "")
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-scalar-cmp-1.c b/gcc/testsuite/gcc.target/s390/vector/vec-scalar-cmp-1.c
index ea51d0c86af..073d574aa5e 100644
--- a/gcc/testsuite/gcc.target/s390/vector/vec-scalar-cmp-1.c
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-scalar-cmp-1.c
@@ -34,7 +34,7 @@ gt (double a, double b)
   return a > b;
 }
 
-/* { dg-final { scan-assembler "gt:\n\[^:\]*\twfcdb\t%v\[0-9\]*,%v\[0-9\]*\n\t\[^:\]+\tlochinh\t%r2,0" } } */
+/* { dg-final { scan-assembler "gt:\n\[^:\]*\twfkdb\t%v\[0-9\]*,%v\[0-9\]*\n\t\[^:\]+\tlochinh\t%r2,0" } } */
 
 int
 ge (double a, double b)
@@ -45,7 +45,7 @@ ge (double a, double b)
   return a >= b;
 }
 
-/* { dg-final { scan-assembler "ge:\n\[^:\]*\twfcdb\t%v\[0-9\]*,%v\[0-9\]*\n\t\[^:\]+\tlochinhe\t%r2,0" } } */
+/* { dg-final { scan-assembler "ge:\n\[^:\]*\twfkdb\t%v\[0-9\]*,%v\[0-9\]*\n\t\[^:\]+\tlochinhe\t%r2,0" } } */
 
 int
 lt (double a, double b)
@@ -56,7 +56,7 @@ lt (double a, double b)
   return a < b;
 }
 
-/* { dg-final { scan-assembler "lt:\n\[^:\]*\twfcdb\t%v\[0-9\]*,%v\[0-9\]*\n\t\[^:\]+\tlochinl\t%r2,0" } } */
+/* { dg-final { scan-assembler "lt:\n\[^:\]*\twfkdb\t%v\[0-9\]*,%v\[0-9\]*\n\t\[^:\]+\tlochinl\t%r2,0" } } */
 
 int
 le (double a, double b)
@@ -67,4 +67,4 @@ le (double a, double b)
   return a <= b;
 }
 
-/* { dg-final { scan-assembler "le:\n\[^:\]*\twfcdb\t%v\[0-9\]*,%v\[0-9\]*\n\t\[^:\]+\tlochinle\t%r2,0" } } */
+/* { dg-final { scan-assembler "le:\n\[^:\]*\twfkdb\t%v\[0-9\]*,%v\[0-9\]*\n\t\[^:\]+\tlochinle\t%r2,0" } } */
-- 
2.21.0

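[Not part of the patch: the quiet-comparison rewrite identities stated in the vector.md expander comments above (UNGT a u> b -> !!(b u< a), UNEQ a u== b -> !(!(a u>= b) | !(b u>= a)), ORDERED (a, b) -> !(a u< b) | !(a u>= b)) can be sanity-checked with a small model of the quiet RTL predicates. This is a sketch for review purposes only; the function names mirror the rtx codes and are not GCC APIs.]

```python
import math

NAN = float("nan")

# Quiet ("u"-prefixed) predicates are true when either operand is a NaN.
def unlt(a, b):  # a u< b
    return math.isnan(a) or math.isnan(b) or a < b

def unge(a, b):  # a u>= b
    return math.isnan(a) or math.isnan(b) or a >= b

def ungt(a, b):  # a u> b
    return math.isnan(a) or math.isnan(b) or a > b

def uneq(a, b):  # a u== b
    return math.isnan(a) or math.isnan(b) or a == b

def ordered(a, b):
    return not (math.isnan(a) or math.isnan(b))

values = [NAN, -1.0, 0.0, 2.5]
for a in values:
    for b in values:
        # vec_cmpungt: a u> b == !!(b u< a)
        assert ungt(a, b) == (not (not unlt(b, a)))
        # vec_cmpunge: a u>= b == !!(a u>= b)
        assert unge(a, b) == (not (not unge(a, b)))
        # vec_cmpuneq: a u== b == !(!(a u>= b) | !(b u>= a))
        assert uneq(a, b) == (not ((not unge(a, b)) or (not unge(b, a))))
        # vec_cmpordered: ORDERED(a, b) == !(a u< b) | !(a u>= b)
        assert ordered(a, b) == ((not unlt(a, b)) or (not unge(a, b)))
print("all identities hold")
```

The double negation in vec_cmpungt/vec_cmpunge is intentional: each RTL set must match an existing quiet instruction pattern (vfch/vfche are expressed as not/unge and not/unlt), and the final vector-not then recovers the unordered-or relation without ever emitting a signaling rtx.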

* [PATCH v2 9/9] S/390: Test signaling FP comparison instructions
  2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
                   ` (7 preceding siblings ...)
  2019-08-22 14:17 ` [PATCH v2 8/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
@ 2019-08-22 14:26 ` Ilya Leoshkevich
  2019-08-29 16:08 ` [PATCH v2 0/9] S/390: Use " Ilya Leoshkevich
  9 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-22 14:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, segher, Ilya Leoshkevich

gcc/testsuite/ChangeLog:

2019-08-09  Ilya Leoshkevich  <iii@linux.ibm.com>

	* gcc.target/s390/s390.exp: Enable Fortran tests.
	* gcc.target/s390/zvector/autovec-double-quiet-eq.c: New test.
	* gcc.target/s390/zvector/autovec-double-quiet-ge.c: New test.
	* gcc.target/s390/zvector/autovec-double-quiet-gt.c: New test.
	* gcc.target/s390/zvector/autovec-double-quiet-le.c: New test.
	* gcc.target/s390/zvector/autovec-double-quiet-lt.c: New test.
	* gcc.target/s390/zvector/autovec-double-quiet-ordered.c: New test.
	* gcc.target/s390/zvector/autovec-double-quiet-uneq.c: New test.
	* gcc.target/s390/zvector/autovec-double-quiet-unordered.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-eq-z13-finite.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-eq-z13.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-eq.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-ge-z13-finite.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-ge-z13.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-ge.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-gt-z13-finite.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-gt-z13.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-gt.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-le-z13-finite.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-le-z13.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-le.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-lt-z13-finite.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-lt-z13.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-lt.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13-finite.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13.c: New test.
	* gcc.target/s390/zvector/autovec-double-signaling-ltgt.c: New test.
	* gcc.target/s390/zvector/autovec-double-smax-z13.F90: New test.
	* gcc.target/s390/zvector/autovec-double-smax.F90: New test.
	* gcc.target/s390/zvector/autovec-double-smin-z13.F90: New test.
	* gcc.target/s390/zvector/autovec-double-smin.F90: New test.
	* gcc.target/s390/zvector/autovec-float-quiet-eq.c: New test.
	* gcc.target/s390/zvector/autovec-float-quiet-ge.c: New test.
	* gcc.target/s390/zvector/autovec-float-quiet-gt.c: New test.
	* gcc.target/s390/zvector/autovec-float-quiet-le.c: New test.
	* gcc.target/s390/zvector/autovec-float-quiet-lt.c: New test.
	* gcc.target/s390/zvector/autovec-float-quiet-ordered.c: New test.
	* gcc.target/s390/zvector/autovec-float-quiet-uneq.c: New test.
	* gcc.target/s390/zvector/autovec-float-quiet-unordered.c: New test.
	* gcc.target/s390/zvector/autovec-float-signaling-eq.c: New test.
	* gcc.target/s390/zvector/autovec-float-signaling-ge.c: New test.
	* gcc.target/s390/zvector/autovec-float-signaling-gt.c: New test.
	* gcc.target/s390/zvector/autovec-float-signaling-le.c: New test.
	* gcc.target/s390/zvector/autovec-float-signaling-lt.c: New test.
	* gcc.target/s390/zvector/autovec-float-signaling-ltgt.c: New test.
	* gcc.target/s390/zvector/autovec-fortran.h: New test.
	* gcc.target/s390/zvector/autovec-long-double-signaling-ge.c: New test.
	* gcc.target/s390/zvector/autovec-long-double-signaling-gt.c: New test.
	* gcc.target/s390/zvector/autovec-long-double-signaling-le.c: New test.
	* gcc.target/s390/zvector/autovec-long-double-signaling-lt.c: New test.
	* gcc.target/s390/zvector/autovec.h: New test.
---
 gcc/testsuite/gcc.target/s390/s390.exp        |  8 ++++
 .../s390/zvector/autovec-double-quiet-eq.c    |  8 ++++
 .../s390/zvector/autovec-double-quiet-ge.c    |  8 ++++
 .../s390/zvector/autovec-double-quiet-gt.c    |  8 ++++
 .../s390/zvector/autovec-double-quiet-le.c    |  8 ++++
 .../s390/zvector/autovec-double-quiet-lt.c    |  8 ++++
 .../zvector/autovec-double-quiet-ordered.c    | 10 +++++
 .../s390/zvector/autovec-double-quiet-uneq.c  | 10 +++++
 .../zvector/autovec-double-quiet-unordered.c  | 11 +++++
 .../autovec-double-signaling-eq-z13-finite.c  | 10 +++++
 .../zvector/autovec-double-signaling-eq-z13.c |  9 ++++
 .../zvector/autovec-double-signaling-eq.c     | 11 +++++
 .../autovec-double-signaling-ge-z13-finite.c  | 10 +++++
 .../zvector/autovec-double-signaling-ge-z13.c |  9 ++++
 .../zvector/autovec-double-signaling-ge.c     |  8 ++++
 .../autovec-double-signaling-gt-z13-finite.c  | 10 +++++
 .../zvector/autovec-double-signaling-gt-z13.c |  9 ++++
 .../zvector/autovec-double-signaling-gt.c     |  8 ++++
 .../autovec-double-signaling-le-z13-finite.c  | 10 +++++
 .../zvector/autovec-double-signaling-le-z13.c |  9 ++++
 .../zvector/autovec-double-signaling-le.c     |  8 ++++
 .../autovec-double-signaling-lt-z13-finite.c  | 10 +++++
 .../zvector/autovec-double-signaling-lt-z13.c |  9 ++++
 .../zvector/autovec-double-signaling-lt.c     |  8 ++++
 ...autovec-double-signaling-ltgt-z13-finite.c |  9 ++++
 .../autovec-double-signaling-ltgt-z13.c       |  9 ++++
 .../zvector/autovec-double-signaling-ltgt.c   |  9 ++++
 .../s390/zvector/autovec-double-smax-z13.F90  | 11 +++++
 .../s390/zvector/autovec-double-smax.F90      |  8 ++++
 .../s390/zvector/autovec-double-smin-z13.F90  | 11 +++++
 .../s390/zvector/autovec-double-smin.F90      |  8 ++++
 .../s390/zvector/autovec-float-quiet-eq.c     |  8 ++++
 .../s390/zvector/autovec-float-quiet-ge.c     |  8 ++++
 .../s390/zvector/autovec-float-quiet-gt.c     |  8 ++++
 .../s390/zvector/autovec-float-quiet-le.c     |  8 ++++
 .../s390/zvector/autovec-float-quiet-lt.c     |  8 ++++
 .../zvector/autovec-float-quiet-ordered.c     | 10 +++++
 .../s390/zvector/autovec-float-quiet-uneq.c   | 10 +++++
 .../zvector/autovec-float-quiet-unordered.c   | 11 +++++
 .../s390/zvector/autovec-float-signaling-eq.c | 11 +++++
 .../s390/zvector/autovec-float-signaling-ge.c |  8 ++++
 .../s390/zvector/autovec-float-signaling-gt.c |  8 ++++
 .../s390/zvector/autovec-float-signaling-le.c |  8 ++++
 .../s390/zvector/autovec-float-signaling-lt.c |  8 ++++
 .../zvector/autovec-float-signaling-ltgt.c    |  9 ++++
 .../gcc.target/s390/zvector/autovec-fortran.h |  7 ++++
 .../autovec-long-double-signaling-ge.c        |  8 ++++
 .../autovec-long-double-signaling-gt.c        |  8 ++++
 .../autovec-long-double-signaling-le.c        |  8 ++++
 .../autovec-long-double-signaling-lt.c        |  8 ++++
 .../gcc.target/s390/zvector/autovec.h         | 41 +++++++++++++++++++
 51 files changed, 485 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-eq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-ge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-gt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-le.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-lt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-ordered.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-unordered.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13-finite.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-smax-z13.F90
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-smax.F90
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-smin-z13.F90
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-double-smin.F90
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-eq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-ge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-gt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-le.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-lt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-ordered.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-unordered.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-gt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-le.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-lt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ltgt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-fortran.h
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-ge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-gt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-le.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-lt.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/autovec.h

diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index 86f7e4398eb..925eb568832 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -27,6 +27,7 @@ if ![istarget s390*-*-*] then {
 # Load support procs.
 load_lib gcc-dg.exp
 load_lib target-supports.exp
+load_lib gfortran-dg.exp
 
 # Return 1 if the the assembler understands .machine and .machinemode.  The
 # target attribute needs that feature to work.
@@ -193,6 +194,10 @@ global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
     set DEFAULT_CFLAGS " -ansi -pedantic-errors"
 }
+global DEFAULT_FFLAGS
+if ![info exists DEFAULT_FFLAGS] then {
+    set DEFAULT_FFLAGS " -pedantic-errors"
+}
 
 # Initialize `dg'.
 dg-init
@@ -209,6 +214,9 @@ dg-runtest [lsort [prune [glob -nocomplain $srcdir/$subdir/*.{c,S}] \
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*vector*/*.{c,S}]] \
 	"" $DEFAULT_CFLAGS
 
+gfortran-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*vector*/*.F90]] \
+	"" $DEFAULT_FFLAGS
+
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/target-attribute/*.{c,S}]] \
 	"" $DEFAULT_CFLAGS
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-eq.c
new file mode 100644
index 00000000000..dad138770c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-eq.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (QUIET_EQ);
+
+/* { dg-final { scan-assembler {\n\tvfcedb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-ge.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-ge.c
new file mode 100644
index 00000000000..9fddb62573f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-ge.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (QUIET_GE);
+
+/* { dg-final { scan-assembler {\n\tvfchedb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-gt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-gt.c
new file mode 100644
index 00000000000..eb512f84c47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-gt.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (QUIET_GT);
+
+/* { dg-final { scan-assembler {\n\tvfchdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-le.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-le.c
new file mode 100644
index 00000000000..c049f8b7dee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-le.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (QUIET_LE);
+
+/* { dg-final { scan-assembler {\n\tvfchedb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-lt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-lt.c
new file mode 100644
index 00000000000..b6f7702ecd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-lt.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (QUIET_LT);
+
+/* { dg-final { scan-assembler {\n\tvfchdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-ordered.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-ordered.c
new file mode 100644
index 00000000000..bf8ebd4ab6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-ordered.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (QUIET_ORDERED);
+
+/* { dg-final { scan-assembler {\n\tvfchedb\t} } } */
+/* { dg-final { scan-assembler {\n\tvfchdb\t} } } */
+/* { dg-final { scan-assembler {\n\tvo\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c
new file mode 100644
index 00000000000..421fb5e7ba5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (QUIET_UNEQ);
+
+/* { dg-final { scan-assembler-times {\n\tvfchdb\t} 2 } } */
+/* { dg-final { scan-assembler {\n\tvo\t} } } */
+/* { dg-final { scan-assembler {\n\tvx\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-unordered.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-unordered.c
new file mode 100644
index 00000000000..c42f7930ad8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-unordered.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (QUIET_UNORDERED);
+
+/* { dg-final { scan-assembler {\n\tvfchedb\t} } } */
+/* { dg-final { scan-assembler {\n\tvfchdb\t} } } */
+/* combine prefers to reorder vsel args instead of using vno.  */
+/* { dg-final { scan-assembler {\n\tvo\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq-z13-finite.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq-z13-finite.c
new file mode 100644
index 00000000000..e3d42eaf3ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq-z13-finite.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector -ffinite-math-only" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_EQ);
+
+/* We can use non-signaling vector comparison instructions with
+   -ffinite-math-only.  */
+/* { dg-final { scan-assembler {\n\tvfcedb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq-z13.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq-z13.c
new file mode 100644
index 00000000000..f6110328891
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq-z13.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_EQ);
+
+/* z13 does not have signaling vector comparison instructions.  */
+/* { dg-final { scan-assembler {\n\tkdbr\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
new file mode 100644
index 00000000000..32088cb55b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_EQ);
+
+/* The vectorizer produces <= and ==, which rtl passes cannot turn into vfkedb
+   yet.  */
+/* { dg-final { scan-assembler {\n\tvfcedb\t} } } */
+/* { dg-final { scan-assembler {\n\tvfkhedb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge-z13-finite.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge-z13-finite.c
new file mode 100644
index 00000000000..b301d1b739b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge-z13-finite.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector -ffinite-math-only" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_GE);
+
+/* We can use non-signaling vector comparison instructions with
+   -ffinite-math-only.  */
+/* { dg-final { scan-assembler {\n\tvfchedb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge-z13.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge-z13.c
new file mode 100644
index 00000000000..ee83f3405c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge-z13.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_GE);
+
+/* z13 does not have signaling vector comparison instructions.  */
+/* { dg-final { scan-assembler {\n\tkdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge.c
new file mode 100644
index 00000000000..bcb4c868a15
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ge.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_GE);
+
+/* { dg-final { scan-assembler {\n\tvfkhedb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt-z13-finite.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt-z13-finite.c
new file mode 100644
index 00000000000..c49764447f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt-z13-finite.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector -ffinite-math-only" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_GT);
+
+/* We can use non-signaling vector comparison instructions with
+   -ffinite-math-only.  */
+/* { dg-final { scan-assembler {\n\tvfchdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt-z13.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt-z13.c
new file mode 100644
index 00000000000..6b9c11997a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt-z13.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_GT);
+
+/* z13 does not have signaling vector comparison instructions.  */
+/* { dg-final { scan-assembler {\n\tkdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt.c
new file mode 100644
index 00000000000..e423ed0f78c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-gt.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_GT);
+
+/* { dg-final { scan-assembler {\n\tvfkhdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le-z13-finite.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le-z13-finite.c
new file mode 100644
index 00000000000..7fa559b5701
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le-z13-finite.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector -ffinite-math-only" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_LE);
+
+/* We can use non-signaling vector comparison instructions with
+   -ffinite-math-only.  */
+/* { dg-final { scan-assembler {\n\tvfchedb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le-z13.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le-z13.c
new file mode 100644
index 00000000000..a80ac20b905
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le-z13.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_LE);
+
+/* z13 does not have signaling vector comparison instructions.  */
+/* { dg-final { scan-assembler {\n\tkdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le.c
new file mode 100644
index 00000000000..b97bebaaf8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-le.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_LE);
+
+/* { dg-final { scan-assembler {\n\tvfkhedb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt-z13-finite.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt-z13-finite.c
new file mode 100644
index 00000000000..3305a98379c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt-z13-finite.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector -ffinite-math-only" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_LT);
+
+/* We can use non-signaling vector comparison instructions with
+   -ffinite-math-only.  */
+/* { dg-final { scan-assembler {\n\tvfchdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt-z13.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt-z13.c
new file mode 100644
index 00000000000..8b398a28c37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt-z13.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_LT);
+
+/* z13 does not have signaling vector comparison instructions.  */
+/* { dg-final { scan-assembler {\n\tkdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt.c
new file mode 100644
index 00000000000..b01272d00a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-lt.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_LT);
+
+/* { dg-final { scan-assembler {\n\tvfkhdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13-finite.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13-finite.c
new file mode 100644
index 00000000000..76730d70968
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13-finite.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector -ffinite-math-only" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_LTGT);
+
+/* ltgt is the same as eq with -ffinite-math-only.  */
+/* { dg-final { scan-assembler {\n\tvfcedb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13.c
new file mode 100644
index 00000000000..d466697499a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z13 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_LTGT);
+
+/* z13 does not have signaling vector comparison instructions.  */
+/* { dg-final { scan-assembler {\n\tkdb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt.c
new file mode 100644
index 00000000000..645f299a9fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_DOUBLE (SIGNALING_LTGT);
+
+/* { dg-final { scan-assembler-times {\n\tvfkhdb\t} 2 } } */
+/* { dg-final { scan-assembler {\n\tvo\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smax-z13.F90 b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smax-z13.F90
new file mode 100644
index 00000000000..b114082df59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smax-z13.F90
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! { dg-options "-ffree-line-length-256 -O3 -march=z13 -mzvector" }
+
+#include "autovec-fortran.h"
+
+AUTOVEC_FORTRAN (max)
+
+! Fortran's max does not specify whether an exception should be raised in the
+! face of qNaNs, and neither does gcc's smax.  Vectorize max using a quiet
+! comparison, because that's the only one we have on z13.
+! { dg-final { scan-assembler {\n\tvfchdb\t} } }
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smax.F90 b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smax.F90
new file mode 100644
index 00000000000..1698ec4f4db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smax.F90
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! { dg-options "-ffree-line-length-256 -O3 -march=z14 -mzvector" }
+
+#include "autovec-fortran.h"
+
+AUTOVEC_FORTRAN (max)
+
+! { dg-final { scan-assembler {\n\tvfmaxdb\t} } }
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smin-z13.F90 b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smin-z13.F90
new file mode 100644
index 00000000000..fc56e9d6879
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smin-z13.F90
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! { dg-options "-ffree-line-length-256 -O3 -march=z13 -mzvector" }
+
+#include "autovec-fortran.h"
+
+AUTOVEC_FORTRAN (min)
+
+! Fortran's min does not specify whether an exception should be raised in the
+! face of qNaNs, and neither does gcc's smin.  Vectorize min using a quiet
+! comparison, because that's the only one we have on z13.
+! { dg-final { scan-assembler {\n\tvfchdb\t} } }
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smin.F90 b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smin.F90
new file mode 100644
index 00000000000..0dd1a33bb84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-smin.F90
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! { dg-options "-ffree-line-length-256 -O3 -march=z14 -mzvector" }
+
+#include "autovec-fortran.h"
+
+AUTOVEC_FORTRAN (min)
+
+! { dg-final { scan-assembler {\n\tvfmindb\t} } }
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-eq.c
new file mode 100644
index 00000000000..c74927dd028
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-eq.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (QUIET_EQ);
+
+/* { dg-final { scan-assembler {\n\tvfcesb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-ge.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-ge.c
new file mode 100644
index 00000000000..4c7cb09eed5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-ge.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (QUIET_GE);
+
+/* { dg-final { scan-assembler {\n\tvfchesb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-gt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-gt.c
new file mode 100644
index 00000000000..dd787929b9f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-gt.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (QUIET_GT);
+
+/* { dg-final { scan-assembler {\n\tvfchsb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-le.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-le.c
new file mode 100644
index 00000000000..5bd1e3e98e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-le.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (QUIET_LE);
+
+/* { dg-final { scan-assembler {\n\tvfchesb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-lt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-lt.c
new file mode 100644
index 00000000000..4938dcfb430
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-lt.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (QUIET_LT);
+
+/* { dg-final { scan-assembler {\n\tvfchsb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-ordered.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-ordered.c
new file mode 100644
index 00000000000..222e9efb5f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-ordered.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (QUIET_ORDERED);
+
+/* { dg-final { scan-assembler {\n\tvfchesb\t} } } */
+/* { dg-final { scan-assembler {\n\tvfchsb\t} } } */
+/* { dg-final { scan-assembler {\n\tvo\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c
new file mode 100644
index 00000000000..ab5dcac9c34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (QUIET_UNEQ);
+
+/* { dg-final { scan-assembler-times {\n\tvfchsb\t} 2 } } */
+/* { dg-final { scan-assembler {\n\tvo\t} } } */
+/* { dg-final { scan-assembler {\n\tvx\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-unordered.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-unordered.c
new file mode 100644
index 00000000000..c800dce2d7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-unordered.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (QUIET_UNORDERED);
+
+/* { dg-final { scan-assembler {\n\tvfchesb\t} } } */
+/* { dg-final { scan-assembler {\n\tvfchsb\t} } } */
+/* combine prefers to reorder vsel args instead of using vno.  */
+/* { dg-final { scan-assembler {\n\tvo\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
new file mode 100644
index 00000000000..ce3271c918c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (SIGNALING_EQ);
+
+/* The vectorizer produces <= and ==, which rtl passes cannot turn into vfkesb
+   yet.  */
+/* { dg-final { scan-assembler {\n\tvfcesb\t} } } */
+/* { dg-final { scan-assembler {\n\tvfkhesb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ge.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ge.c
new file mode 100644
index 00000000000..0f98c5467e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ge.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (SIGNALING_GE);
+
+/* { dg-final { scan-assembler {\n\tvfkhesb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-gt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-gt.c
new file mode 100644
index 00000000000..762c4c2030c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-gt.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (SIGNALING_GT);
+
+/* { dg-final { scan-assembler {\n\tvfkhsb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-le.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-le.c
new file mode 100644
index 00000000000..ccf0c5c24d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-le.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (SIGNALING_LE);
+
+/* { dg-final { scan-assembler {\n\tvfkhesb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-lt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-lt.c
new file mode 100644
index 00000000000..b428e5fc52e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-lt.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (SIGNALING_LT);
+
+/* { dg-final { scan-assembler {\n\tvfkhsb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ltgt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ltgt.c
new file mode 100644
index 00000000000..bf15242a4d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ltgt.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_FLOAT (SIGNALING_LTGT);
+
+/* { dg-final { scan-assembler-times {\n\tvfkhsb\t} 2 } } */
+/* { dg-final { scan-assembler {\n\tvo\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-fortran.h b/gcc/testsuite/gcc.target/s390/zvector/autovec-fortran.h
new file mode 100644
index 00000000000..8e44cb2dd31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-fortran.h
@@ -0,0 +1,7 @@
+#define AUTOVEC_FORTRAN(OP) subroutine f (r, x, y); \
+  real(kind=kind (1.0d0)) :: r(1000000), x(1000000), y(1000000); \
+  integer :: i; \
+  do i = 1, 1000000; \
+    r(i) = OP (x(i), y(i)); \
+  end do; \
+end
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-ge.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-ge.c
new file mode 100644
index 00000000000..684a6a9b2e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-ge.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_LONG_DOUBLE (SIGNALING_GE);
+
+/* { dg-final { scan-assembler {\n\twfkhexb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-gt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-gt.c
new file mode 100644
index 00000000000..76ade12c7f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-gt.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_LONG_DOUBLE (SIGNALING_GT);
+
+/* { dg-final { scan-assembler {\n\twfkhxb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-le.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-le.c
new file mode 100644
index 00000000000..a15960ec86a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-le.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_LONG_DOUBLE (SIGNALING_LE);
+
+/* { dg-final { scan-assembler {\n\twfkhexb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-lt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-lt.c
new file mode 100644
index 00000000000..046d5487af8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-long-double-signaling-lt.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzvector" } */
+
+#include "autovec.h"
+
+AUTOVEC_LONG_DOUBLE (SIGNALING_LT);
+
+/* { dg-final { scan-assembler {\n\twfkhxb\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec.h b/gcc/testsuite/gcc.target/s390/zvector/autovec.h
new file mode 100644
index 00000000000..d04e5d7e00e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec.h
@@ -0,0 +1,41 @@
+#ifndef AUTOVEC_H
+#define AUTOVEC_H 1
+
+#define QUIET_EQ(x, y) ((x) == (y))
+#define QUIET_GE __builtin_isgreaterequal
+#define QUIET_GT __builtin_isgreater
+#define QUIET_LE __builtin_islessequal
+#define QUIET_LT __builtin_isless
+#define QUIET_ORDERED(x, y) (!__builtin_isunordered ((x), (y)))
+#define QUIET_UNEQ(x, y) (__builtin_isless ((x), (y)) \
+                          || __builtin_isgreater ((x), (y)))
+#define QUIET_UNORDERED __builtin_isunordered
+#define SIGNALING_EQ(x, y) (((x) <= (y)) && ((x) >= (y)))
+#define SIGNALING_GE(x, y) ((x) >= (y))
+#define SIGNALING_GT(x, y) ((x) > (y))
+#define SIGNALING_LE(x, y) ((x) <= (y))
+#define SIGNALING_LT(x, y) ((x) < (y))
+#define SIGNALING_LTGT(x, y) (((x) < (y)) || ((x) > (y)))
+
+#define AUTOVEC(RESULT_TYPE, OP_TYPE, OP) void \
+f (RESULT_TYPE *r, const OP_TYPE *x, const OP_TYPE *y) \
+{ \
+  int i; \
+\
+  for (i = 0; i < 1000000; i++) \
+    { \
+      OP_TYPE xi = x[i], yi = y[i]; \
+\
+      r[i] = OP (xi, yi); \
+    } \
+}
+
+#define AUTOVEC_DOUBLE(OP) AUTOVEC (long, double, OP)
+
+#define AUTOVEC_FLOAT(OP) AUTOVEC (int, float, OP)
+
+typedef __int128 v1ti __attribute__ ((vector_size (16)));
+typedef long double v1tf __attribute__ ((vector_size (16)));
+#define AUTOVEC_LONG_DOUBLE(OP) AUTOVEC (v1ti, v1tf, OP)
+
+#endif
-- 
2.21.0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-22 13:47 ` [PATCH v2 3/9] Introduce can_vector_compare_p function Ilya Leoshkevich
@ 2019-08-23 11:08   ` Richard Sandiford
  2019-08-23 11:39     ` Richard Biener
  2019-08-23 11:40     ` Ilya Leoshkevich
  0 siblings, 2 replies; 35+ messages in thread
From: Richard Sandiford @ 2019-08-23 11:08 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: gcc-patches, segher

Ilya Leoshkevich <iii@linux.ibm.com> writes:
> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
>    return 0;
>  }
>  
> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
> +   in order to determine its capabilities.  In order to avoid creating fake
> +   operations on each call, values from previous calls are cached in a global
> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
> +   binop_keys.  */
> +
> +struct binop_key {
> +  enum rtx_code code;        /* Operation code.  */
> +  machine_mode value_mode;   /* Result mode.     */
> +  machine_mode cmp_op_mode;  /* Operand mode.    */
> +};
> +
> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
> +  typedef rtx value_type;
> +  typedef binop_key compare_type;
> +
> +  static hashval_t
> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
> +  {
> +    inchash::hash hstate (0);
> +    hstate.add_int (code);
> +    hstate.add_int (value_mode);
> +    hstate.add_int (cmp_op_mode);
> +    return hstate.end ();
> +  }
> +
> +  static hashval_t
> +  hash (const rtx &ref)
> +  {
> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
> +  }
> +
> +  static bool
> +  equal (const rtx &ref1, const binop_key &ref2)
> +  {
> +    return (GET_CODE (ref1) == ref2.code)
> +	   && (GET_MODE (ref1) == ref2.value_mode)
> +	   && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
> +  }
> +};
> +
> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
> +
> +static rtx
> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
> +		  machine_mode cmp_op_mode)
> +{
> +  if (!cached_binops)
> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
> +  binop_key key = { code, value_mode, cmp_op_mode };
> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
> +  if (!*slot)
> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
> +			    gen_reg_rtx (cmp_op_mode));
> +  return *slot;
> +}

Sorry, I didn't mean anything this complicated.  I just meant that
we should have a single cached rtx that we can change via PUT_CODE and
PUT_MODE_RAW for each new query, rather than allocating a new rtx each
time.

Something like:

static GTY ((cache)) rtx cached_binop;

rtx
get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
{
  if (cached_binop)
    {
      PUT_CODE (cached_binop, code);
      PUT_MODE_RAW (cached_binop, mode);
      PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
      PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
    }
  else
    {
      rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
      rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
      cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
    }
  return cached_binop;
}

Thanks,
Richard

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-23 11:08   ` Richard Sandiford
@ 2019-08-23 11:39     ` Richard Biener
  2019-08-23 11:43       ` Ilya Leoshkevich
  2019-08-23 11:40     ` Ilya Leoshkevich
  1 sibling, 1 reply; 35+ messages in thread
From: Richard Biener @ 2019-08-23 11:39 UTC (permalink / raw)
  To: Ilya Leoshkevich, GCC Patches, Segher Boessenkool, Richard Sandiford

On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Ilya Leoshkevich <iii@linux.ibm.com> writes:
> > @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
> >    return 0;
> >  }
> >
> > +/* can_vector_compare_p presents fake rtx binary operations to the back-end
> > +   in order to determine its capabilities.  In order to avoid creating fake
> > +   operations on each call, values from previous calls are cached in a global
> > +   cached_binops hash_table.  It contains rtxes, which can be looked up using
> > +   binop_keys.  */
> > +
> > +struct binop_key {
> > +  enum rtx_code code;        /* Operation code.  */
> > +  machine_mode value_mode;   /* Result mode.     */
> > +  machine_mode cmp_op_mode;  /* Operand mode.    */
> > +};
> > +
> > +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
> > +  typedef rtx value_type;
> > +  typedef binop_key compare_type;
> > +
> > +  static hashval_t
> > +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
> > +  {
> > +    inchash::hash hstate (0);
> > +    hstate.add_int (code);
> > +    hstate.add_int (value_mode);
> > +    hstate.add_int (cmp_op_mode);
> > +    return hstate.end ();
> > +  }
> > +
> > +  static hashval_t
> > +  hash (const rtx &ref)
> > +  {
> > +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
> > +  }
> > +
> > +  static bool
> > +  equal (const rtx &ref1, const binop_key &ref2)
> > +  {
> > +    return (GET_CODE (ref1) == ref2.code)
> > +        && (GET_MODE (ref1) == ref2.value_mode)
> > +        && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
> > +  }
> > +};
> > +
> > +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
> > +
> > +static rtx
> > +get_cached_binop (enum rtx_code code, machine_mode value_mode,
> > +               machine_mode cmp_op_mode)
> > +{
> > +  if (!cached_binops)
> > +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
> > +  binop_key key = { code, value_mode, cmp_op_mode };
> > +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
> > +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
> > +  if (!*slot)
> > +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
> > +                         gen_reg_rtx (cmp_op_mode));
> > +  return *slot;
> > +}
>
> Sorry, I didn't mean anything this complicated.  I just meant that
> we should have a single cached rtx that we can change via PUT_CODE and
> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
> time.
>
> Something like:
>
> static GTY ((cache)) rtx cached_binop;
>
> rtx
> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
> {
>   if (cached_binop)
>     {
>       PUT_CODE (cached_binop, code);
>       PUT_MODE_RAW (cached_binop, mode);
>       PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
>       PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
>     }
>   else
>     {
>       rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
>       rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
>       cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
>     }
>   return cached_binop;
> }

Hmm, maybe we need an auto_rtx (code) that constructs such an
RTX on the stack instead of wasting a GC root (and causing
issues for future threading of GCC ;)).

Richard.

>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-23 11:08   ` Richard Sandiford
  2019-08-23 11:39     ` Richard Biener
@ 2019-08-23 11:40     ` Ilya Leoshkevich
  2019-08-23 14:23       ` Richard Sandiford
  1 sibling, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-23 11:40 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, segher

> Am 23.08.2019 um 12:43 schrieb Richard Sandiford <richard.sandiford@arm.com>:
> 
> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
>>   return 0;
>> }
>> 
>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
>> +   in order to determine its capabilities.  In order to avoid creating fake
>> +   operations on each call, values from previous calls are cached in a global
>> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
>> +   binop_keys.  */
>> +
>> +struct binop_key {
>> +  enum rtx_code code;        /* Operation code.  */
>> +  machine_mode value_mode;   /* Result mode.     */
>> +  machine_mode cmp_op_mode;  /* Operand mode.    */
>> +};
>> +
>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
>> +  typedef rtx value_type;
>> +  typedef binop_key compare_type;
>> +
>> +  static hashval_t
>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
>> +  {
>> +    inchash::hash hstate (0);
>> +    hstate.add_int (code);
>> +    hstate.add_int (value_mode);
>> +    hstate.add_int (cmp_op_mode);
>> +    return hstate.end ();
>> +  }
>> +
>> +  static hashval_t
>> +  hash (const rtx &ref)
>> +  {
>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
>> +  }
>> +
>> +  static bool
>> +  equal (const rtx &ref1, const binop_key &ref2)
>> +  {
>> +    return (GET_CODE (ref1) == ref2.code)
>> +	   && (GET_MODE (ref1) == ref2.value_mode)
>> +	   && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
>> +  }
>> +};
>> +
>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
>> +
>> +static rtx
>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
>> +		  machine_mode cmp_op_mode)
>> +{
>> +  if (!cached_binops)
>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
>> +  binop_key key = { code, value_mode, cmp_op_mode };
>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
>> +  if (!*slot)
>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
>> +			    gen_reg_rtx (cmp_op_mode));
>> +  return *slot;
>> +}
> 
> Sorry, I didn't mean anything this complicated.  I just meant that
> we should have a single cached rtx that we can change via PUT_CODE and
> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
> time.
> 
> Something like:
> 
> static GTY ((cache)) rtx cached_binop;
> 
> rtx
> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
> {
>  if (cached_binop)
>    {
>      PUT_CODE (cached_binop, code);
>      PUT_MODE_RAW (cached_binop, mode);
>      PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
>      PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
>    }
>  else
>    {
>      rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
>      rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
>      cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
>    }
>  return cached_binop;
> }
> 
> Thanks,
> Richard

Oh, I must have completely missed the point: the cache is only for
storage, and stored values themselves don't really matter.

To make rtx usable with GTY ((cache)) I had to do:

--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4427,4 +4427,9 @@ extern void gt_ggc_mx (rtx &);
 extern void gt_pch_nx (rtx &);
 extern void gt_pch_nx (rtx &, gt_pointer_operator, void *);

+inline void
+gt_cleare_cache (rtx)
+{
+}
+
 #endif /* ! GCC_RTL_H */

Does that look ok?

Another thing that might turn out to be important: in your first
suggestion you use

 if (insn_operand_predicate_fn pred = insn_data[icode].operand[3].predicate)
   {
     machine_mode cmp_mode = insn_data[icode].operand[3].mode;

instead of simply insn_operand_matches - is there any difference?

* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-23 11:39     ` Richard Biener
@ 2019-08-23 11:43       ` Ilya Leoshkevich
  2019-08-26 10:04         ` Richard Biener
  0 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-23 11:43 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Segher Boessenkool, Richard Sandiford

> Am 23.08.2019 um 13:24 schrieb Richard Biener <richard.guenther@gmail.com>:
> 
> On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>> 
>> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
>>>   return 0;
>>> }
>>> 
>>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
>>> +   in order to determine its capabilities.  In order to avoid creating fake
>>> +   operations on each call, values from previous calls are cached in a global
>>> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
>>> +   binop_keys.  */
>>> +
>>> +struct binop_key {
>>> +  enum rtx_code code;        /* Operation code.  */
>>> +  machine_mode value_mode;   /* Result mode.     */
>>> +  machine_mode cmp_op_mode;  /* Operand mode.    */
>>> +};
>>> +
>>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
>>> +  typedef rtx value_type;
>>> +  typedef binop_key compare_type;
>>> +
>>> +  static hashval_t
>>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
>>> +  {
>>> +    inchash::hash hstate (0);
>>> +    hstate.add_int (code);
>>> +    hstate.add_int (value_mode);
>>> +    hstate.add_int (cmp_op_mode);
>>> +    return hstate.end ();
>>> +  }
>>> +
>>> +  static hashval_t
>>> +  hash (const rtx &ref)
>>> +  {
>>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
>>> +  }
>>> +
>>> +  static bool
>>> +  equal (const rtx &ref1, const binop_key &ref2)
>>> +  {
>>> +    return (GET_CODE (ref1) == ref2.code)
>>> +        && (GET_MODE (ref1) == ref2.value_mode)
>>> +        && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
>>> +  }
>>> +};
>>> +
>>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
>>> +
>>> +static rtx
>>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
>>> +               machine_mode cmp_op_mode)
>>> +{
>>> +  if (!cached_binops)
>>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
>>> +  binop_key key = { code, value_mode, cmp_op_mode };
>>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
>>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
>>> +  if (!*slot)
>>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
>>> +                         gen_reg_rtx (cmp_op_mode));
>>> +  return *slot;
>>> +}
>> 
>> Sorry, I didn't mean anything this complicated.  I just meant that
>> we should have a single cached rtx that we can change via PUT_CODE and
>> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
>> time.
>> 
>> Something like:
>> 
>> static GTY ((cache)) rtx cached_binop;
>> 
>> rtx
>> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
>> {
>>  if (cached_binop)
>>    {
>>      PUT_CODE (cached_binop, code);
>>      PUT_MODE_RAW (cached_binop, mode);
>>      PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
>>      PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
>>    }
>>  else
>>    {
>>      rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
>>      rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
>>      cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
>>    }
>>  return cached_binop;
>> }
> 
> Hmm, maybe we need  auto_rtx (code) that constructs such
> RTX on the stack instead of wasting a GC root (and causing
> issues for future threading of GCC ;)).

Do you mean something like this?

union {
  char raw[rtx_code_size[code]];
  rtx rtx;
} binop;

Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
anything useful), or should I implement this?

* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-23 11:40     ` Ilya Leoshkevich
@ 2019-08-23 14:23       ` Richard Sandiford
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Sandiford @ 2019-08-23 14:23 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: gcc-patches, segher

Ilya Leoshkevich <iii@linux.ibm.com> writes:
>> Am 23.08.2019 um 12:43 schrieb Richard Sandiford <richard.sandiford@arm.com>:
>> 
>> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
>>>   return 0;
>>> }
>>> 
>>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
>>> +   in order to determine its capabilities.  In order to avoid creating fake
>>> +   operations on each call, values from previous calls are cached in a global
>>> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
>>> +   binop_keys.  */
>>> +
>>> +struct binop_key {
>>> +  enum rtx_code code;        /* Operation code.  */
>>> +  machine_mode value_mode;   /* Result mode.     */
>>> +  machine_mode cmp_op_mode;  /* Operand mode.    */
>>> +};
>>> +
>>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
>>> +  typedef rtx value_type;
>>> +  typedef binop_key compare_type;
>>> +
>>> +  static hashval_t
>>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
>>> +  {
>>> +    inchash::hash hstate (0);
>>> +    hstate.add_int (code);
>>> +    hstate.add_int (value_mode);
>>> +    hstate.add_int (cmp_op_mode);
>>> +    return hstate.end ();
>>> +  }
>>> +
>>> +  static hashval_t
>>> +  hash (const rtx &ref)
>>> +  {
>>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
>>> +  }
>>> +
>>> +  static bool
>>> +  equal (const rtx &ref1, const binop_key &ref2)
>>> +  {
>>> +    return (GET_CODE (ref1) == ref2.code)
>>> +	   && (GET_MODE (ref1) == ref2.value_mode)
>>> +	   && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
>>> +  }
>>> +};
>>> +
>>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
>>> +
>>> +static rtx
>>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
>>> +		  machine_mode cmp_op_mode)
>>> +{
>>> +  if (!cached_binops)
>>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
>>> +  binop_key key = { code, value_mode, cmp_op_mode };
>>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
>>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
>>> +  if (!*slot)
>>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
>>> +			    gen_reg_rtx (cmp_op_mode));
>>> +  return *slot;
>>> +}
>> 
>> Sorry, I didn't mean anything this complicated.  I just meant that
>> we should have a single cached rtx that we can change via PUT_CODE and
>> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
>> time.
>> 
>> Something like:
>> 
>> static GTY ((cache)) rtx cached_binop;
>> 
>> rtx
>> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
>> {
>>  if (cached_binop)
>>    {
>>      PUT_CODE (cached_binop, code);
>>      PUT_MODE_RAW (cached_binop, mode);
>>      PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
>>      PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
>>    }
>>  else
>>    {
>>      rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
>>      rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
>>      cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
>>    }
>>  return cached_binop;
>> }
>> 
>> Thanks,
>> Richard
>
> Oh, I must have completely missed the point: the cache is only for
> storage, and stored values themselves don't really matter.
>
> To make rtx usable with GTY ((cache)) I had to do:
>
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -4427,4 +4427,9 @@ extern void gt_ggc_mx (rtx &);
>  extern void gt_pch_nx (rtx &);
>  extern void gt_pch_nx (rtx &, gt_pointer_operator, void *);
>
> +inline void
> +gt_cleare_cache (rtx)
> +{
> +}
> +
>  #endif /* ! GCC_RTL_H */
>
> Does that look ok?

Ah, turns out I was thinking of "deletable" rather than "cache", sorry.

> Another thing that might turn out to be important: in your first
> suggestion you use
>
>  if (insn_operand_predicate_fn pred = insn_data[icode].operand[3].predicate)
>    {
>      machine_mode cmp_mode = insn_data[icode].operand[3].mode;
>
> instead of simply insn_operand_matches - is there any difference?

I guess it was premature optimisation: if the .md file doesn't specify a
predicate, we don't even need to create the rtx.  But since most targets
probably do specify a predicate, using insn_operand_matches is fine too.
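
The shape of that optimisation can be sketched with stand-in types (the
struct layout and helper names below are illustrative, not GCC's actual
declarations): if the .md pattern supplies no predicate, the operand
trivially matches and the fake rtx need not be built at all.

```c
#include <assert.h>
#include <stddef.h>

typedef int machine_mode;
typedef struct rtx_def { int dummy; } *rtx;
typedef int (*insn_operand_predicate_fn) (rtx, machine_mode);

/* Mock of one insn_data operand slot.  */
struct insn_operand_data {
  insn_operand_predicate_fn predicate;
  machine_mode mode;
};

static int build_count;          /* counts fake rtx constructions */
static struct rtx_def fake;

static rtx
make_fake_binop (void)
{
  build_count++;
  return &fake;
}

static int
accepts_anything (rtx x, machine_mode m)
{
  (void) x; (void) m;
  return 1;
}

/* Check the predicate pointer first: only construct the test rtx when
   a predicate actually exists to run it through.  */
static int
operand_matches (const struct insn_operand_data *op)
{
  insn_operand_predicate_fn pred = op->predicate;
  if (!pred)
    return 1;                    /* no predicate: anything matches */
  return pred (make_fake_binop (), op->mode);
}
```

With no predicate, `operand_matches` returns without ever calling
`make_fake_binop`, which is the allocation the early check avoids.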

Thanks,
Richard

* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-23 11:43       ` Ilya Leoshkevich
@ 2019-08-26 10:04         ` Richard Biener
  2019-08-26 12:18           ` Ilya Leoshkevich
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Biener @ 2019-08-26 10:04 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: GCC Patches, Segher Boessenkool, Richard Sandiford

On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 23.08.2019 um 13:24 schrieb Richard Biener <richard.guenther@gmail.com>:
> >
> > On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Ilya Leoshkevich <iii@linux.ibm.com> writes:
> >>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
> >>>   return 0;
> >>> }
> >>>
> >>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
> >>> +   in order to determine its capabilities.  In order to avoid creating fake
> >>> +   operations on each call, values from previous calls are cached in a global
> >>> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
> >>> +   binop_keys.  */
> >>> +
> >>> +struct binop_key {
> >>> +  enum rtx_code code;        /* Operation code.  */
> >>> +  machine_mode value_mode;   /* Result mode.     */
> >>> +  machine_mode cmp_op_mode;  /* Operand mode.    */
> >>> +};
> >>> +
> >>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
> >>> +  typedef rtx value_type;
> >>> +  typedef binop_key compare_type;
> >>> +
> >>> +  static hashval_t
> >>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
> >>> +  {
> >>> +    inchash::hash hstate (0);
> >>> +    hstate.add_int (code);
> >>> +    hstate.add_int (value_mode);
> >>> +    hstate.add_int (cmp_op_mode);
> >>> +    return hstate.end ();
> >>> +  }
> >>> +
> >>> +  static hashval_t
> >>> +  hash (const rtx &ref)
> >>> +  {
> >>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
> >>> +  }
> >>> +
> >>> +  static bool
> >>> +  equal (const rtx &ref1, const binop_key &ref2)
> >>> +  {
> >>> +    return (GET_CODE (ref1) == ref2.code)
> >>> +        && (GET_MODE (ref1) == ref2.value_mode)
> >>> +        && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
> >>> +  }
> >>> +};
> >>> +
> >>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
> >>> +
> >>> +static rtx
> >>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
> >>> +               machine_mode cmp_op_mode)
> >>> +{
> >>> +  if (!cached_binops)
> >>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
> >>> +  binop_key key = { code, value_mode, cmp_op_mode };
> >>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
> >>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
> >>> +  if (!*slot)
> >>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
> >>> +                         gen_reg_rtx (cmp_op_mode));
> >>> +  return *slot;
> >>> +}
> >>
> >> Sorry, I didn't mean anything this complicated.  I just meant that
> >> we should have a single cached rtx that we can change via PUT_CODE and
> >> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
> >> time.
> >>
> >> Something like:
> >>
> >> static GTY ((cache)) rtx cached_binop;
> >>
> >> rtx
> >> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
> >> {
> >>  if (cached_binop)
> >>    {
> >>      PUT_CODE (cached_binop, code);
> >>      PUT_MODE_RAW (cached_binop, mode);
> >>      PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
> >>      PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
> >>    }
> >>  else
> >>    {
> >>      rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
> >>      rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
> >>      cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
> >>    }
> >>  return cached_binop;
> >> }
> >
> > Hmm, maybe we need  auto_rtx (code) that constructs such
> > RTX on the stack instead of wasting a GC root (and causing
> > issues for future threading of GCC ;)).
>
> Do you mean something like this?
>
> union {
>   char raw[rtx_code_size[code]];
>   rtx rtx;
> } binop;
>
> Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
> anything useful), or should I implement this?

It doesn't exist AFAIK, I thought about using alloca like

 rtx tem;
 rtx_alloca (tem, PLUS);

and due to using alloca rtx_alloca has to be a macro like

#define rtx_alloca(r, code) \
  (r) = (rtx) alloca (RTX_CODE_SIZE ((code))); \
  memset ((r), 0, RTX_HDR_SIZE); \
  PUT_CODE ((r), (code))

maybe C++ can help make this prettier, but of course
since we use alloca we have to avoid opening new scopes.

I guess templates like auto_vec's don't work unless
we can make RTX_CODE_SIZE constant-evaluated.
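
As a self-contained illustration with stand-in types and sizes (not
GCC's real rtx layout or size tables), the statement-style macro looks
like this; it deliberately avoids opening a new scope, since the
alloca'd storage must live in the caller's frame:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <alloca.h>   /* GCC/glibc; elsewhere __builtin_alloca may be needed */

enum rtx_code { REG, PLUS, NUM_RTX_CODES };
struct rtx_def { enum rtx_code code; int mode; };
typedef struct rtx_def *rtx;

/* Stand-ins for GCC's per-code size table and accessor macros.  */
static const size_t rtx_code_size[NUM_RTX_CODES] =
  { sizeof (struct rtx_def), sizeof (struct rtx_def) };
#define RTX_CODE_SIZE(c) rtx_code_size[(c)]
#define RTX_HDR_SIZE     sizeof (struct rtx_def)
#define PUT_CODE(r, c)   ((r)->code = (c))

/* A statement sequence, not an expression: alloca must execute in the
   caller's frame, so this cannot be wrapped in a helper function.  */
#define rtx_alloca(r, code) \
  (r) = (rtx) alloca (RTX_CODE_SIZE (code)); \
  memset ((r), 0, RTX_HDR_SIZE); \
  PUT_CODE ((r), (code))
```

Usage mirrors the quoted suggestion: `rtx tem; rtx_alloca (tem, PLUS);`
leaves a zero-initialized PLUS rtx on the stack, freed automatically
when the enclosing function returns.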

Richard.

* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-26 10:04         ` Richard Biener
@ 2019-08-26 12:18           ` Ilya Leoshkevich
  2019-08-26 13:45             ` Richard Biener
  0 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-26 12:18 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Segher Boessenkool, Richard Sandiford

> Am 26.08.2019 um 10:49 schrieb Richard Biener <richard.guenther@gmail.com>:
> 
> On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>> 
>>> Am 23.08.2019 um 13:24 schrieb Richard Biener <richard.guenther@gmail.com>:
>>> 
>>> On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
>>> <richard.sandiford@arm.com> wrote:
>>>> 
>>>> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>>>>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
>>>>>  return 0;
>>>>> }
>>>>> 
>>>>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
>>>>> +   in order to determine its capabilities.  In order to avoid creating fake
>>>>> +   operations on each call, values from previous calls are cached in a global
>>>>> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
>>>>> +   binop_keys.  */
>>>>> +
>>>>> +struct binop_key {
>>>>> +  enum rtx_code code;        /* Operation code.  */
>>>>> +  machine_mode value_mode;   /* Result mode.     */
>>>>> +  machine_mode cmp_op_mode;  /* Operand mode.    */
>>>>> +};
>>>>> +
>>>>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
>>>>> +  typedef rtx value_type;
>>>>> +  typedef binop_key compare_type;
>>>>> +
>>>>> +  static hashval_t
>>>>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
>>>>> +  {
>>>>> +    inchash::hash hstate (0);
>>>>> +    hstate.add_int (code);
>>>>> +    hstate.add_int (value_mode);
>>>>> +    hstate.add_int (cmp_op_mode);
>>>>> +    return hstate.end ();
>>>>> +  }
>>>>> +
>>>>> +  static hashval_t
>>>>> +  hash (const rtx &ref)
>>>>> +  {
>>>>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
>>>>> +  }
>>>>> +
>>>>> +  static bool
>>>>> +  equal (const rtx &ref1, const binop_key &ref2)
>>>>> +  {
>>>>> +    return (GET_CODE (ref1) == ref2.code)
>>>>> +        && (GET_MODE (ref1) == ref2.value_mode)
>>>>> +        && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
>>>>> +  }
>>>>> +};
>>>>> +
>>>>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
>>>>> +
>>>>> +static rtx
>>>>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
>>>>> +               machine_mode cmp_op_mode)
>>>>> +{
>>>>> +  if (!cached_binops)
>>>>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
>>>>> +  binop_key key = { code, value_mode, cmp_op_mode };
>>>>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
>>>>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
>>>>> +  if (!*slot)
>>>>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
>>>>> +                         gen_reg_rtx (cmp_op_mode));
>>>>> +  return *slot;
>>>>> +}
>>>> 
>>>> Sorry, I didn't mean anything this complicated.  I just meant that
>>>> we should have a single cached rtx that we can change via PUT_CODE and
>>>> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
>>>> time.
>>>> 
>>>> Something like:
>>>> 
>>>> static GTY ((cache)) rtx cached_binop;
>>>> 
>>>> rtx
>>>> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
>>>> {
>>>> if (cached_binop)
>>>>   {
>>>>     PUT_CODE (cached_binop, code);
>>>>     PUT_MODE_RAW (cached_binop, mode);
>>>>     PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
>>>>     PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
>>>>   }
>>>> else
>>>>   {
>>>>     rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
>>>>     rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
>>>>     cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
>>>>   }
>>>> return cached_binop;
>>>> }
>>> 
>>> Hmm, maybe we need  auto_rtx (code) that constructs such
>>> RTX on the stack instead of wasting a GC root (and causing
>>> issues for future threading of GCC ;)).
>> 
>> Do you mean something like this?
>> 
>> union {
>>  char raw[rtx_code_size[code]];
>>  rtx rtx;
>> } binop;
>> 
>> Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
>> anything useful), or should I implement this?
> 
> It doesn't exist AFAIK, I thought about using alloca like
> 
> rtx tem;
> rtx_alloca (tem, PLUS);
> 
> and due to using alloca rtx_alloca has to be a macro like
> 
> #define rtx_alloca(r, code) \
>   (r) = (rtx) alloca (RTX_CODE_SIZE ((code))); \
>   memset ((r), 0, RTX_HDR_SIZE); \
>   PUT_CODE ((r), (code))
> 
> maybe C++ can help making this prettier but of course
> since we use alloca we have to avoid opening new scopes.
> 
> I guess templates like with auto_vec doesn't work unless
> we can make RTX_CODE_SIZE constant-evaluated.
> 
> Richard.

I ended up with the following change:

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index a667cdab94e..97aa2144e95 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -466,17 +466,25 @@ set_mode_and_regno (rtx x, machine_mode mode, unsigned int regno)
   set_regno_raw (x, regno, nregs);
 }
 
-/* Generate a new REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
+/* Initialize a REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
    don't attempt to share with the various global pieces of rtl (such as
    frame_pointer_rtx).  */
 
-rtx
-gen_raw_REG (machine_mode mode, unsigned int regno)
+void
+init_raw_REG (rtx x, machine_mode mode, unsigned int regno)
 {
-  rtx x = rtx_alloc (REG MEM_STAT_INFO);
   set_mode_and_regno (x, mode, regno);
   REG_ATTRS (x) = NULL;
   ORIGINAL_REGNO (x) = regno;
+}
+
+/* Generate a new REG rtx.  */
+
+rtx
+gen_raw_REG (machine_mode mode, unsigned int regno)
+{
+  rtx x = rtx_alloc (REG MEM_STAT_INFO);
+  init_raw_REG (x, mode, regno);
   return x;
 }
 
diff --git a/gcc/gengenrtl.c b/gcc/gengenrtl.c
index 5c78fabfb50..bb2087da258 100644
--- a/gcc/gengenrtl.c
+++ b/gcc/gengenrtl.c
@@ -231,8 +231,7 @@ genmacro (int idx)
   puts (")");
 }
 
-/* Generate the code for the function to generate RTL whose
-   format is FORMAT.  */
+/* Generate the code for functions to generate RTL whose format is FORMAT.  */
 
 static void
 gendef (const char *format)
@@ -240,22 +239,18 @@ gendef (const char *format)
   const char *p;
   int i, j;
 
-  /* Start by writing the definition of the function name and the types
+  /* Write the definition of the init function name and the types
      of the arguments.  */
 
-  printf ("static inline rtx\ngen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
+  puts ("static inline void");
+  printf ("init_rtx_fmt_%s (rtx rt, machine_mode mode", format);
   for (p = format, i = 0; *p != 0; p++)
     if (*p != '0')
       printf (",\n\t%sarg%d", type_from_format (*p), i++);
+  puts (")");
 
-  puts (" MEM_STAT_DECL)");
-
-  /* Now write out the body of the function itself, which allocates
-     the memory and initializes it.  */
+  /* Now write out the body of the init function itself.  */
   puts ("{");
-  puts ("  rtx rt;");
-  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);\n");
-
   puts ("  PUT_MODE_RAW (rt, mode);");
 
   for (p = format, i = j = 0; *p ; ++p, ++i)
@@ -266,16 +261,56 @@ gendef (const char *format)
     else
       printf ("  %s (rt, %d) = arg%d;\n", accessor_from_format (*p), i, j++);
 
-  puts ("\n  return rt;\n}\n");
+  puts ("}\n");
+
+  /* Write the definition of the gen function name and the types
+     of the arguments.  */
+
+  puts ("static inline rtx");
+  printf ("gen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
+  for (p = format, i = 0; *p != 0; p++)
+    if (*p != '0')
+      printf (",\n\t%sarg%d", type_from_format (*p), i++);
+  puts (" MEM_STAT_DECL)");
+
+  /* Now write out the body of the function itself, which allocates
+     the memory and initializes it.  */
+  puts ("{");
+  puts ("  rtx rt;\n");
+
+  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);");
+  printf ("  init_rtx_fmt_%s (rt, mode", format);
+  for (p = format, i = 0; *p != 0; p++)
+    if (*p != '0')
+      printf (", arg%d", i++);
+  puts (");\n");
+
+  puts ("  return rt;\n}\n");
+
+  /* Write the definition of gen macro.  */
+
   printf ("#define gen_rtx_fmt_%s(c, m", format);
   for (p = format, i = 0; *p != 0; p++)
     if (*p != '0')
-      printf (", p%i",i++);
-  printf (")\\\n        gen_rtx_fmt_%s_stat (c, m", format);
+      printf (", arg%d", i++);
+  printf (") \\\n  gen_rtx_fmt_%s_stat ((c), (m)", format);
   for (p = format, i = 0; *p != 0; p++)
     if (*p != '0')
-      printf (", p%i",i++);
+      printf (", (arg%d)", i++);
   printf (" MEM_STAT_INFO)\n\n");
+
+  /* Write the definition of alloca macro.  */
+
+  printf ("#define alloca_rtx_fmt_%s(rt, c, m", format);
+  for (p = format, i = 0; *p != 0; p++)
+    if (*p != '0')
+      printf (", arg%d", i++);
+  printf (") \\\n  rtx_alloca ((rt), (c)); \\\n");
+  printf ("  init_rtx_fmt_%s ((rt), (m)", format);
+  for (p = format, i = 0; *p != 0; p++)
+    if (*p != '0')
+      printf (", (arg%d)", i++);
+  printf (")\n\n");
 }
 
 /* Generate the documentation header for files we write.  */
diff --git a/gcc/rtl.h b/gcc/rtl.h
index efb9b3ce40d..44733d8a39e 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2933,6 +2933,10 @@ extern HOST_WIDE_INT get_stack_check_protect (void);
 
 /* In rtl.c */
 extern rtx rtx_alloc (RTX_CODE CXX_MEM_STAT_INFO);
+#define rtx_alloca(rt, code) \
+  (rt) = (rtx) alloca (RTX_CODE_SIZE ((code))); \
+  memset ((rt), 0, RTX_HDR_SIZE); \
+  PUT_CODE ((rt), (code));
 extern rtx rtx_alloc_stat_v (RTX_CODE MEM_STAT_DECL, int);
 #define rtx_alloc_v(c, SZ) rtx_alloc_stat_v (c MEM_STAT_INFO, SZ)
 #define const_wide_int_alloc(NWORDS)				\
@@ -3797,7 +3801,11 @@ gen_rtx_INSN (machine_mode mode, rtx_insn *prev_insn, rtx_insn *next_insn,
 extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT);
 extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec);
 extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
+extern void init_raw_REG (rtx, machine_mode, unsigned int);
 extern rtx gen_raw_REG (machine_mode, unsigned int);
+#define alloca_raw_REG(rt, mode, regno) \
+  rtx_alloca ((rt), REG); \
+  init_raw_REG ((rt), (mode), (regno))
 extern rtx gen_rtx_REG (machine_mode, unsigned int);
 extern rtx gen_rtx_SUBREG (machine_mode, rtx, poly_uint64);
 extern rtx gen_rtx_MEM (machine_mode, rtx);

which now allows me to write:

rtx reg1, reg2, test;
alloca_raw_REG (reg1, cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
alloca_raw_REG (reg2, cmp_op_mode, LAST_VIRTUAL_REGISTER + 2);
alloca_rtx_fmt_ee (test, code, value_mode, reg1, reg2);

If that looks ok, I'll resend the series.

* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-26 12:18           ` Ilya Leoshkevich
@ 2019-08-26 13:45             ` Richard Biener
  2019-08-26 13:46               ` Ilya Leoshkevich
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Biener @ 2019-08-26 13:45 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: GCC Patches, Segher Boessenkool, Richard Sandiford

On Mon, Aug 26, 2019 at 1:54 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 26.08.2019 um 10:49 schrieb Richard Biener <richard.guenther@gmail.com>:
> >
> > On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>
> >>> Am 23.08.2019 um 13:24 schrieb Richard Biener <richard.guenther@gmail.com>:
> >>>
> >>> On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
> >>> <richard.sandiford@arm.com> wrote:
> >>>>
> >>>> Ilya Leoshkevich <iii@linux.ibm.com> writes:
> >>>>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
> >>>>>  return 0;
> >>>>> }
> >>>>>
> >>>>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
> >>>>> +   in order to determine its capabilities.  In order to avoid creating fake
> >>>>> +   operations on each call, values from previous calls are cached in a global
> >>>>> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
> >>>>> +   binop_keys.  */
> >>>>> +
> >>>>> +struct binop_key {
> >>>>> +  enum rtx_code code;        /* Operation code.  */
> >>>>> +  machine_mode value_mode;   /* Result mode.     */
> >>>>> +  machine_mode cmp_op_mode;  /* Operand mode.    */
> >>>>> +};
> >>>>> +
> >>>>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
> >>>>> +  typedef rtx value_type;
> >>>>> +  typedef binop_key compare_type;
> >>>>> +
> >>>>> +  static hashval_t
> >>>>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
> >>>>> +  {
> >>>>> +    inchash::hash hstate (0);
> >>>>> +    hstate.add_int (code);
> >>>>> +    hstate.add_int (value_mode);
> >>>>> +    hstate.add_int (cmp_op_mode);
> >>>>> +    return hstate.end ();
> >>>>> +  }
> >>>>> +
> >>>>> +  static hashval_t
> >>>>> +  hash (const rtx &ref)
> >>>>> +  {
> >>>>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
> >>>>> +  }
> >>>>> +
> >>>>> +  static bool
> >>>>> +  equal (const rtx &ref1, const binop_key &ref2)
> >>>>> +  {
> >>>>> +    return (GET_CODE (ref1) == ref2.code)
> >>>>> +        && (GET_MODE (ref1) == ref2.value_mode)
> >>>>> +        && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
> >>>>> +  }
> >>>>> +};
> >>>>> +
> >>>>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
> >>>>> +
> >>>>> +static rtx
> >>>>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
> >>>>> +               machine_mode cmp_op_mode)
> >>>>> +{
> >>>>> +  if (!cached_binops)
> >>>>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
> >>>>> +  binop_key key = { code, value_mode, cmp_op_mode };
> >>>>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
> >>>>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
> >>>>> +  if (!*slot)
> >>>>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
> >>>>> +                         gen_reg_rtx (cmp_op_mode));
> >>>>> +  return *slot;
> >>>>> +}
> >>>>
> >>>> Sorry, I didn't mean anything this complicated.  I just meant that
> >>>> we should have a single cached rtx that we can change via PUT_CODE and
> >>>> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
> >>>> time.
> >>>>
> >>>> Something like:
> >>>>
> >>>> static GTY ((cache)) rtx cached_binop;
> >>>>
> >>>> rtx
> >>>> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
> >>>> {
> >>>> if (cached_binop)
> >>>>   {
> >>>>     PUT_CODE (cached_binop, code);
> >>>>     PUT_MODE_RAW (cached_binop, mode);
> >>>>     PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
> >>>>     PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
> >>>>   }
> >>>> else
> >>>>   {
> >>>>     rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
> >>>>     rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
> >>>>     cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
> >>>>   }
> >>>> return cached_binop;
> >>>> }
> >>>
> >>> Hmm, maybe we need  auto_rtx (code) that constructs such
> >>> RTX on the stack instead of wasting a GC root (and causing
> >>> issues for future threading of GCC ;)).
> >>
> >> Do you mean something like this?
> >>
> >> union {
> >>  char raw[rtx_code_size[code]];
> >>  rtx rtx;
> >> } binop;
> >>
> >> Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
> >> anything useful), or should I implement this?
> >
> > It doesn't exist AFAIK, I thought about using alloca like
> >
> > rtx tem;
> > rtx_alloca (tem, PLUS);
> >
> > and due to using alloca rtx_alloca has to be a macro like
> >
> > #define rtx_alloca(r, code) r = (rtx)alloca (RTX_CODE_SIZE(code));
> > memset (r, 0, RTX_HDR_SIZE); PUT_CODE (r, code);
> >
> > maybe C++ can help making this prettier but of course
> > since we use alloca we have to avoid opening new scopes.
> >
> > I guess templates like auto_vec's don't work unless
> > we can make RTX_CODE_SIZE constant-evaluated.
> >
> > Richard.
>
> I ended up with the following change:
>
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index a667cdab94e..97aa2144e95 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -466,17 +466,25 @@ set_mode_and_regno (rtx x, machine_mode mode, unsigned int regno)
>    set_regno_raw (x, regno, nregs);
>  }
>
> -/* Generate a new REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
> +/* Initialize a REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
>     don't attempt to share with the various global pieces of rtl (such as
>     frame_pointer_rtx).  */
>
> -rtx
> -gen_raw_REG (machine_mode mode, unsigned int regno)
> +void
> +init_raw_REG (rtx x, machine_mode mode, unsigned int regno)
>  {
> -  rtx x = rtx_alloc (REG MEM_STAT_INFO);
>    set_mode_and_regno (x, mode, regno);
>    REG_ATTRS (x) = NULL;
>    ORIGINAL_REGNO (x) = regno;
> +}
> +
> +/* Generate a new REG rtx.  */
> +
> +rtx
> +gen_raw_REG (machine_mode mode, unsigned int regno)
> +{
> +  rtx x = rtx_alloc (REG MEM_STAT_INFO);
> +  init_raw_REG (x, mode, regno);
>    return x;
>  }
>
> diff --git a/gcc/gengenrtl.c b/gcc/gengenrtl.c
> index 5c78fabfb50..bb2087da258 100644
> --- a/gcc/gengenrtl.c
> +++ b/gcc/gengenrtl.c
> @@ -231,8 +231,7 @@ genmacro (int idx)
>    puts (")");
>  }
>
> -/* Generate the code for the function to generate RTL whose
> -   format is FORMAT.  */
> +/* Generate the code for functions to generate RTL whose format is FORMAT.  */
>
>  static void
>  gendef (const char *format)
> @@ -240,22 +239,18 @@ gendef (const char *format)
>    const char *p;
>    int i, j;
>
> -  /* Start by writing the definition of the function name and the types
> +  /* Write the definition of the init function name and the types
>       of the arguments.  */
>
> -  printf ("static inline rtx\ngen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
> +  puts ("static inline void");
> +  printf ("init_rtx_fmt_%s (rtx rt, machine_mode mode", format);
>    for (p = format, i = 0; *p != 0; p++)
>      if (*p != '0')
>        printf (",\n\t%sarg%d", type_from_format (*p), i++);
> +  puts (")");
>
> -  puts (" MEM_STAT_DECL)");
> -
> -  /* Now write out the body of the function itself, which allocates
> -     the memory and initializes it.  */
> +  /* Now write out the body of the init function itself.  */
>    puts ("{");
> -  puts ("  rtx rt;");
> -  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);\n");
> -
>    puts ("  PUT_MODE_RAW (rt, mode);");
>
>    for (p = format, i = j = 0; *p ; ++p, ++i)
> @@ -266,16 +261,56 @@ gendef (const char *format)
>      else
>        printf ("  %s (rt, %d) = arg%d;\n", accessor_from_format (*p), i, j++);
>
> -  puts ("\n  return rt;\n}\n");
> +  puts ("}\n");
> +
> +  /* Write the definition of the gen function name and the types
> +     of the arguments.  */
> +
> +  puts ("static inline rtx");
> +  printf ("gen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
> +  for (p = format, i = 0; *p != 0; p++)
> +    if (*p != '0')
> +      printf (",\n\t%sarg%d", type_from_format (*p), i++);
> +  puts (" MEM_STAT_DECL)");
> +
> +  /* Now write out the body of the function itself, which allocates
> +     the memory and initializes it.  */
> +  puts ("{");
> +  puts ("  rtx rt;\n");
> +
> +  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);");
> +  printf ("  init_rtx_fmt_%s (rt, mode", format);
> +  for (p = format, i = 0; *p != 0; p++)
> +    if (*p != '0')
> +      printf (", arg%d", i++);
> +  puts (");\n");
> +
> +  puts ("  return rt;\n}\n");
> +
> +  /* Write the definition of gen macro.  */
> +
>    printf ("#define gen_rtx_fmt_%s(c, m", format);
>    for (p = format, i = 0; *p != 0; p++)
>      if (*p != '0')
> -      printf (", p%i",i++);
> -  printf (")\\\n        gen_rtx_fmt_%s_stat (c, m", format);
> +      printf (", arg%d", i++);
> +  printf (") \\\n  gen_rtx_fmt_%s_stat ((c), (m)", format);
>    for (p = format, i = 0; *p != 0; p++)
>      if (*p != '0')
> -      printf (", p%i",i++);
> +      printf (", (arg%d)", i++);
>    printf (" MEM_STAT_INFO)\n\n");
> +
> +  /* Write the definition of alloca macro.  */
> +
> +  printf ("#define alloca_rtx_fmt_%s(rt, c, m", format);
> +  for (p = format, i = 0; *p != 0; p++)
> +    if (*p != '0')
> +      printf (", arg%d", i++);
> +  printf (") \\\n  rtx_alloca ((rt), (c)); \\\n");
> +  printf ("  init_rtx_fmt_%s ((rt), (m)", format);
> +  for (p = format, i = 0; *p != 0; p++)
> +    if (*p != '0')
> +      printf (", (arg%d)", i++);
> +  printf (")\n\n");
>  }
>
>  /* Generate the documentation header for files we write.  */
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index efb9b3ce40d..44733d8a39e 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -2933,6 +2933,10 @@ extern HOST_WIDE_INT get_stack_check_protect (void);
>
>  /* In rtl.c */
>  extern rtx rtx_alloc (RTX_CODE CXX_MEM_STAT_INFO);
> +#define rtx_alloca(rt, code) \
> +  (rt) = (rtx) alloca (RTX_CODE_SIZE ((code))); \
> +  memset ((rt), 0, RTX_HDR_SIZE); \
> +  PUT_CODE ((rt), (code));
>  extern rtx rtx_alloc_stat_v (RTX_CODE MEM_STAT_DECL, int);
>  #define rtx_alloc_v(c, SZ) rtx_alloc_stat_v (c MEM_STAT_INFO, SZ)
>  #define const_wide_int_alloc(NWORDS)                           \
> @@ -3797,7 +3801,11 @@ gen_rtx_INSN (machine_mode mode, rtx_insn *prev_insn, rtx_insn *next_insn,
>  extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT);
>  extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec);
>  extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
> +extern void init_raw_REG (rtx, machine_mode, unsigned int);
>  extern rtx gen_raw_REG (machine_mode, unsigned int);
> +#define alloca_raw_REG(rt, mode, regno) \
> +  rtx_alloca ((rt), REG); \
> +  init_raw_REG ((rt), (mode), (regno))
>  extern rtx gen_rtx_REG (machine_mode, unsigned int);
>  extern rtx gen_rtx_SUBREG (machine_mode, rtx, poly_uint64);
>  extern rtx gen_rtx_MEM (machine_mode, rtx);
>
> which now allows me to write:
>
> rtx reg1, reg2, test;
> alloca_raw_REG (reg1, cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
> alloca_raw_REG (reg2, cmp_op_mode, LAST_VIRTUAL_REGISTER + 2);
> alloca_rtx_fmt_ee (test, code, value_mode, reg1, reg2);
>
> If that looks ok, I'll resend the series.

That looks OK to me - please give Richard S. time to comment.  Also, while
I'd like to see

rtx reg1 = alloca_raw_REG (cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);

I don't really see a way to write that portably (or at all); do you all agree?
GCC doesn't seem to convert alloca () calls to __builtin_stack_save/restore,
nor does it place CLOBBERs to end their lifetime.  But is it guaranteed that
the alloca result stays valid until frame termination?
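(A minimal check of this question; hedged: it relies on the documented glibc/GCC behavior that alloca storage is freed when the function returns, not when the enclosing block ends.)

```c
#include <alloca.h>
#include <string.h>

static int
alloca_outlives_block (void)
{
  char *p;
  {
    p = (char *) alloca (16);
    memset (p, 0, 16);
    p[0] = 42;
  }
  /* Still valid here: alloca space is reclaimed at function return,
     not at block exit, per the alloca (3) man page.  */
  return p[0];
}
```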

Thanks,
Richard.


* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-26 13:45             ` Richard Biener
@ 2019-08-26 13:46               ` Ilya Leoshkevich
  2019-08-26 17:15                 ` Ilya Leoshkevich
  0 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-26 13:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Segher Boessenkool, Richard Sandiford

> On 26.08.2019 at 15:06, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Mon, Aug 26, 2019 at 1:54 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>> 
>>> On 26.08.2019 at 10:49, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>> 
>>>>> On 23.08.2019 at 13:24, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>> 
>>>>> On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
>>>>> <richard.sandiford@arm.com> wrote:
>>>>>> 
>>>>>> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>>>>>>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
>>>>>>> return 0;
>>>>>>> }
>>>>>>> 
>>>>>>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
>>>>>>> +   in order to determine its capabilities.  In order to avoid creating fake
>>>>>>> +   operations on each call, values from previous calls are cached in a global
>>>>>>> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
>>>>>>> +   binop_keys.  */
>>>>>>> +
>>>>>>> +struct binop_key {
>>>>>>> +  enum rtx_code code;        /* Operation code.  */
>>>>>>> +  machine_mode value_mode;   /* Result mode.     */
>>>>>>> +  machine_mode cmp_op_mode;  /* Operand mode.    */
>>>>>>> +};
>>>>>>> +
>>>>>>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
>>>>>>> +  typedef rtx value_type;
>>>>>>> +  typedef binop_key compare_type;
>>>>>>> +
>>>>>>> +  static hashval_t
>>>>>>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
>>>>>>> +  {
>>>>>>> +    inchash::hash hstate (0);
>>>>>>> +    hstate.add_int (code);
>>>>>>> +    hstate.add_int (value_mode);
>>>>>>> +    hstate.add_int (cmp_op_mode);
>>>>>>> +    return hstate.end ();
>>>>>>> +  }
>>>>>>> +
>>>>>>> +  static hashval_t
>>>>>>> +  hash (const rtx &ref)
>>>>>>> +  {
>>>>>>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
>>>>>>> +  }
>>>>>>> +
>>>>>>> +  static bool
>>>>>>> +  equal (const rtx &ref1, const binop_key &ref2)
>>>>>>> +  {
>>>>>>> +    return (GET_CODE (ref1) == ref2.code)
>>>>>>> +        && (GET_MODE (ref1) == ref2.value_mode)
>>>>>>> +        && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
>>>>>>> +  }
>>>>>>> +};
>>>>>>> +
>>>>>>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
>>>>>>> +
>>>>>>> +static rtx
>>>>>>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
>>>>>>> +               machine_mode cmp_op_mode)
>>>>>>> +{
>>>>>>> +  if (!cached_binops)
>>>>>>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
>>>>>>> +  binop_key key = { code, value_mode, cmp_op_mode };
>>>>>>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
>>>>>>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
>>>>>>> +  if (!*slot)
>>>>>>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
>>>>>>> +                         gen_reg_rtx (cmp_op_mode));
>>>>>>> +  return *slot;
>>>>>>> +}
>>>>>> 
>>>>>> Sorry, I didn't mean anything this complicated.  I just meant that
>>>>>> we should have a single cached rtx that we can change via PUT_CODE and
>>>>>> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
>>>>>> time.
>>>>>> 
>>>>>> Something like:
>>>>>> 
>>>>>> static GTY ((cache)) rtx cached_binop;
>>>>>> 
>>>>>> rtx
>>>>>> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
>>>>>> {
>>>>>> if (cached_binop)
>>>>>>  {
>>>>>>    PUT_CODE (cached_binop, code);
>>>>>>    PUT_MODE_RAW (cached_binop, mode);
>>>>>>    PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
>>>>>>    PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
>>>>>>  }
>>>>>> else
>>>>>>  {
>>>>>>    rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
>>>>>>    rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
>>>>>>    cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
>>>>>>  }
>>>>>> return cached_binop;
>>>>>> }
>>>>> 
>>>>> Hmm, maybe we need  auto_rtx (code) that constructs such
>>>>> RTX on the stack instead of wasting a GC root (and causing
>>>>> issues for future threading of GCC ;)).
>>>> 
>>>> Do you mean something like this?
>>>> 
>>>> union {
>>>> char raw[rtx_code_size[code]];
>>>> rtx rtx;
>>>> } binop;
>>>> 
>>>> Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
>>>> anything useful), or should I implement this?
>>> 
>>> It doesn't exist AFAIK, I thought about using alloca like
>>> 
>>> rtx tem;
>>> rtx_alloca (tem, PLUS);
>>> 
>>> and due to using alloca rtx_alloca has to be a macro like
>>> 
>>> #define rtx_alloca(r, code) r = (rtx)alloca (RTX_CODE_SIZE(code));
>>> memset (r, 0, RTX_HDR_SIZE); PUT_CODE (r, code);
>>> 
>>> maybe C++ can help making this prettier but of course
>>> since we use alloca we have to avoid opening new scopes.
>>> 
>>> I guess templates like auto_vec's don't work unless
>>> we can make RTX_CODE_SIZE constant-evaluated.
>>> 
>>> Richard.
>> 
>> I ended up with the following change:
>> 
>> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
>> index a667cdab94e..97aa2144e95 100644
>> --- a/gcc/emit-rtl.c
>> +++ b/gcc/emit-rtl.c
>> @@ -466,17 +466,25 @@ set_mode_and_regno (rtx x, machine_mode mode, unsigned int regno)
>>   set_regno_raw (x, regno, nregs);
>> }
>> 
>> -/* Generate a new REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
>> +/* Initialize a REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
>>    don't attempt to share with the various global pieces of rtl (such as
>>    frame_pointer_rtx).  */
>> 
>> -rtx
>> -gen_raw_REG (machine_mode mode, unsigned int regno)
>> +void
>> +init_raw_REG (rtx x, machine_mode mode, unsigned int regno)
>> {
>> -  rtx x = rtx_alloc (REG MEM_STAT_INFO);
>>   set_mode_and_regno (x, mode, regno);
>>   REG_ATTRS (x) = NULL;
>>   ORIGINAL_REGNO (x) = regno;
>> +}
>> +
>> +/* Generate a new REG rtx.  */
>> +
>> +rtx
>> +gen_raw_REG (machine_mode mode, unsigned int regno)
>> +{
>> +  rtx x = rtx_alloc (REG MEM_STAT_INFO);
>> +  init_raw_REG (x, mode, regno);
>>   return x;
>> }
>> 
>> diff --git a/gcc/gengenrtl.c b/gcc/gengenrtl.c
>> index 5c78fabfb50..bb2087da258 100644
>> --- a/gcc/gengenrtl.c
>> +++ b/gcc/gengenrtl.c
>> @@ -231,8 +231,7 @@ genmacro (int idx)
>>   puts (")");
>> }
>> 
>> -/* Generate the code for the function to generate RTL whose
>> -   format is FORMAT.  */
>> +/* Generate the code for functions to generate RTL whose format is FORMAT.  */
>> 
>> static void
>> gendef (const char *format)
>> @@ -240,22 +239,18 @@ gendef (const char *format)
>>   const char *p;
>>   int i, j;
>> 
>> -  /* Start by writing the definition of the function name and the types
>> +  /* Write the definition of the init function name and the types
>>      of the arguments.  */
>> 
>> -  printf ("static inline rtx\ngen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
>> +  puts ("static inline void");
>> +  printf ("init_rtx_fmt_%s (rtx rt, machine_mode mode", format);
>>   for (p = format, i = 0; *p != 0; p++)
>>     if (*p != '0')
>>       printf (",\n\t%sarg%d", type_from_format (*p), i++);
>> +  puts (")");
>> 
>> -  puts (" MEM_STAT_DECL)");
>> -
>> -  /* Now write out the body of the function itself, which allocates
>> -     the memory and initializes it.  */
>> +  /* Now write out the body of the init function itself.  */
>>   puts ("{");
>> -  puts ("  rtx rt;");
>> -  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);\n");
>> -
>>   puts ("  PUT_MODE_RAW (rt, mode);");
>> 
>>   for (p = format, i = j = 0; *p ; ++p, ++i)
>> @@ -266,16 +261,56 @@ gendef (const char *format)
>>     else
>>       printf ("  %s (rt, %d) = arg%d;\n", accessor_from_format (*p), i, j++);
>> 
>> -  puts ("\n  return rt;\n}\n");
>> +  puts ("}\n");
>> +
>> +  /* Write the definition of the gen function name and the types
>> +     of the arguments.  */
>> +
>> +  puts ("static inline rtx");
>> +  printf ("gen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
>> +  for (p = format, i = 0; *p != 0; p++)
>> +    if (*p != '0')
>> +      printf (",\n\t%sarg%d", type_from_format (*p), i++);
>> +  puts (" MEM_STAT_DECL)");
>> +
>> +  /* Now write out the body of the function itself, which allocates
>> +     the memory and initializes it.  */
>> +  puts ("{");
>> +  puts ("  rtx rt;\n");
>> +
>> +  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);");
>> +  printf ("  init_rtx_fmt_%s (rt, mode", format);
>> +  for (p = format, i = 0; *p != 0; p++)
>> +    if (*p != '0')
>> +      printf (", arg%d", i++);
>> +  puts (");\n");
>> +
>> +  puts ("  return rt;\n}\n");
>> +
>> +  /* Write the definition of gen macro.  */
>> +
>>   printf ("#define gen_rtx_fmt_%s(c, m", format);
>>   for (p = format, i = 0; *p != 0; p++)
>>     if (*p != '0')
>> -      printf (", p%i",i++);
>> -  printf (")\\\n        gen_rtx_fmt_%s_stat (c, m", format);
>> +      printf (", arg%d", i++);
>> +  printf (") \\\n  gen_rtx_fmt_%s_stat ((c), (m)", format);
>>   for (p = format, i = 0; *p != 0; p++)
>>     if (*p != '0')
>> -      printf (", p%i",i++);
>> +      printf (", (arg%d)", i++);
>>   printf (" MEM_STAT_INFO)\n\n");
>> +
>> +  /* Write the definition of alloca macro.  */
>> +
>> +  printf ("#define alloca_rtx_fmt_%s(rt, c, m", format);
>> +  for (p = format, i = 0; *p != 0; p++)
>> +    if (*p != '0')
>> +      printf (", arg%d", i++);
>> +  printf (") \\\n  rtx_alloca ((rt), (c)); \\\n");
>> +  printf ("  init_rtx_fmt_%s ((rt), (m)", format);
>> +  for (p = format, i = 0; *p != 0; p++)
>> +    if (*p != '0')
>> +      printf (", (arg%d)", i++);
>> +  printf (")\n\n");
>> }
>> 
>> /* Generate the documentation header for files we write.  */
>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>> index efb9b3ce40d..44733d8a39e 100644
>> --- a/gcc/rtl.h
>> +++ b/gcc/rtl.h
>> @@ -2933,6 +2933,10 @@ extern HOST_WIDE_INT get_stack_check_protect (void);
>> 
>> /* In rtl.c */
>> extern rtx rtx_alloc (RTX_CODE CXX_MEM_STAT_INFO);
>> +#define rtx_alloca(rt, code) \
>> +  (rt) = (rtx) alloca (RTX_CODE_SIZE ((code))); \
>> +  memset ((rt), 0, RTX_HDR_SIZE); \
>> +  PUT_CODE ((rt), (code));
>> extern rtx rtx_alloc_stat_v (RTX_CODE MEM_STAT_DECL, int);
>> #define rtx_alloc_v(c, SZ) rtx_alloc_stat_v (c MEM_STAT_INFO, SZ)
>> #define const_wide_int_alloc(NWORDS)                           \
>> @@ -3797,7 +3801,11 @@ gen_rtx_INSN (machine_mode mode, rtx_insn *prev_insn, rtx_insn *next_insn,
>> extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT);
>> extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec);
>> extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
>> +extern void init_raw_REG (rtx, machine_mode, unsigned int);
>> extern rtx gen_raw_REG (machine_mode, unsigned int);
>> +#define alloca_raw_REG(rt, mode, regno) \
>> +  rtx_alloca ((rt), REG); \
>> +  init_raw_REG ((rt), (mode), (regno))
>> extern rtx gen_rtx_REG (machine_mode, unsigned int);
>> extern rtx gen_rtx_SUBREG (machine_mode, rtx, poly_uint64);
>> extern rtx gen_rtx_MEM (machine_mode, rtx);
>> 
>> which now allows me to write:
>> 
>> rtx reg1, reg2, test;
>> alloca_raw_REG (reg1, cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
>> alloca_raw_REG (reg2, cmp_op_mode, LAST_VIRTUAL_REGISTER + 2);
>> alloca_rtx_fmt_ee (test, code, value_mode, reg1, reg2);
>> 
>> If that looks ok, I'll resend the series.
> 
> That looks OK to me - please give Richard S. time to comment.  Also, while
> I'd like to see
> 
> rtx reg1 = alloca_raw_REG (cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
> 
> I don't really see a way to write that portably (or at all); do you all agree?
> GCC doesn't seem to convert alloca () calls to __builtin_stack_save/restore,
> nor does it place CLOBBERs to end their lifetime.  But is it guaranteed that
> the alloca result stays valid until frame termination?

Hmm, the alloca man page says:

       The alloca() function allocates size bytes of space in the stack
       frame of the caller.  This temporary space is automatically freed
       when the function that called alloca() returns to its caller.
...
       The space allocated by alloca() is not automatically deallocated if
       the pointer that refers to it simply goes out of scope.

A quick experiment with gcc and clang confirms this.  I think this means
I can make the alloca_raw_REG macro return the allocated pointer using the
GNU statement-expression ("return from block") extension.
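(To illustrate that idea, a hypothetical statement-expression version of the pattern; generic names, not the actual alloca_raw_REG.  This assumes the GNU `({ ... })` extension and that alloca'd space from inside the expression persists to function return.)

```c
#include <alloca.h>
#include <string.h>

/* Hypothetical rtx-like object.  */
struct node { int code; };

/* The macro allocates and initializes, then yields the pointer as the
   value of the statement expression.  */
#define NODE_ALLOCA(c)                                       \
  __extension__ ({                                           \
    struct node *n_ = (struct node *) alloca (sizeof *n_);   \
    memset (n_, 0, sizeof *n_);                              \
    n_->code = (c);                                          \
    n_;                                                      \
  })

static int
node_code (void)
{
  struct node *n = NODE_ALLOCA (9);  /* usable as an initializer */
  return n->code;
}
```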


* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-26 13:46               ` Ilya Leoshkevich
@ 2019-08-26 17:15                 ` Ilya Leoshkevich
  2019-08-27  8:07                   ` Richard Sandiford
  0 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-26 17:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Segher Boessenkool, Richard Sandiford



> On 26.08.2019 at 15:17, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> 
>> On 26.08.2019 at 15:06, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Mon, Aug 26, 2019 at 1:54 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>> 
>>>> On 26.08.2019 at 10:49, Richard Biener <richard.guenther@gmail.com> wrote:
>>>> 
>>>> On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>>> 
>>>>>> On 23.08.2019 at 13:24, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>>> 
>>>>>> On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
>>>>>> <richard.sandiford@arm.com> wrote:
>>>>>>> 
>>>>>>> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>>>>>>>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
>>>>>>>> +   in order to determine its capabilities.  In order to avoid creating fake
>>>>>>>> +   operations on each call, values from previous calls are cached in a global
>>>>>>>> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
>>>>>>>> +   binop_keys.  */
>>>>>>>> +
>>>>>>>> +struct binop_key {
>>>>>>>> +  enum rtx_code code;        /* Operation code.  */
>>>>>>>> +  machine_mode value_mode;   /* Result mode.     */
>>>>>>>> +  machine_mode cmp_op_mode;  /* Operand mode.    */
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
>>>>>>>> +  typedef rtx value_type;
>>>>>>>> +  typedef binop_key compare_type;
>>>>>>>> +
>>>>>>>> +  static hashval_t
>>>>>>>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
>>>>>>>> +  {
>>>>>>>> +    inchash::hash hstate (0);
>>>>>>>> +    hstate.add_int (code);
>>>>>>>> +    hstate.add_int (value_mode);
>>>>>>>> +    hstate.add_int (cmp_op_mode);
>>>>>>>> +    return hstate.end ();
>>>>>>>> +  }
>>>>>>>> +
>>>>>>>> +  static hashval_t
>>>>>>>> +  hash (const rtx &ref)
>>>>>>>> +  {
>>>>>>>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
>>>>>>>> +  }
>>>>>>>> +
>>>>>>>> +  static bool
>>>>>>>> +  equal (const rtx &ref1, const binop_key &ref2)
>>>>>>>> +  {
>>>>>>>> +    return (GET_CODE (ref1) == ref2.code)
>>>>>>>> +        && (GET_MODE (ref1) == ref2.value_mode)
>>>>>>>> +        && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
>>>>>>>> +  }
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
>>>>>>>> +
>>>>>>>> +static rtx
>>>>>>>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
>>>>>>>> +               machine_mode cmp_op_mode)
>>>>>>>> +{
>>>>>>>> +  if (!cached_binops)
>>>>>>>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
>>>>>>>> +  binop_key key = { code, value_mode, cmp_op_mode };
>>>>>>>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
>>>>>>>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
>>>>>>>> +  if (!*slot)
>>>>>>>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
>>>>>>>> +                         gen_reg_rtx (cmp_op_mode));
>>>>>>>> +  return *slot;
>>>>>>>> +}
>>>>>>> 
>>>>>>> Sorry, I didn't mean anything this complicated.  I just meant that
>>>>>>> we should have a single cached rtx that we can change via PUT_CODE and
>>>>>>> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
>>>>>>> time.
>>>>>>> 
>>>>>>> Something like:
>>>>>>> 
>>>>>>> static GTY ((cache)) rtx cached_binop;
>>>>>>> 
>>>>>>> rtx
>>>>>>> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
>>>>>>> {
>>>>>>> if (cached_binop)
>>>>>>> {
>>>>>>>   PUT_CODE (cached_binop, code);
>>>>>>>   PUT_MODE_RAW (cached_binop, mode);
>>>>>>>   PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
>>>>>>>   PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
>>>>>>> }
>>>>>>> else
>>>>>>> {
>>>>>>>   rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
>>>>>>>   rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
>>>>>>>   cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
>>>>>>> }
>>>>>>> return cached_binop;
>>>>>>> }
>>>>>> 
>>>>>> Hmm, maybe we need  auto_rtx (code) that constructs such
>>>>>> RTX on the stack instead of wasting a GC root (and causing
>>>>>> issues for future threading of GCC ;)).
>>>>> 
>>>>> Do you mean something like this?
>>>>> 
>>>>> union {
>>>>> char raw[rtx_code_size[code]];
>>>>> rtx rtx;
>>>>> } binop;
>>>>> 
>>>>> Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
>>>>> anything useful), or should I implement this?
>>>> 
>>>> It doesn't exist AFAIK, I thought about using alloca like
>>>> 
>>>> rtx tem;
>>>> rtx_alloca (tem, PLUS);
>>>> 
>>>> and due to using alloca rtx_alloca has to be a macro like
>>>> 
>>>> #define rtx_alloca(r, code) r = (rtx)alloca (RTX_CODE_SIZE(code));
>>>> memset (r, 0, RTX_HDR_SIZE); PUT_CODE (r, code);
>>>> 
>>>> maybe C++ can help making this prettier but of course
>>>> since we use alloca we have to avoid opening new scopes.
>>>> 
>>>> I guess templates like auto_vec's don't work unless
>>>> we can make RTX_CODE_SIZE constant-evaluated.
>>>> 
>>>> Richard.
>>> 
>>> I ended up with the following change:
>>> 
>>> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
>>> index a667cdab94e..97aa2144e95 100644
>>> --- a/gcc/emit-rtl.c
>>> +++ b/gcc/emit-rtl.c
>>> @@ -466,17 +466,25 @@ set_mode_and_regno (rtx x, machine_mode mode, unsigned int regno)
>>>  set_regno_raw (x, regno, nregs);
>>> }
>>> 
>>> -/* Generate a new REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
>>> +/* Initialize a REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
>>>   don't attempt to share with the various global pieces of rtl (such as
>>>   frame_pointer_rtx).  */
>>> 
>>> -rtx
>>> -gen_raw_REG (machine_mode mode, unsigned int regno)
>>> +void
>>> +init_raw_REG (rtx x, machine_mode mode, unsigned int regno)
>>> {
>>> -  rtx x = rtx_alloc (REG MEM_STAT_INFO);
>>>  set_mode_and_regno (x, mode, regno);
>>>  REG_ATTRS (x) = NULL;
>>>  ORIGINAL_REGNO (x) = regno;
>>> +}
>>> +
>>> +/* Generate a new REG rtx.  */
>>> +
>>> +rtx
>>> +gen_raw_REG (machine_mode mode, unsigned int regno)
>>> +{
>>> +  rtx x = rtx_alloc (REG MEM_STAT_INFO);
>>> +  init_raw_REG (x, mode, regno);
>>>  return x;
>>> }
>>> 
>>> diff --git a/gcc/gengenrtl.c b/gcc/gengenrtl.c
>>> index 5c78fabfb50..bb2087da258 100644
>>> --- a/gcc/gengenrtl.c
>>> +++ b/gcc/gengenrtl.c
>>> @@ -231,8 +231,7 @@ genmacro (int idx)
>>>  puts (")");
>>> }
>>> 
>>> -/* Generate the code for the function to generate RTL whose
>>> -   format is FORMAT.  */
>>> +/* Generate the code for functions to generate RTL whose format is FORMAT.  */
>>> 
>>> static void
>>> gendef (const char *format)
>>> @@ -240,22 +239,18 @@ gendef (const char *format)
>>>  const char *p;
>>>  int i, j;
>>> 
>>> -  /* Start by writing the definition of the function name and the types
>>> +  /* Write the definition of the init function name and the types
>>>     of the arguments.  */
>>> 
>>> -  printf ("static inline rtx\ngen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
>>> +  puts ("static inline void");
>>> +  printf ("init_rtx_fmt_%s (rtx rt, machine_mode mode", format);
>>>  for (p = format, i = 0; *p != 0; p++)
>>>    if (*p != '0')
>>>      printf (",\n\t%sarg%d", type_from_format (*p), i++);
>>> +  puts (")");
>>> 
>>> -  puts (" MEM_STAT_DECL)");
>>> -
>>> -  /* Now write out the body of the function itself, which allocates
>>> -     the memory and initializes it.  */
>>> +  /* Now write out the body of the init function itself.  */
>>>  puts ("{");
>>> -  puts ("  rtx rt;");
>>> -  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);\n");
>>> -
>>>  puts ("  PUT_MODE_RAW (rt, mode);");
>>> 
>>>  for (p = format, i = j = 0; *p ; ++p, ++i)
>>> @@ -266,16 +261,56 @@ gendef (const char *format)
>>>    else
>>>      printf ("  %s (rt, %d) = arg%d;\n", accessor_from_format (*p), i, j++);
>>> 
>>> -  puts ("\n  return rt;\n}\n");
>>> +  puts ("}\n");
>>> +
>>> +  /* Write the definition of the gen function name and the types
>>> +     of the arguments.  */
>>> +
>>> +  puts ("static inline rtx");
>>> +  printf ("gen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
>>> +  for (p = format, i = 0; *p != 0; p++)
>>> +    if (*p != '0')
>>> +      printf (",\n\t%sarg%d", type_from_format (*p), i++);
>>> +  puts (" MEM_STAT_DECL)");
>>> +
>>> +  /* Now write out the body of the function itself, which allocates
>>> +     the memory and initializes it.  */
>>> +  puts ("{");
>>> +  puts ("  rtx rt;\n");
>>> +
>>> +  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);");
>>> +  printf ("  init_rtx_fmt_%s (rt, mode", format);
>>> +  for (p = format, i = 0; *p != 0; p++)
>>> +    if (*p != '0')
>>> +      printf (", arg%d", i++);
>>> +  puts (");\n");
>>> +
>>> +  puts ("  return rt;\n}\n");
>>> +
>>> +  /* Write the definition of gen macro.  */
>>> +
>>>  printf ("#define gen_rtx_fmt_%s(c, m", format);
>>>  for (p = format, i = 0; *p != 0; p++)
>>>    if (*p != '0')
>>> -      printf (", p%i",i++);
>>> -  printf (")\\\n        gen_rtx_fmt_%s_stat (c, m", format);
>>> +      printf (", arg%d", i++);
>>> +  printf (") \\\n  gen_rtx_fmt_%s_stat ((c), (m)", format);
>>>  for (p = format, i = 0; *p != 0; p++)
>>>    if (*p != '0')
>>> -      printf (", p%i",i++);
>>> +      printf (", (arg%d)", i++);
>>>  printf (" MEM_STAT_INFO)\n\n");
>>> +
>>> +  /* Write the definition of alloca macro.  */
>>> +
>>> +  printf ("#define alloca_rtx_fmt_%s(rt, c, m", format);
>>> +  for (p = format, i = 0; *p != 0; p++)
>>> +    if (*p != '0')
>>> +      printf (", arg%d", i++);
>>> +  printf (") \\\n  rtx_alloca ((rt), (c)); \\\n");
>>> +  printf ("  init_rtx_fmt_%s ((rt), (m)", format);
>>> +  for (p = format, i = 0; *p != 0; p++)
>>> +    if (*p != '0')
>>> +      printf (", (arg%d)", i++);
>>> +  printf (")\n\n");
>>> }
>>> 
>>> /* Generate the documentation header for files we write.  */
>>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>>> index efb9b3ce40d..44733d8a39e 100644
>>> --- a/gcc/rtl.h
>>> +++ b/gcc/rtl.h
>>> @@ -2933,6 +2933,10 @@ extern HOST_WIDE_INT get_stack_check_protect (void);
>>> 
>>> /* In rtl.c */
>>> extern rtx rtx_alloc (RTX_CODE CXX_MEM_STAT_INFO);
>>> +#define rtx_alloca(rt, code) \
>>> +  (rt) = (rtx) alloca (RTX_CODE_SIZE ((code))); \
>>> +  memset ((rt), 0, RTX_HDR_SIZE); \
>>> +  PUT_CODE ((rt), (code));
>>> extern rtx rtx_alloc_stat_v (RTX_CODE MEM_STAT_DECL, int);
>>> #define rtx_alloc_v(c, SZ) rtx_alloc_stat_v (c MEM_STAT_INFO, SZ)
>>> #define const_wide_int_alloc(NWORDS)                           \
>>> @@ -3797,7 +3801,11 @@ gen_rtx_INSN (machine_mode mode, rtx_insn *prev_insn, rtx_insn *next_insn,
>>> extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT);
>>> extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec);
>>> extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
>>> +extern void init_raw_REG (rtx, machine_mode, unsigned int);
>>> extern rtx gen_raw_REG (machine_mode, unsigned int);
>>> +#define alloca_raw_REG(rt, mode, regno) \
>>> +  rtx_alloca ((rt), REG); \
>>> +  init_raw_REG ((rt), (mode), (regno))
>>> extern rtx gen_rtx_REG (machine_mode, unsigned int);
>>> extern rtx gen_rtx_SUBREG (machine_mode, rtx, poly_uint64);
>>> extern rtx gen_rtx_MEM (machine_mode, rtx);
>>> 
>>> which now allows me to write:
>>> 
>>> rtx reg1, reg2, test;
>>> alloca_raw_REG (reg1, cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
>>> alloca_raw_REG (reg2, cmp_op_mode, LAST_VIRTUAL_REGISTER + 2);
>>> alloca_rtx_fmt_ee (test, code, value_mode, reg1, reg2);
>>> 
>>> If that looks ok, I'll resend the series.
>> 
>> that looks OK to me - please leave Richard S. time to comment.  Also while
>> I'd like to see
>> 
>> rtx reg1 = alloca_raw_REG (cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
>> 
>> I don't really see a way to write that portably (or at all), do you all agree?
>> GCC doesn't seem to convert alloca() calls to __builtin_stack_save/restore
>> nor place CLOBBERs to end their lifetime.  But is it guaranteed that the
>> alloca result is valid until frame termination?
> 
> Hmm, the alloca man page says:
> 
>       The alloca() function allocates size bytes of space in the stack
>       frame of the caller.  This temporary space is automatically freed
>       when the function that called alloca() returns to its caller.
> ...
>       The space allocated by alloca() is not automatically deallocated if
>       the pointer that refers to it simply goes out of scope.
> 
> A quick experiment with gcc and clang confirms this.  I think this means
> I can make the alloca_raw_REG macro return the allocated pointer using the
> return-from-block GNU extension.

What do you think about the following approach?

extern rtx rtx_init (rtx, RTX_CODE);
#define rtx_alloca(code) \
  rtx_init ((rtx) alloca (RTX_CODE_SIZE ((code))), (code))

...

rtx
rtx_init (rtx rt, RTX_CODE code)
{
  /* We want to clear everything up to the FLD array.  Normally, this
     is one int, but we don't want to assume that and it isn't very
     portable anyway; this is.  */
  memset (rt, 0, RTX_HDR_SIZE);
  PUT_CODE (rt, code);
  return rt;
}

with similar changes to alloca_raw_REG and gengenrtl.  This way we don't
even need a GNU extension.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-26 17:15                 ` Ilya Leoshkevich
@ 2019-08-27  8:07                   ` Richard Sandiford
  2019-08-27 12:27                     ` Richard Biener
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Sandiford @ 2019-08-27  8:07 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: Richard Biener, GCC Patches, Segher Boessenkool

Ilya Leoshkevich <iii@linux.ibm.com> writes:
>> Am 26.08.2019 um 15:17 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
>> 
>>> Am 26.08.2019 um 15:06 schrieb Richard Biener <richard.guenther@gmail.com>:
>>> 
>>> On Mon, Aug 26, 2019 at 1:54 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>> 
>>>>> Am 26.08.2019 um 10:49 schrieb Richard Biener <richard.guenther@gmail.com>:
>>>>> 
>>>>> On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>>>> 
>>>>>>> Am 23.08.2019 um 13:24 schrieb Richard Biener <richard.guenther@gmail.com>:
>>>>>>> 
>>>>>>> On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
>>>>>>> <richard.sandiford@arm.com> wrote:
>>>>>>>> 
>>>>>>>> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>>>>>>>>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
>>>>>>>>> return 0;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
>>>>>>>>> +   in order to determine its capabilities.  In order to avoid creating fake
>>>>>>>>> +   operations on each call, values from previous calls are cached in a global
>>>>>>>>> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
>>>>>>>>> +   binop_keys.  */
>>>>>>>>> +
>>>>>>>>> +struct binop_key {
>>>>>>>>> +  enum rtx_code code;        /* Operation code.  */
>>>>>>>>> +  machine_mode value_mode;   /* Result mode.     */
>>>>>>>>> +  machine_mode cmp_op_mode;  /* Operand mode.    */
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
>>>>>>>>> +  typedef rtx value_type;
>>>>>>>>> +  typedef binop_key compare_type;
>>>>>>>>> +
>>>>>>>>> +  static hashval_t
>>>>>>>>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
>>>>>>>>> +  {
>>>>>>>>> +    inchash::hash hstate (0);
>>>>>>>>> +    hstate.add_int (code);
>>>>>>>>> +    hstate.add_int (value_mode);
>>>>>>>>> +    hstate.add_int (cmp_op_mode);
>>>>>>>>> +    return hstate.end ();
>>>>>>>>> +  }
>>>>>>>>> +
>>>>>>>>> +  static hashval_t
>>>>>>>>> +  hash (const rtx &ref)
>>>>>>>>> +  {
>>>>>>>>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
>>>>>>>>> +  }
>>>>>>>>> +
>>>>>>>>> +  static bool
>>>>>>>>> +  equal (const rtx &ref1, const binop_key &ref2)
>>>>>>>>> +  {
>>>>>>>>> +    return (GET_CODE (ref1) == ref2.code)
>>>>>>>>> +        && (GET_MODE (ref1) == ref2.value_mode)
>>>>>>>>> +        && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
>>>>>>>>> +  }
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
>>>>>>>>> +
>>>>>>>>> +static rtx
>>>>>>>>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
>>>>>>>>> +               machine_mode cmp_op_mode)
>>>>>>>>> +{
>>>>>>>>> +  if (!cached_binops)
>>>>>>>>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
>>>>>>>>> +  binop_key key = { code, value_mode, cmp_op_mode };
>>>>>>>>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
>>>>>>>>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
>>>>>>>>> +  if (!*slot)
>>>>>>>>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
>>>>>>>>> +                         gen_reg_rtx (cmp_op_mode));
>>>>>>>>> +  return *slot;
>>>>>>>>> +}
>>>>>>>> 
>>>>>>>> Sorry, I didn't mean anything this complicated.  I just meant that
>>>>>>>> we should have a single cached rtx that we can change via PUT_CODE and
>>>>>>>> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
>>>>>>>> time.
>>>>>>>> 
>>>>>>>> Something like:
>>>>>>>> 
>>>>>>>> static GTY ((cache)) rtx cached_binop;
>>>>>>>> 
>>>>>>>> rtx
>>>>>>>> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
>>>>>>>> {
>>>>>>>> if (cached_binop)
>>>>>>>> {
>>>>>>>>   PUT_CODE (cached_binop, code);
>>>>>>>>   PUT_MODE_RAW (cached_binop, mode);
>>>>>>>>   PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
>>>>>>>>   PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
>>>>>>>> }
>>>>>>>> else
>>>>>>>> {
>>>>>>>>   rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
>>>>>>>>   rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
>>>>>>>>   cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
>>>>>>>> }
>>>>>>>> return cached_binop;
>>>>>>>> }
>>>>>>> 
>>>>>>> Hmm, maybe we need  auto_rtx (code) that constructs such
>>>>>>> RTX on the stack instead of wasting a GC root (and causing
>>>>>>> issues for future threading of GCC ;)).
>>>>>> 
>>>>>> Do you mean something like this?
>>>>>> 
>>>>>> union {
>>>>>> char raw[rtx_code_size[code]];
>>>>>> rtx rtx;
>>>>>> } binop;
>>>>>> 
>>>>>> Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
>>>>>> anything useful), or should I implement this?
>>>>> 
>>>>> It doesn't exist AFAIK, I thought about using alloca like
>>>>> 
>>>>> rtx tem;
>>>>> rtx_alloca (tem, PLUS);
>>>>> 
>>>>> and due to using alloca rtx_alloca has to be a macro like
>>>>> 
>>>>> #define rtx_alloca(r, code) r = (rtx)alloca (RTX_CODE_SIZE(code));
>>>>> memset (r, 0, RTX_HDR_SIZE); PUT_CODE (r, code);
>>>>> 
>>>>> maybe C++ can help making this prettier but of course
>>>>> since we use alloca we have to avoid opening new scopes.
>>>>> 
>>>>> I guess templates like with auto_vec doesn't work unless
>>>>> we can make RTX_CODE_SIZE constant-evaluated.
>>>>> 
>>>>> Richard.
>>>> 
>>>> I ended up with the following change:
>>>> 
>>>> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
>>>> index a667cdab94e..97aa2144e95 100644
>>>> --- a/gcc/emit-rtl.c
>>>> +++ b/gcc/emit-rtl.c
>>>> @@ -466,17 +466,25 @@ set_mode_and_regno (rtx x, machine_mode mode, unsigned int regno)
>>>>  set_regno_raw (x, regno, nregs);
>>>> }
>>>> 
>>>> -/* Generate a new REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
>>>> +/* Initialize a REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
>>>>   don't attempt to share with the various global pieces of rtl (such as
>>>>   frame_pointer_rtx).  */
>>>> 
>>>> -rtx
>>>> -gen_raw_REG (machine_mode mode, unsigned int regno)
>>>> +void
>>>> +init_raw_REG (rtx x, machine_mode mode, unsigned int regno)
>>>> {
>>>> -  rtx x = rtx_alloc (REG MEM_STAT_INFO);
>>>>  set_mode_and_regno (x, mode, regno);
>>>>  REG_ATTRS (x) = NULL;
>>>>  ORIGINAL_REGNO (x) = regno;
>>>> +}
>>>> +
>>>> +/* Generate a new REG rtx.  */
>>>> +
>>>> +rtx
>>>> +gen_raw_REG (machine_mode mode, unsigned int regno)
>>>> +{
>>>> +  rtx x = rtx_alloc (REG MEM_STAT_INFO);
>>>> +  init_raw_REG (x, mode, regno);
>>>>  return x;
>>>> }
>>>> 
>>>> diff --git a/gcc/gengenrtl.c b/gcc/gengenrtl.c
>>>> index 5c78fabfb50..bb2087da258 100644
>>>> --- a/gcc/gengenrtl.c
>>>> +++ b/gcc/gengenrtl.c
>>>> @@ -231,8 +231,7 @@ genmacro (int idx)
>>>>  puts (")");
>>>> }
>>>> 
>>>> -/* Generate the code for the function to generate RTL whose
>>>> -   format is FORMAT.  */
>>>> +/* Generate the code for functions to generate RTL whose format is FORMAT.  */
>>>> 
>>>> static void
>>>> gendef (const char *format)
>>>> @@ -240,22 +239,18 @@ gendef (const char *format)
>>>>  const char *p;
>>>>  int i, j;
>>>> 
>>>> -  /* Start by writing the definition of the function name and the types
>>>> +  /* Write the definition of the init function name and the types
>>>>     of the arguments.  */
>>>> 
>>>> -  printf ("static inline rtx\ngen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
>>>> +  puts ("static inline void");
>>>> +  printf ("init_rtx_fmt_%s (rtx rt, machine_mode mode", format);
>>>>  for (p = format, i = 0; *p != 0; p++)
>>>>    if (*p != '0')
>>>>      printf (",\n\t%sarg%d", type_from_format (*p), i++);
>>>> +  puts (")");
>>>> 
>>>> -  puts (" MEM_STAT_DECL)");
>>>> -
>>>> -  /* Now write out the body of the function itself, which allocates
>>>> -     the memory and initializes it.  */
>>>> +  /* Now write out the body of the init function itself.  */
>>>>  puts ("{");
>>>> -  puts ("  rtx rt;");
>>>> -  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);\n");
>>>> -
>>>>  puts ("  PUT_MODE_RAW (rt, mode);");
>>>> 
>>>>  for (p = format, i = j = 0; *p ; ++p, ++i)
>>>> @@ -266,16 +261,56 @@ gendef (const char *format)
>>>>    else
>>>>      printf ("  %s (rt, %d) = arg%d;\n", accessor_from_format (*p), i, j++);
>>>> 
>>>> -  puts ("\n  return rt;\n}\n");
>>>> +  puts ("}\n");
>>>> +
>>>> +  /* Write the definition of the gen function name and the types
>>>> +     of the arguments.  */
>>>> +
>>>> +  puts ("static inline rtx");
>>>> +  printf ("gen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
>>>> +  for (p = format, i = 0; *p != 0; p++)
>>>> +    if (*p != '0')
>>>> +      printf (",\n\t%sarg%d", type_from_format (*p), i++);
>>>> +  puts (" MEM_STAT_DECL)");
>>>> +
>>>> +  /* Now write out the body of the function itself, which allocates
>>>> +     the memory and initializes it.  */
>>>> +  puts ("{");
>>>> +  puts ("  rtx rt;\n");
>>>> +
>>>> +  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);");
>>>> +  printf ("  init_rtx_fmt_%s (rt, mode", format);
>>>> +  for (p = format, i = 0; *p != 0; p++)
>>>> +    if (*p != '0')
>>>> +      printf (", arg%d", i++);
>>>> +  puts (");\n");
>>>> +
>>>> +  puts ("  return rt;\n}\n");
>>>> +
>>>> +  /* Write the definition of gen macro.  */
>>>> +
>>>>  printf ("#define gen_rtx_fmt_%s(c, m", format);
>>>>  for (p = format, i = 0; *p != 0; p++)
>>>>    if (*p != '0')
>>>> -      printf (", p%i",i++);
>>>> -  printf (")\\\n        gen_rtx_fmt_%s_stat (c, m", format);
>>>> +      printf (", arg%d", i++);
>>>> +  printf (") \\\n  gen_rtx_fmt_%s_stat ((c), (m)", format);
>>>>  for (p = format, i = 0; *p != 0; p++)
>>>>    if (*p != '0')
>>>> -      printf (", p%i",i++);
>>>> +      printf (", (arg%d)", i++);
>>>>  printf (" MEM_STAT_INFO)\n\n");
>>>> +
>>>> +  /* Write the definition of alloca macro.  */
>>>> +
>>>> +  printf ("#define alloca_rtx_fmt_%s(rt, c, m", format);
>>>> +  for (p = format, i = 0; *p != 0; p++)
>>>> +    if (*p != '0')
>>>> +      printf (", arg%d", i++);
>>>> +  printf (") \\\n  rtx_alloca ((rt), (c)); \\\n");
>>>> +  printf ("  init_rtx_fmt_%s ((rt), (m)", format);
>>>> +  for (p = format, i = 0; *p != 0; p++)
>>>> +    if (*p != '0')
>>>> +      printf (", (arg%d)", i++);
>>>> +  printf (")\n\n");
>>>> }
>>>> 
>>>> /* Generate the documentation header for files we write.  */
>>>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>>>> index efb9b3ce40d..44733d8a39e 100644
>>>> --- a/gcc/rtl.h
>>>> +++ b/gcc/rtl.h
>>>> @@ -2933,6 +2933,10 @@ extern HOST_WIDE_INT get_stack_check_protect (void);
>>>> 
>>>> /* In rtl.c */
>>>> extern rtx rtx_alloc (RTX_CODE CXX_MEM_STAT_INFO);
>>>> +#define rtx_alloca(rt, code) \
>>>> +  (rt) = (rtx) alloca (RTX_CODE_SIZE ((code))); \
>>>> +  memset ((rt), 0, RTX_HDR_SIZE); \
>>>> +  PUT_CODE ((rt), (code));
>>>> extern rtx rtx_alloc_stat_v (RTX_CODE MEM_STAT_DECL, int);
>>>> #define rtx_alloc_v(c, SZ) rtx_alloc_stat_v (c MEM_STAT_INFO, SZ)
>>>> #define const_wide_int_alloc(NWORDS)                           \
>>>> @@ -3797,7 +3801,11 @@ gen_rtx_INSN (machine_mode mode, rtx_insn *prev_insn, rtx_insn *next_insn,
>>>> extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT);
>>>> extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec);
>>>> extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
>>>> +extern void init_raw_REG (rtx, machine_mode, unsigned int);
>>>> extern rtx gen_raw_REG (machine_mode, unsigned int);
>>>> +#define alloca_raw_REG(rt, mode, regno) \
>>>> +  rtx_alloca ((rt), REG); \
>>>> +  init_raw_REG ((rt), (mode), (regno))
>>>> extern rtx gen_rtx_REG (machine_mode, unsigned int);
>>>> extern rtx gen_rtx_SUBREG (machine_mode, rtx, poly_uint64);
>>>> extern rtx gen_rtx_MEM (machine_mode, rtx);
>>>> 
>>>> which now allows me to write:
>>>> 
>>>> rtx reg1, reg2, test;
>>>> alloca_raw_REG (reg1, cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
>>>> alloca_raw_REG (reg2, cmp_op_mode, LAST_VIRTUAL_REGISTER + 2);
>>>> alloca_rtx_fmt_ee (test, code, value_mode, reg1, reg2);
>>>> 
>>>> If that looks ok, I'll resend the series.
>>> 
>>> that looks OK to me - please leave Richard S. time to comment.  Also while
>>> I'd like to see
>>> 
>>> rtx reg1 = alloca_raw_REG (cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
>>> 
>>> I don't really see a way to write that portably (or at all), do you all agree?
>>> GCC doesn't seem to convert alloca() calls to __builtin_stack_save/restore
>>> nor place CLOBBERs to end their lifetime.  But is it guaranteed that the
>>> alloca result is valid until frame termination?
>> 
>> Hmm, the alloca man page says:
>> 
>>       The alloca() function allocates size bytes of space in the stack
>>       frame of the caller.  This temporary space is automatically freed
>>       when the function that called alloca() returns to its caller.
>> ...
>>       The space allocated by alloca() is not automatically deallocated if
>>       the pointer that refers to it simply goes out of scope.
>> 
>> A quick experiment with gcc and clang confirms this.  I think this means
>> I can make the alloca_raw_REG macro return the allocated pointer using the
>> return-from-block GNU extension.
>
> What do you think about the following approach?
>
> extern rtx rtx_init (rtx, RTX_CODE);
> #define rtx_alloca(code) \
>   rtx_init ((rtx) alloca (RTX_CODE_SIZE ((code))), (code))
>
> ...
>
> rtx
> rtx_init (rtx rt, RTX_CODE code)
> {
>   /* We want to clear everything up to the FLD array.  Normally, this
>      is one int, but we don't want to assume that and it isn't very
>      portable anyway; this is.  */
>   memset (rt, 0, RTX_HDR_SIZE);

Might as well remove this comment while you're there.  RTX_HDR_SIZE
has long ceased to be sizeof (int) for any host.

>   PUT_CODE (rt, code);
>   return rt;
> }
>
> with similar changes to alloca_raw_REG and gengenrtl.  This way we don't
> even need a GNU extension.

Sounds good to me, but rtx_init should probably be an inline function.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
  2019-08-27  8:07                   ` Richard Sandiford
@ 2019-08-27 12:27                     ` Richard Biener
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Biener @ 2019-08-27 12:27 UTC (permalink / raw)
  To: Ilya Leoshkevich, Richard Biener, GCC Patches,
	Segher Boessenkool, Richard Sandiford

On Tue, Aug 27, 2019 at 9:01 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Ilya Leoshkevich <iii@linux.ibm.com> writes:
> >> Am 26.08.2019 um 15:17 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> >>
> >>> Am 26.08.2019 um 15:06 schrieb Richard Biener <richard.guenther@gmail.com>:
> >>>
> >>> On Mon, Aug 26, 2019 at 1:54 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>>>
> >>>>> Am 26.08.2019 um 10:49 schrieb Richard Biener <richard.guenther@gmail.com>:
> >>>>>
> >>>>> On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>>>>>
> >>>>>>> Am 23.08.2019 um 13:24 schrieb Richard Biener <richard.guenther@gmail.com>:
> >>>>>>>
> >>>>>>> On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
> >>>>>>> <richard.sandiford@arm.com> wrote:
> >>>>>>>>
> >>>>>>>> Ilya Leoshkevich <iii@linux.ibm.com> writes:
> >>>>>>>>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
> >>>>>>>>> return 0;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
> >>>>>>>>> +   in order to determine its capabilities.  In order to avoid creating fake
> >>>>>>>>> +   operations on each call, values from previous calls are cached in a global
> >>>>>>>>> +   cached_binops hash_table.  It contains rtxes, which can be looked up using
> >>>>>>>>> +   binop_keys.  */
> >>>>>>>>> +
> >>>>>>>>> +struct binop_key {
> >>>>>>>>> +  enum rtx_code code;        /* Operation code.  */
> >>>>>>>>> +  machine_mode value_mode;   /* Result mode.     */
> >>>>>>>>> +  machine_mode cmp_op_mode;  /* Operand mode.    */
> >>>>>>>>> +};
> >>>>>>>>> +
> >>>>>>>>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
> >>>>>>>>> +  typedef rtx value_type;
> >>>>>>>>> +  typedef binop_key compare_type;
> >>>>>>>>> +
> >>>>>>>>> +  static hashval_t
> >>>>>>>>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
> >>>>>>>>> +  {
> >>>>>>>>> +    inchash::hash hstate (0);
> >>>>>>>>> +    hstate.add_int (code);
> >>>>>>>>> +    hstate.add_int (value_mode);
> >>>>>>>>> +    hstate.add_int (cmp_op_mode);
> >>>>>>>>> +    return hstate.end ();
> >>>>>>>>> +  }
> >>>>>>>>> +
> >>>>>>>>> +  static hashval_t
> >>>>>>>>> +  hash (const rtx &ref)
> >>>>>>>>> +  {
> >>>>>>>>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
> >>>>>>>>> +  }
> >>>>>>>>> +
> >>>>>>>>> +  static bool
> >>>>>>>>> +  equal (const rtx &ref1, const binop_key &ref2)
> >>>>>>>>> +  {
> >>>>>>>>> +    return (GET_CODE (ref1) == ref2.code)
> >>>>>>>>> +        && (GET_MODE (ref1) == ref2.value_mode)
> >>>>>>>>> +        && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
> >>>>>>>>> +  }
> >>>>>>>>> +};
> >>>>>>>>> +
> >>>>>>>>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
> >>>>>>>>> +
> >>>>>>>>> +static rtx
> >>>>>>>>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
> >>>>>>>>> +               machine_mode cmp_op_mode)
> >>>>>>>>> +{
> >>>>>>>>> +  if (!cached_binops)
> >>>>>>>>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
> >>>>>>>>> +  binop_key key = { code, value_mode, cmp_op_mode };
> >>>>>>>>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
> >>>>>>>>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
> >>>>>>>>> +  if (!*slot)
> >>>>>>>>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
> >>>>>>>>> +                         gen_reg_rtx (cmp_op_mode));
> >>>>>>>>> +  return *slot;
> >>>>>>>>> +}
> >>>>>>>>
> >>>>>>>> Sorry, I didn't mean anything this complicated.  I just meant that
> >>>>>>>> we should have a single cached rtx that we can change via PUT_CODE and
> >>>>>>>> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
> >>>>>>>> time.
> >>>>>>>>
> >>>>>>>> Something like:
> >>>>>>>>
> >>>>>>>> static GTY ((cache)) rtx cached_binop;
> >>>>>>>>
> >>>>>>>> rtx
> >>>>>>>> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
> >>>>>>>> {
> >>>>>>>> if (cached_binop)
> >>>>>>>> {
> >>>>>>>>   PUT_CODE (cached_binop, code);
> >>>>>>>>   PUT_MODE_RAW (cached_binop, mode);
> >>>>>>>>   PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
> >>>>>>>>   PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
> >>>>>>>> }
> >>>>>>>> else
> >>>>>>>> {
> >>>>>>>>   rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
> >>>>>>>>   rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
> >>>>>>>>   cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
> >>>>>>>> }
> >>>>>>>> return cached_binop;
> >>>>>>>> }
> >>>>>>>
> >>>>>>> Hmm, maybe we need  auto_rtx (code) that constructs such
> >>>>>>> RTX on the stack instead of wasting a GC root (and causing
> >>>>>>> issues for future threading of GCC ;)).
> >>>>>>
> >>>>>> Do you mean something like this?
> >>>>>>
> >>>>>> union {
> >>>>>> char raw[rtx_code_size[code]];
> >>>>>> rtx rtx;
> >>>>>> } binop;
> >>>>>>
> >>>>>> Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
> >>>>>> anything useful), or should I implement this?
> >>>>>
> >>>>> It doesn't exist AFAIK, I thought about using alloca like
> >>>>>
> >>>>> rtx tem;
> >>>>> rtx_alloca (tem, PLUS);
> >>>>>
> >>>>> and due to using alloca rtx_alloca has to be a macro like
> >>>>>
> >>>>> #define rtx_alloca(r, code) r = (rtx)alloca (RTX_CODE_SIZE(code));
> >>>>> memset (r, 0, RTX_HDR_SIZE); PUT_CODE (r, code);
> >>>>>
> >>>>> maybe C++ can help making this prettier but of course
> >>>>> since we use alloca we have to avoid opening new scopes.
> >>>>>
> >>>>> I guess templates like with auto_vec doesn't work unless
> >>>>> we can make RTX_CODE_SIZE constant-evaluated.
> >>>>>
> >>>>> Richard.
> >>>>
> >>>> I ended up with the following change:
> >>>>
> >>>> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> >>>> index a667cdab94e..97aa2144e95 100644
> >>>> --- a/gcc/emit-rtl.c
> >>>> +++ b/gcc/emit-rtl.c
> >>>> @@ -466,17 +466,25 @@ set_mode_and_regno (rtx x, machine_mode mode, unsigned int regno)
> >>>>  set_regno_raw (x, regno, nregs);
> >>>> }
> >>>>
> >>>> -/* Generate a new REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
> >>>> +/* Initialize a REG rtx.  Make sure ORIGINAL_REGNO is set properly, and
> >>>>   don't attempt to share with the various global pieces of rtl (such as
> >>>>   frame_pointer_rtx).  */
> >>>>
> >>>> -rtx
> >>>> -gen_raw_REG (machine_mode mode, unsigned int regno)
> >>>> +void
> >>>> +init_raw_REG (rtx x, machine_mode mode, unsigned int regno)
> >>>> {
> >>>> -  rtx x = rtx_alloc (REG MEM_STAT_INFO);
> >>>>  set_mode_and_regno (x, mode, regno);
> >>>>  REG_ATTRS (x) = NULL;
> >>>>  ORIGINAL_REGNO (x) = regno;
> >>>> +}
> >>>> +
> >>>> +/* Generate a new REG rtx.  */
> >>>> +
> >>>> +rtx
> >>>> +gen_raw_REG (machine_mode mode, unsigned int regno)
> >>>> +{
> >>>> +  rtx x = rtx_alloc (REG MEM_STAT_INFO);
> >>>> +  init_raw_REG (x, mode, regno);
> >>>>  return x;
> >>>> }
> >>>>
> >>>> diff --git a/gcc/gengenrtl.c b/gcc/gengenrtl.c
> >>>> index 5c78fabfb50..bb2087da258 100644
> >>>> --- a/gcc/gengenrtl.c
> >>>> +++ b/gcc/gengenrtl.c
> >>>> @@ -231,8 +231,7 @@ genmacro (int idx)
> >>>>  puts (")");
> >>>> }
> >>>>
> >>>> -/* Generate the code for the function to generate RTL whose
> >>>> -   format is FORMAT.  */
> >>>> +/* Generate the code for functions to generate RTL whose format is FORMAT.  */
> >>>>
> >>>> static void
> >>>> gendef (const char *format)
> >>>> @@ -240,22 +239,18 @@ gendef (const char *format)
> >>>>  const char *p;
> >>>>  int i, j;
> >>>>
> >>>> -  /* Start by writing the definition of the function name and the types
> >>>> +  /* Write the definition of the init function name and the types
> >>>>     of the arguments.  */
> >>>>
> >>>> -  printf ("static inline rtx\ngen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
> >>>> +  puts ("static inline void");
> >>>> +  printf ("init_rtx_fmt_%s (rtx rt, machine_mode mode", format);
> >>>>  for (p = format, i = 0; *p != 0; p++)
> >>>>    if (*p != '0')
> >>>>      printf (",\n\t%sarg%d", type_from_format (*p), i++);
> >>>> +  puts (")");
> >>>>
> >>>> -  puts (" MEM_STAT_DECL)");
> >>>> -
> >>>> -  /* Now write out the body of the function itself, which allocates
> >>>> -     the memory and initializes it.  */
> >>>> +  /* Now write out the body of the init function itself.  */
> >>>>  puts ("{");
> >>>> -  puts ("  rtx rt;");
> >>>> -  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);\n");
> >>>> -
> >>>>  puts ("  PUT_MODE_RAW (rt, mode);");
> >>>>
> >>>>  for (p = format, i = j = 0; *p ; ++p, ++i)
> >>>> @@ -266,16 +261,56 @@ gendef (const char *format)
> >>>>    else
> >>>>      printf ("  %s (rt, %d) = arg%d;\n", accessor_from_format (*p), i, j++);
> >>>>
> >>>> -  puts ("\n  return rt;\n}\n");
> >>>> +  puts ("}\n");
> >>>> +
> >>>> +  /* Write the definition of the gen function name and the types
> >>>> +     of the arguments.  */
> >>>> +
> >>>> +  puts ("static inline rtx");
> >>>> +  printf ("gen_rtx_fmt_%s_stat (RTX_CODE code, machine_mode mode", format);
> >>>> +  for (p = format, i = 0; *p != 0; p++)
> >>>> +    if (*p != '0')
> >>>> +      printf (",\n\t%sarg%d", type_from_format (*p), i++);
> >>>> +  puts (" MEM_STAT_DECL)");
> >>>> +
> >>>> +  /* Now write out the body of the function itself, which allocates
> >>>> +     the memory and initializes it.  */
> >>>> +  puts ("{");
> >>>> +  puts ("  rtx rt;\n");
> >>>> +
> >>>> +  puts ("  rt = rtx_alloc (code PASS_MEM_STAT);");
> >>>> +  printf ("  init_rtx_fmt_%s (rt, mode", format);
> >>>> +  for (p = format, i = 0; *p != 0; p++)
> >>>> +    if (*p != '0')
> >>>> +      printf (", arg%d", i++);
> >>>> +  puts (");\n");
> >>>> +
> >>>> +  puts ("  return rt;\n}\n");
> >>>> +
> >>>> +  /* Write the definition of gen macro.  */
> >>>> +
> >>>>  printf ("#define gen_rtx_fmt_%s(c, m", format);
> >>>>  for (p = format, i = 0; *p != 0; p++)
> >>>>    if (*p != '0')
> >>>> -      printf (", p%i",i++);
> >>>> -  printf (")\\\n        gen_rtx_fmt_%s_stat (c, m", format);
> >>>> +      printf (", arg%d", i++);
> >>>> +  printf (") \\\n  gen_rtx_fmt_%s_stat ((c), (m)", format);
> >>>>  for (p = format, i = 0; *p != 0; p++)
> >>>>    if (*p != '0')
> >>>> -      printf (", p%i",i++);
> >>>> +      printf (", (arg%d)", i++);
> >>>>  printf (" MEM_STAT_INFO)\n\n");
> >>>> +
> >>>> +  /* Write the definition of alloca macro.  */
> >>>> +
> >>>> +  printf ("#define alloca_rtx_fmt_%s(rt, c, m", format);
> >>>> +  for (p = format, i = 0; *p != 0; p++)
> >>>> +    if (*p != '0')
> >>>> +      printf (", arg%d", i++);
> >>>> +  printf (") \\\n  rtx_alloca ((rt), (c)); \\\n");
> >>>> +  printf ("  init_rtx_fmt_%s ((rt), (m)", format);
> >>>> +  for (p = format, i = 0; *p != 0; p++)
> >>>> +    if (*p != '0')
> >>>> +      printf (", (arg%d)", i++);
> >>>> +  printf (")\n\n");
> >>>> }
> >>>>
> >>>> /* Generate the documentation header for files we write.  */
> >>>> diff --git a/gcc/rtl.h b/gcc/rtl.h
> >>>> index efb9b3ce40d..44733d8a39e 100644
> >>>> --- a/gcc/rtl.h
> >>>> +++ b/gcc/rtl.h
> >>>> @@ -2933,6 +2933,10 @@ extern HOST_WIDE_INT get_stack_check_protect (void);
> >>>>
> >>>> /* In rtl.c */
> >>>> extern rtx rtx_alloc (RTX_CODE CXX_MEM_STAT_INFO);
> >>>> +#define rtx_alloca(rt, code) \
> >>>> +  (rt) = (rtx) alloca (RTX_CODE_SIZE ((code))); \
> >>>> +  memset ((rt), 0, RTX_HDR_SIZE); \
> >>>> +  PUT_CODE ((rt), (code));
> >>>> extern rtx rtx_alloc_stat_v (RTX_CODE MEM_STAT_DECL, int);
> >>>> #define rtx_alloc_v(c, SZ) rtx_alloc_stat_v (c MEM_STAT_INFO, SZ)
> >>>> #define const_wide_int_alloc(NWORDS)                           \
> >>>> @@ -3797,7 +3801,11 @@ gen_rtx_INSN (machine_mode mode, rtx_insn *prev_insn, rtx_insn *next_insn,
> >>>> extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT);
> >>>> extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec);
> >>>> extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
> >>>> +extern void init_raw_REG (rtx, machine_mode, unsigned int);
> >>>> extern rtx gen_raw_REG (machine_mode, unsigned int);
> >>>> +#define alloca_raw_REG(rt, mode, regno) \
> >>>> +  rtx_alloca ((rt), REG); \
> >>>> +  init_raw_REG ((rt), (mode), (regno))
> >>>> extern rtx gen_rtx_REG (machine_mode, unsigned int);
> >>>> extern rtx gen_rtx_SUBREG (machine_mode, rtx, poly_uint64);
> >>>> extern rtx gen_rtx_MEM (machine_mode, rtx);
> >>>>
> >>>> which now allows me to write:
> >>>>
> >>>> rtx reg1, reg2, test;
> >>>> alloca_raw_REG (reg1, cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
> >>>> alloca_raw_REG (reg2, cmp_op_mode, LAST_VIRTUAL_REGISTER + 2);
> >>>> alloca_rtx_fmt_ee (test, code, value_mode, reg1, reg2);
> >>>>
> >>>> If that looks ok, I'll resend the series.
> >>>
> >>> that looks OK to me - please leave Richard S. time to comment.  Also while
> >>> I'd like to see
> >>>
> >>> rtx reg1 = alloca_raw_REG (cmp_op_mode, LAST_VIRTUAL_REGISTER + 1);
> >>>
> >>> I don't really see a way to write that portably (or at all), do you all agree?
> >>> GCC doesn't seem to convert alloca() calls to __builtin_stack_save/restore
> >>> nor place CLOBBERs to end their lifetime.  But is it guaranteed that the
> >>> alloca result is valid until frame termination?
> >>
> >> Hmm, the alloca man page says:
> >>
> >>       The alloca() function allocates size bytes of space in the stack
> >>       frame of the caller.  This temporary space is automatically freed
> >>       when the function that called alloca() returns to its caller.
> >> ...
> >>       The space allocated by alloca() is not automatically deallocated if
> >>       the pointer that refers to it simply goes out of scope.
> >>
> >> A quick experiment with gcc and clang confirms this.  I think this means
> >> I can make alloca_raw_REG macro return the allocated pointer using the
> >> return-from-block GNU extension.
> >
> > What do you think about the following approach?
> >
> > extern rtx rtx_init (rtx, RTX_CODE);
> > #define rtx_alloca(code) \
> >   rtx_init ((rtx) alloca (RTX_CODE_SIZE ((code))), (code))
> >
> > ...
> >
> > rtx
> > rtx_init (rtx rt, RTX_CODE code)
> > {
> >   /* We want to clear everything up to the FLD array.  Normally, this
> >      is one int, but we don't want to assume that and it isn't very
> >      portable anyway; this is.  */
> >   memset (rt, 0, RTX_HDR_SIZE);
>
> Might as well remove this comment while you're there.  RTX_HDR_SIZE
> has long ceased to be sizeof (int) for any host.
>
> >   PUT_CODE (rt, code);
> >   return rt;
> > }
> >
> > with similar changes to alloca_raw_REG and gengenrtl.  This way we don't
> > even need a GNU extension.
>
> Sounds good to me, but rtx_init should probably be an inline function.

Sounds good to me, too.

Richard.

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
                   ` (8 preceding siblings ...)
  2019-08-22 14:26 ` [PATCH v2 9/9] S/390: Test " Ilya Leoshkevich
@ 2019-08-29 16:08 ` Ilya Leoshkevich
  2019-08-30  7:27   ` Richard Biener
  9 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-29 16:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Segher Boessenkool, Richard Biener

> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> 
> Bootstrap and regtest running on x86_64-redhat-linux and
> s390x-redhat-linux.
> 
> This patch series adds signaling FP comparison support (both scalar and
> vector) to s390 backend.

I'm running into a problem on ppc64 with this patch, and it would be
great if someone could help me figure out the best way to resolve it.

The vector36.C test is failing because the gimplifier produces the following

  _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
  _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;

from

  VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
                  { -1, -1, -1, -1 } ,
                  { 0, 0, 0, 0 } >

Since the comparison tree code is now hidden behind a temporary, my code
does not have anything to pass to the backend.  The reason for creating
a temporary is that the comparison can trap, and so the following check
in gimplify_expr fails:

  if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
    goto out;

gimple_test_f is is_gimple_condexpr, and it eventually calls
operation_could_trap_p (GT).

My current solution is to simply state that the backend does not support
SSA_NAME in vector comparisons.  However, I don't like it, since it may
cause performance regressions due to having to fall back to scalar
comparisons.

I was thinking about two other possible solutions:

1. Change the gimplifier to allow trapping vector comparisons.  That's
   a bit complicated, because tree_could_throw_p checks not only for
   floating point traps, but also e.g. for array index out of bounds
   traps.  So I would have to create a tree_could_throw_p version which
   disregards specific kinds of traps.

2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
   its tree_code instead of SSA_NAME.  The potential problem I see with
   this is that there appears to be no guarantee that _5 will be inlined
   into _6 at a later point.  So if we say that we don't need to fall
   back to scalar comparisons based on the availability of a vector >
   instruction, and inlining does not happen, then what will actually
   be required is vector selection (vsel on S/390), which might not be
   available in the general case.

What would be a better way to proceed here?


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-08-29 16:08 ` [PATCH v2 0/9] S/390: Use " Ilya Leoshkevich
@ 2019-08-30  7:27   ` Richard Biener
  2019-08-30  7:28     ` Richard Biener
                       ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Richard Biener @ 2019-08-30  7:27 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> >
> > Bootstrap and regtest running on x86_64-redhat-linux and
> > s390x-redhat-linux.
> >
> > This patch series adds signaling FP comparison support (both scalar and
> > vector) to s390 backend.
>
> I'm running into a problem on ppc64 with this patch, and it would be
> great if someone could help me figure out the best way to resolve it.
>
> vector36.C test is failing because gimplifier produces the following
>
>   _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>   _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>
> from
>
>   VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>                   { -1, -1, -1, -1 } ,
>                   { 0, 0, 0, 0 } >
>
> Since the comparison tree code is now hidden behind a temporary, my code
> does not have anything to pass to the backend.  The reason for creating
> a temporary is that the comparison can trap, and so the following check
> in gimplify_expr fails:
>
>   if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
>     goto out;
>
> gimple_test_f is is_gimple_condexpr, and it eventually calls
> operation_could_trap_p (GT).
>
> My current solution is to simply state that backend does not support
> SSA_NAME in vector comparisons, however, I don't like it, since it may
> cause performance regressions due to having to fall back to scalar
> comparisons.
>
> I was thinking about two other possible solutions:
>
> 1. Change the gimplifier to allow trapping vector comparisons.  That's
>    a bit complicated, because tree_could_throw_p checks not only for
>    floating point traps, but also e.g. for array index out of bounds
>    traps.  So I would have to create a tree_could_throw_p version which
>    disregards specific kinds of traps.
>
> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
>    its tree_code instead of SSA_NAME.  The potential problem I see with
>    this is that there appears to be no guarantee that _5 will be inlined
>    into _6 at a later point.  So if we say that we don't need to fall
>    back to scalar comparisons based on availability of vector >
>    instruction and inlining does not happen, then what's actually will
>    be required is vector selection (vsel on S/390), which might not be
>    available in general case.
>
> What would be a better way to proceed here?

On GIMPLE there isn't a good reason to split out trapping comparisons
from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs,
where it is important because we'd otherwise have no way to represent
the EH info.  It might be a bit awkward to preserve EH across RTL
expansion, though, in case the [VEC_]COND_EXPR is not expanded
as a single pattern, but I'm not sure.

To go this route you'd have to split the is_gimple_condexpr check,
I guess, and eventually users turning [VEC_]COND_EXPR into conditional
code (do we have any?) would have to be extra careful then.

Richard.

>


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-08-30  7:27   ` Richard Biener
@ 2019-08-30  7:28     ` Richard Biener
  2019-08-30 14:32       ` Ilya Leoshkevich
  2019-08-30 10:14     ` Segher Boessenkool
  2019-08-30 15:05     ` Ilya Leoshkevich
  2 siblings, 1 reply; 35+ messages in thread
From: Richard Biener @ 2019-08-30  7:28 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

On Fri, Aug 30, 2019 at 9:12 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >
> > > Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> > >
> > > Bootstrap and regtest running on x86_64-redhat-linux and
> > > s390x-redhat-linux.
> > >
> > > This patch series adds signaling FP comparison support (both scalar and
> > > vector) to s390 backend.
> >
> > I'm running into a problem on ppc64 with this patch, and it would be
> > great if someone could help me figure out the best way to resolve it.
> >
> > vector36.C test is failing because gimplifier produces the following
> >
> >   _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
> >   _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
> >
> > from
> >
> >   VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
> >                   { -1, -1, -1, -1 } ,
> >                   { 0, 0, 0, 0 } >
> >
> > Since the comparison tree code is now hidden behind a temporary, my code
> > does not have anything to pass to the backend.  The reason for creating
> > a temporary is that the comparison can trap, and so the following check
> > in gimplify_expr fails:
> >
> >   if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
> >     goto out;
> >
> > gimple_test_f is is_gimple_condexpr, and it eventually calls
> > operation_could_trap_p (GT).
> >
> > My current solution is to simply state that backend does not support
> > SSA_NAME in vector comparisons, however, I don't like it, since it may
> > cause performance regressions due to having to fall back to scalar
> > comparisons.
> >
> > I was thinking about two other possible solutions:
> >
> > 1. Change the gimplifier to allow trapping vector comparisons.  That's
> >    a bit complicated, because tree_could_throw_p checks not only for
> >    floating point traps, but also e.g. for array index out of bounds
> >    traps.  So I would have to create a tree_could_throw_p version which
> >    disregards specific kinds of traps.
> >
> > 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
> >    its tree_code instead of SSA_NAME.  The potential problem I see with
> >    this is that there appears to be no guarantee that _5 will be inlined
> >    into _6 at a later point.  So if we say that we don't need to fall
> >    back to scalar comparisons based on availability of vector >
> >    instruction and inlining does not happen, then what's actually will
> >    be required is vector selection (vsel on S/390), which might not be
> >    available in general case.
> >
> > What would be a better way to proceed here?
>
> On GIMPLE there isn't a good reason to split out trapping comparisons
> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
> where it is important because we'd have no way to represent EH info
> when not done.  It might be a bit awkward to preserve EH across RTL
> expansion though in case the [VEC_]COND_EXPR are not expanded
> as a single pattern, but I'm not sure.
>
> To go this route you'd have to split the is_gimple_condexpr check
> I guess and eventually users turning [VEC_]COND_EXPR into conditional
> code (do we have any?) have to be extra careful then.

Oh, btw - the fact that we have an expression embedded in [VEC_]COND_EXPR
is something that has bothered me for quite some time already, and it makes
things like VN awkward and GIMPLE finicky.  We've discussed alternatives
to deal with this, the simplest being moving the comparison out to a separate
stmt, and others like having four-operand [VEC_]COND_{EQ,NE,...}_EXPR
codes, or simply treating {EQ,NE,...}_EXPR as quaternary on GIMPLE
with either optional 3rd and 4th operands (defaulting to
boolean_true/false_node) or always explicit ones (and thus dropping
[VEC_]COND_EXPR).

What does LLVM do here?

Richard.

> Richard.
>
> >


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-08-30  7:27   ` Richard Biener
  2019-08-30  7:28     ` Richard Biener
@ 2019-08-30 10:14     ` Segher Boessenkool
  2019-08-30 11:49       ` Richard Biener
  2019-08-30 15:05     ` Ilya Leoshkevich
  2 siblings, 1 reply; 35+ messages in thread
From: Segher Boessenkool @ 2019-08-30 10:14 UTC (permalink / raw)
  To: Richard Biener; +Cc: Ilya Leoshkevich, GCC Patches, Richard Sandiford

On Fri, Aug 30, 2019 at 09:12:11AM +0200, Richard Biener wrote:
> On GIMPLE there isn't a good reason to split out trapping comparisons
> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
> where it is important because we'd have no way to represent EH info
> when not done.  It might be a bit awkward to preserve EH across RTL
> expansion though in case the [VEC_]COND_EXPR are not expanded
> as a single pattern, but I'm not sure.

The *language* specifies which comparisons trap on unordered and which
do not.  (Hopefully it is similar for all or this is going to be a huge
mess :-) )  So Gimple needs to at least keep track of this.  There also
are various optimisations that can be done -- two signaling comparisons
with the same arguments can be folded to just one, for example -- so it
seems to me you want to represent this in Gimple, never mind if you do
EH for them or just a magical trap or whatever.


Segher


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-08-30 10:14     ` Segher Boessenkool
@ 2019-08-30 11:49       ` Richard Biener
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Biener @ 2019-08-30 11:49 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Ilya Leoshkevich, GCC Patches, Richard Sandiford

On Fri, Aug 30, 2019 at 12:06 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Aug 30, 2019 at 09:12:11AM +0200, Richard Biener wrote:
> > On GIMPLE there isn't a good reason to split out trapping comparisons
> > from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
> > where it is important because we'd have no way to represent EH info
> > when not done.  It might be a bit awkward to preserve EH across RTL
> > expansion though in case the [VEC_]COND_EXPR are not expanded
> > as a single pattern, but I'm not sure.
>
> The *language* specifies which comparisons trap on unordered and which
> do not.  (Hopefully it is similar for all or this is going to be a huge
> mess :-) )  So Gimple needs to at least keep track of this.  There also
> are various optimisations that can be done -- two signaling comparisons
> with the same arguments can be folded to just one, for example -- so it
> seems to me you want to represent this in Gimple, never mind if you do
> EH for them or just a magical trap or whatever.

The issue with GIMPLE_CONDs is that we can't have EH edges out
of blocks ending in them, so to represent a compare-and-jump
GIMPLE insn that raises exceptions we need to split the actually
trapping compare away.

For [VEC_]COND_EXPR we'd need to say the whole expression
can throw/trap if the embedded comparison can.  On GIMPLE this
is enough precision, but we of course have to handle it that way
then.  Currently operation_could_trap* do not consider [VEC_]COND_EXPR
trapping, nor does tree_could_trap_p.  stmt_could_throw_1_p looks
fine, by means of falling back to checking individual operands and
then running into the embedded comparison.  But we do have
operation_could_trap* callers which would need to be adjusted
on the GIMPLE side.

Richard.

>
> Segher


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-08-30  7:28     ` Richard Biener
@ 2019-08-30 14:32       ` Ilya Leoshkevich
  0 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-30 14:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

> Am 30.08.2019 um 09:16 schrieb Richard Biener <richard.guenther@gmail.com>:
> 
> On Fri, Aug 30, 2019 at 9:12 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>> 
>> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>> 
>>>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
>>>> 
>>>> Bootstrap and regtest running on x86_64-redhat-linux and
>>>> s390x-redhat-linux.
>>>> 
>>>> This patch series adds signaling FP comparison support (both scalar and
>>>> vector) to s390 backend.
>>> 
>>> I'm running into a problem on ppc64 with this patch, and it would be
>>> great if someone could help me figure out the best way to resolve it.
>>> 
>>> vector36.C test is failing because gimplifier produces the following
>>> 
>>>  _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>>>  _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>>> 
>>> from
>>> 
>>>  VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>>>                  { -1, -1, -1, -1 } ,
>>>                  { 0, 0, 0, 0 } >
>>> 
>>> Since the comparison tree code is now hidden behind a temporary, my code
>>> does not have anything to pass to the backend.  The reason for creating
>>> a temporary is that the comparison can trap, and so the following check
>>> in gimplify_expr fails:
>>> 
>>>  if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
>>>    goto out;
>>> 
>>> gimple_test_f is is_gimple_condexpr, and it eventually calls
>>> operation_could_trap_p (GT).
>>> 
>>> My current solution is to simply state that backend does not support
>>> SSA_NAME in vector comparisons, however, I don't like it, since it may
>>> cause performance regressions due to having to fall back to scalar
>>> comparisons.
>>> 
>>> I was thinking about two other possible solutions:
>>> 
>>> 1. Change the gimplifier to allow trapping vector comparisons.  That's
>>>   a bit complicated, because tree_could_throw_p checks not only for
>>>   floating point traps, but also e.g. for array index out of bounds
>>>   traps.  So I would have to create a tree_could_throw_p version which
>>>   disregards specific kinds of traps.
>>> 
>>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
>>>   its tree_code instead of SSA_NAME.  The potential problem I see with
>>>   this is that there appears to be no guarantee that _5 will be inlined
>>>   into _6 at a later point.  So if we say that we don't need to fall
>>>   back to scalar comparisons based on availability of vector >
>>>   instruction and inlining does not happen, then what's actually will
>>>   be required is vector selection (vsel on S/390), which might not be
>>>   available in general case.
>>> 
>>> What would be a better way to proceed here?
>> 
>> On GIMPLE there isn't a good reason to split out trapping comparisons
>> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
>> where it is important because we'd have no way to represent EH info
>> when not done.  It might be a bit awkward to preserve EH across RTL
>> expansion though in case the [VEC_]COND_EXPR are not expanded
>> as a single pattern, but I'm not sure.
>> 
>> To go this route you'd have to split the is_gimple_condexpr check
>> I guess and eventually users turning [VEC_]COND_EXPR into conditional
>> code (do we have any?) have to be extra careful then.
> 
> Oh, btw - the fact that we have an expression embedded in [VEC_]COND_EXPR
> is something that bothers me for quite some time already and it makes
> things like VN awkward and GIMPLE fincky.  We've discussed alternatives
> to dead with the simplest being moving the comparison out to a separate
> stmt and others like having four operand [VEC_]COND_{EQ,NE,...}_EXPR
> codes or simply treating {EQ,NE,...}_EXPR as quarternary on GIMPLE
> with either optional 3rd and 4th operand (defaulting to boolean_true/false_node)
> or always explicit ones (and thus dropping [VEC_]COND_EXPR).
> 
> What does LLVM do here?

For

void f(long long * restrict w, double * restrict x, double * restrict y, int n)
{
        for (int i = 0; i < n; i++)
                w[i] = x[i] == y[i] ? x[i] : y[i];
}

LLVM does

  %26 = fcmp oeq <2 x double> %21, %25
  %27 = extractelement <2 x i1> %26, i32 0
  %28 = select <2 x i1> %26, <2 x double> %21, <2 x double> %25

So they have separate operations for the comparison and the ternary
operator (fcmp + select).


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-08-30  7:27   ` Richard Biener
  2019-08-30  7:28     ` Richard Biener
  2019-08-30 10:14     ` Segher Boessenkool
@ 2019-08-30 15:05     ` Ilya Leoshkevich
  2019-08-30 16:09       ` Ilya Leoshkevich
  2 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-30 15:05 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

> Am 30.08.2019 um 09:12 schrieb Richard Biener <richard.guenther@gmail.com>:
> 
> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>> 
>>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
>>> 
>>> Bootstrap and regtest running on x86_64-redhat-linux and
>>> s390x-redhat-linux.
>>> 
>>> This patch series adds signaling FP comparison support (both scalar and
>>> vector) to s390 backend.
>> 
>> I'm running into a problem on ppc64 with this patch, and it would be
>> great if someone could help me figure out the best way to resolve it.
>> 
>> vector36.C test is failing because gimplifier produces the following
>> 
>>  _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>>  _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>> 
>> from
>> 
>>  VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>>                  { -1, -1, -1, -1 } ,
>>                  { 0, 0, 0, 0 } >
>> 
>> Since the comparison tree code is now hidden behind a temporary, my code
>> does not have anything to pass to the backend.  The reason for creating
>> a temporary is that the comparison can trap, and so the following check
>> in gimplify_expr fails:
>> 
>>  if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
>>    goto out;
>> 
>> gimple_test_f is is_gimple_condexpr, and it eventually calls
>> operation_could_trap_p (GT).
>> 
>> My current solution is to simply state that backend does not support
>> SSA_NAME in vector comparisons, however, I don't like it, since it may
>> cause performance regressions due to having to fall back to scalar
>> comparisons.
>> 
>> I was thinking about two other possible solutions:
>> 
>> 1. Change the gimplifier to allow trapping vector comparisons.  That's
>>   a bit complicated, because tree_could_throw_p checks not only for
>>   floating point traps, but also e.g. for array index out of bounds
>>   traps.  So I would have to create a tree_could_throw_p version which
>>   disregards specific kinds of traps.
>> 
>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
>>   its tree_code instead of SSA_NAME.  The potential problem I see with
>>   this is that there appears to be no guarantee that _5 will be inlined
>>   into _6 at a later point.  So if we say that we don't need to fall
>>   back to scalar comparisons based on availability of vector >
>>   instruction and inlining does not happen, then what's actually will
>>   be required is vector selection (vsel on S/390), which might not be
>>   available in general case.
>> 
>> What would be a better way to proceed here?
> 
> On GIMPLE there isn't a good reason to split out trapping comparisons
> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
> where it is important because we'd have no way to represent EH info
> when not done.  It might be a bit awkward to preserve EH across RTL
> expansion though in case the [VEC_]COND_EXPR are not expanded
> as a single pattern, but I'm not sure.

Ok, so I'm testing the following now - it fixes the problematic test:

diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
index b0c9f9b671a..940aa394769 100644
--- a/gcc/gimple-expr.c
+++ b/gcc/gimple-expr.c
@@ -602,17 +602,33 @@ is_gimple_lvalue (tree t)
 	  || TREE_CODE (t) == BIT_FIELD_REF);
 }
 
-/*  Return true if T is a GIMPLE condition.  */
+/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr.  */
 
-bool
-is_gimple_condexpr (tree t)
+static bool
+is_gimple_condexpr_1 (tree t, bool allow_traps)
 {
   return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
-				&& !tree_could_throw_p (t)
+				&& (allow_traps || !tree_could_throw_p (t))
 				&& is_gimple_val (TREE_OPERAND (t, 0))
 				&& is_gimple_val (TREE_OPERAND (t, 1))));
 }
 
+/*  Return true if T is a GIMPLE condition.  */
+
+bool
+is_gimple_condexpr (tree t)
+{
+  return is_gimple_condexpr_1 (t, false);
+}
+
+/* Like is_gimple_condexpr, but allow the T to trap.  */
+
+bool
+is_possibly_trapping_gimple_condexpr (tree t)
+{
+  return is_gimple_condexpr_1 (t, true);
+}
+
 /* Return true if T is a gimple address.  */
 
 bool
diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
index 1ad1432bd17..20546ca5b99 100644
--- a/gcc/gimple-expr.h
+++ b/gcc/gimple-expr.h
@@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum tree_code *, tree *,
 					   tree *);
 extern bool is_gimple_lvalue (tree);
 extern bool is_gimple_condexpr (tree);
+extern bool is_possibly_trapping_gimple_condexpr (tree);
 extern bool is_gimple_address (const_tree);
 extern bool is_gimple_invariant_address (const_tree);
 extern bool is_gimple_ip_invariant_address (const_tree);
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index daa0b71c191..4e6256390c0 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12973,6 +12973,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   else if (gimple_test_f == is_gimple_val
            || gimple_test_f == is_gimple_call_addr
            || gimple_test_f == is_gimple_condexpr
+	   || gimple_test_f == is_possibly_trapping_gimple_condexpr
            || gimple_test_f == is_gimple_mem_rhs
            || gimple_test_f == is_gimple_mem_rhs_or_call
            || gimple_test_f == is_gimple_reg_rhs
@@ -13814,7 +13815,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	    enum gimplify_status r0, r1, r2;
 
 	    r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
-				post_p, is_gimple_condexpr, fb_rvalue);
+				post_p, is_possibly_trapping_gimple_condexpr, fb_rvalue);
 	    r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
 				post_p, is_gimple_val, fb_rvalue);
 	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index b75fdb2e63f..175b858f56b 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4121,8 +4121,11 @@ verify_gimple_assign_ternary (gassign *stmt)
       return true;
     }
 
-  if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
-       ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
+  if ((rhs_code == VEC_COND_EXPR
+       ? !is_possibly_trapping_gimple_condexpr (rhs1)
+       : (rhs_code == COND_EXPR
+	  ? !is_gimple_condexpr (rhs1)
+	  : !is_gimple_val (rhs1)))
       || !is_gimple_val (rhs2)
       || !is_gimple_val (rhs3))
     {

> 
> To go this route you'd have to split the is_gimple_condexpr check
> I guess and eventually users turning [VEC_]COND_EXPR into conditional
> code (do we have any?) have to be extra careful then.
> 

We have expand_vector_condition, which turns VEC_COND_EXPR into
COND_EXPR - but this should be harmless, right?  I could not find
anything else.


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-08-30 15:05     ` Ilya Leoshkevich
@ 2019-08-30 16:09       ` Ilya Leoshkevich
  2019-09-02 10:37         ` Richard Biener
  0 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-08-30 16:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

> Am 30.08.2019 um 16:40 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> 
>> Am 30.08.2019 um 09:12 schrieb Richard Biener <richard.guenther@gmail.com>:
>> 
>> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>> 
>>>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
>>>> 
>>>> Bootstrap and regtest running on x86_64-redhat-linux and
>>>> s390x-redhat-linux.
>>>> 
>>>> This patch series adds signaling FP comparison support (both scalar and
>>>> vector) to s390 backend.
>>> 
>>> I'm running into a problem on ppc64 with this patch, and it would be
>>> great if someone could help me figure out the best way to resolve it.
>>> 
>>> vector36.C test is failing because gimplifier produces the following
>>> 
>>> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>>> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>>> 
>>> from
>>> 
>>> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>>>                 { -1, -1, -1, -1 } ,
>>>                 { 0, 0, 0, 0 } >
>>> 
>>> Since the comparison tree code is now hidden behind a temporary, my code
>>> does not have anything to pass to the backend.  The reason for creating
>>> a temporary is that the comparison can trap, and so the following check
>>> in gimplify_expr fails:
>>> 
>>> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
>>>   goto out;
>>> 
>>> gimple_test_f is is_gimple_condexpr, and it eventually calls
>>> operation_could_trap_p (GT).
>>> 
>>> My current solution is to simply state that backend does not support
>>> SSA_NAME in vector comparisons, however, I don't like it, since it may
>>> cause performance regressions due to having to fall back to scalar
>>> comparisons.
>>> 
>>> I was thinking about two other possible solutions:
>>> 
>>> 1. Change the gimplifier to allow trapping vector comparisons.  That's
>>>  a bit complicated, because tree_could_throw_p checks not only for
>>>  floating point traps, but also e.g. for array index out of bounds
>>>  traps.  So I would have to create a tree_could_throw_p version which
>>>  disregards specific kinds of traps.
>>> 
>>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
>>>  its tree_code instead of SSA_NAME.  The potential problem I see with
>>>  this is that there appears to be no guarantee that _5 will be inlined
>>>  into _6 at a later point.  So if we say that we don't need to fall
>>>  back to scalar comparisons based on availability of vector >
>>>  instruction and inlining does not happen, then what will actually
>>>  be required is vector selection (vsel on S/390), which might not be
>>>  available in the general case.
>>> 
>>> What would be a better way to proceed here?
>> 
>> On GIMPLE there isn't a good reason to split out trapping comparisons
>> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
>> where it is important because we'd have no way to represent EH info
>> when not done.  It might be a bit awkward to preserve EH across RTL
>> expansion though in case the [VEC_]COND_EXPR are not expanded
>> as a single pattern, but I'm not sure.
> 
> Ok, so I'm testing the following now - for the problematic test that
> helped:
> 
> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> index b0c9f9b671a..940aa394769 100644
> --- a/gcc/gimple-expr.c
> +++ b/gcc/gimple-expr.c
> @@ -602,17 +602,33 @@ is_gimple_lvalue (tree t)
> 	  || TREE_CODE (t) == BIT_FIELD_REF);
> }
> 
> -/*  Return true if T is a GIMPLE condition.  */
> +/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr.  */
> 
> -bool
> -is_gimple_condexpr (tree t)
> +static bool
> +is_gimple_condexpr_1 (tree t, bool allow_traps)
> {
>   return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
> -				&& !tree_could_throw_p (t)
> +				&& (allow_traps || !tree_could_throw_p (t))
> 				&& is_gimple_val (TREE_OPERAND (t, 0))
> 				&& is_gimple_val (TREE_OPERAND (t, 1))));
> }
> 
> +/*  Return true if T is a GIMPLE condition.  */
> +
> +bool
> +is_gimple_condexpr (tree t)
> +{
> +  return is_gimple_condexpr_1 (t, false);
> +}
> +
> +/* Like is_gimple_condexpr, but allow the T to trap.  */
> +
> +bool
> +is_possibly_trapping_gimple_condexpr (tree t)
> +{
> +  return is_gimple_condexpr_1 (t, true);
> +}
> +
> /* Return true if T is a gimple address.  */
> 
> bool
> diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
> index 1ad1432bd17..20546ca5b99 100644
> --- a/gcc/gimple-expr.h
> +++ b/gcc/gimple-expr.h
> @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum tree_code *, tree *,
> 					   tree *);
> extern bool is_gimple_lvalue (tree);
> extern bool is_gimple_condexpr (tree);
> +extern bool is_possibly_trapping_gimple_condexpr (tree);
> extern bool is_gimple_address (const_tree);
> extern bool is_gimple_invariant_address (const_tree);
> extern bool is_gimple_ip_invariant_address (const_tree);
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index daa0b71c191..4e6256390c0 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -12973,6 +12973,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>   else if (gimple_test_f == is_gimple_val
>            || gimple_test_f == is_gimple_call_addr
>            || gimple_test_f == is_gimple_condexpr
> +	   || gimple_test_f == is_possibly_trapping_gimple_condexpr
>            || gimple_test_f == is_gimple_mem_rhs
>            || gimple_test_f == is_gimple_mem_rhs_or_call
>            || gimple_test_f == is_gimple_reg_rhs
> @@ -13814,7 +13815,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> 	    enum gimplify_status r0, r1, r2;
> 
> 	    r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
> -				post_p, is_gimple_condexpr, fb_rvalue);
> +				post_p, is_possibly_trapping_gimple_condexpr, fb_rvalue);
> 	    r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
> 				post_p, is_gimple_val, fb_rvalue);
> 	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index b75fdb2e63f..175b858f56b 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -4121,8 +4121,11 @@ verify_gimple_assign_ternary (gassign *stmt)
>       return true;
>     }
> 
> -  if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
> -       ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
> +  if ((rhs_code == VEC_COND_EXPR
> +       ? !is_possibly_trapping_gimple_condexpr (rhs1)
> +       : (rhs_code == COND_EXPR
> +	  ? !is_gimple_condexpr (rhs1)
> +	  : !is_gimple_val (rhs1)))
>       || !is_gimple_val (rhs2)
>       || !is_gimple_val (rhs3))
>     {
> 
>> 
>> To go this route you'd have to split the is_gimple_condexpr check
>> I guess and eventually users turning [VEC_]COND_EXPR into conditional
>> code (do we have any?) have to be extra careful then.
>> 
> 
> We have expand_vector_condition, which turns VEC_COND_EXPR into
> COND_EXPR - but this should be harmless, right?  I could not find
> anything else.

Ugh, I've realized I need to check not only VEC_COND_EXPR, but also
COND_EXPR usages.  There is, of course, a great deal more code, so I'm
not sure whether I looked exhaustively through it, but there are at
least store_expr and do_jump which do exactly this during expansion.
Should we worry about EH edges at this point?
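For concreteness, the trapping comparison at issue is the ordered
less-than/greater-than idiom that GCC lowers to LTGT_EXPR under
-fno-finite-math-only.  A minimal sketch (the helper name ltgt is mine,
not GCC's; per IEEE 754 both ordered comparisons are signaling, so a NaN
operand raises FE_INVALID, which is why the gimplifier has to treat the
comparison as potentially trapping):

```c
/* Ordered "less than or greater than": GCC represents this single
   comparison as LTGT_EXPR.  Any NaN operand makes both subcomparisons
   false, so the result is 0, but the comparison itself is signaling.  */
int
ltgt (double x, double y)
{
  return x < y || x > y;
}
```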

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-08-30 16:09       ` Ilya Leoshkevich
@ 2019-09-02 10:37         ` Richard Biener
  2019-09-02 16:28           ` Ilya Leoshkevich
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Biener @ 2019-09-02 10:37 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

On Fri, Aug 30, 2019 at 5:25 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 30.08.2019 um 16:40 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> >
> >> Am 30.08.2019 um 09:12 schrieb Richard Biener <richard.guenther@gmail.com>:
> >>
> >> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>>
> >>>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> >>>>
> >>>> Bootstrap and regtest running on x86_64-redhat-linux and
> >>>> s390x-redhat-linux.
> >>>>
> >>>> This patch series adds signaling FP comparison support (both scalar and
> >>>> vector) to s390 backend.
> >>>
> >>> I'm running into a problem on ppc64 with this patch, and it would be
> >>> great if someone could help me figure out the best way to resolve it.
> >>>
> >>> The vector36.C test is failing because the gimplifier produces the following:
> >>>
> >>> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
> >>> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
> >>>
> >>> from
> >>>
> >>> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
> >>>                 { -1, -1, -1, -1 } ,
> >>>                 { 0, 0, 0, 0 } >
> >>>
> >>> Since the comparison tree code is now hidden behind a temporary, my code
> >>> does not have anything to pass to the backend.  The reason for creating
> >>> a temporary is that the comparison can trap, and so the following check
> >>> in gimplify_expr fails:
> >>>
> >>> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
> >>>   goto out;
> >>>
> >>> gimple_test_f is is_gimple_condexpr, and it eventually calls
> >>> operation_could_trap_p (GT).
> >>>
> >>> My current solution is to simply state that the backend does not support
> >>> SSA_NAME in vector comparisons; however, I don't like it, since it may
> >>> cause performance regressions due to having to fall back to scalar
> >>> comparisons.
> >>>
> >>> I was thinking about two other possible solutions:
> >>>
> >>> 1. Change the gimplifier to allow trapping vector comparisons.  That's
> >>>  a bit complicated, because tree_could_throw_p checks not only for
> >>>  floating point traps, but also e.g. for array index out of bounds
> >>>  traps.  So I would have to create a tree_could_throw_p version which
> >>>  disregards specific kinds of traps.
> >>>
> >>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
> >>>  its tree_code instead of SSA_NAME.  The potential problem I see with
> >>>  this is that there appears to be no guarantee that _5 will be inlined
> >>>  into _6 at a later point.  So if we say that we don't need to fall
> >>>  back to scalar comparisons based on availability of vector >
> >>>  instruction and inlining does not happen, then what will actually
> >>>  be required is vector selection (vsel on S/390), which might not be
> >>>  available in the general case.
> >>>
> >>> What would be a better way to proceed here?
> >>
> >> On GIMPLE there isn't a good reason to split out trapping comparisons
> >> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
> >> where it is important because we'd have no way to represent EH info
> >> when not done.  It might be a bit awkward to preserve EH across RTL
> >> expansion though in case the [VEC_]COND_EXPR are not expanded
> >> as a single pattern, but I'm not sure.
> >
> > Ok, so I'm testing the following now - for the problematic test that
> > helped:
> >
> > diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> > index b0c9f9b671a..940aa394769 100644
> > --- a/gcc/gimple-expr.c
> > +++ b/gcc/gimple-expr.c
> > @@ -602,17 +602,33 @@ is_gimple_lvalue (tree t)
> >         || TREE_CODE (t) == BIT_FIELD_REF);
> > }
> >
> > -/*  Return true if T is a GIMPLE condition.  */
> > +/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr.  */
> >
> > -bool
> > -is_gimple_condexpr (tree t)
> > +static bool
> > +is_gimple_condexpr_1 (tree t, bool allow_traps)
> > {
> >   return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
> > -                             && !tree_could_throw_p (t)
> > +                             && (allow_traps || !tree_could_throw_p (t))
> >                               && is_gimple_val (TREE_OPERAND (t, 0))
> >                               && is_gimple_val (TREE_OPERAND (t, 1))));
> > }
> >
> > +/*  Return true if T is a GIMPLE condition.  */
> > +
> > +bool
> > +is_gimple_condexpr (tree t)
> > +{
> > +  return is_gimple_condexpr_1 (t, false);
> > +}
> > +
> > +/* Like is_gimple_condexpr, but allow the T to trap.  */
> > +
> > +bool
> > +is_possibly_trapping_gimple_condexpr (tree t)
> > +{
> > +  return is_gimple_condexpr_1 (t, true);
> > +}
> > +
> > /* Return true if T is a gimple address.  */
> >
> > bool
> > diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
> > index 1ad1432bd17..20546ca5b99 100644
> > --- a/gcc/gimple-expr.h
> > +++ b/gcc/gimple-expr.h
> > @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum tree_code *, tree *,
> >                                          tree *);
> > extern bool is_gimple_lvalue (tree);
> > extern bool is_gimple_condexpr (tree);
> > +extern bool is_possibly_trapping_gimple_condexpr (tree);
> > extern bool is_gimple_address (const_tree);
> > extern bool is_gimple_invariant_address (const_tree);
> > extern bool is_gimple_ip_invariant_address (const_tree);
> > diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> > index daa0b71c191..4e6256390c0 100644
> > --- a/gcc/gimplify.c
> > +++ b/gcc/gimplify.c
> > @@ -12973,6 +12973,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> >   else if (gimple_test_f == is_gimple_val
> >            || gimple_test_f == is_gimple_call_addr
> >            || gimple_test_f == is_gimple_condexpr
> > +        || gimple_test_f == is_possibly_trapping_gimple_condexpr
> >            || gimple_test_f == is_gimple_mem_rhs
> >            || gimple_test_f == is_gimple_mem_rhs_or_call
> >            || gimple_test_f == is_gimple_reg_rhs
> > @@ -13814,7 +13815,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> >           enum gimplify_status r0, r1, r2;
> >
> >           r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
> > -                             post_p, is_gimple_condexpr, fb_rvalue);
> > +                             post_p, is_possibly_trapping_gimple_condexpr, fb_rvalue);
> >           r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
> >                               post_p, is_gimple_val, fb_rvalue);
> >           r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
> > diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> > index b75fdb2e63f..175b858f56b 100644
> > --- a/gcc/tree-cfg.c
> > +++ b/gcc/tree-cfg.c
> > @@ -4121,8 +4121,11 @@ verify_gimple_assign_ternary (gassign *stmt)
> >       return true;
> >     }
> >
> > -  if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
> > -       ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
> > +  if ((rhs_code == VEC_COND_EXPR
> > +       ? !is_possibly_trapping_gimple_condexpr (rhs1)
> > +       : (rhs_code == COND_EXPR
> > +       ? !is_gimple_condexpr (rhs1)
> > +       : !is_gimple_val (rhs1)))
> >       || !is_gimple_val (rhs2)
> >       || !is_gimple_val (rhs3))
> >     {
> >
> >>
> >> To go this route you'd have to split the is_gimple_condexpr check
> >> I guess and eventually users turning [VEC_]COND_EXPR into conditional
> >> code (do we have any?) have to be extra careful then.
> >>
> >
> > We have expand_vector_condition, which turns VEC_COND_EXPR into
> > COND_EXPR - but this should be harmless, right?  I could not find
> > anything else.
>
> Ugh, I've realized I need to check not only VEC_COND_EXPR, but also
> COND_EXPR usages.  There is, of course, a great deal more code, so I'm
> not sure whether I looked exhaustively through it, but there are at
> least store_expr and do_jump which do exactly this during expansion.
> Should we worry about EH edges at this point?

Well, the EH edge needs to persist (and be rooted off the comparison,
not the selection).

That said, I'd simply stop using is_gimple_condexpr for GIMPLE_CONDs
(it allows is_gimple_val, which isn't proper form for GIMPLE_COND).  Of course
there's code using it for GIMPLE_CONDs which would need to be adjusted.
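To be concrete, the canonicalization that gimple_cond_get_ops_from_tree
performs for a bare value is roughly the following (hypothetical dump
fragment, since GIMPLE_COND must embed a binary comparison of two
values):

```
  if (x_2)        <-- is_gimple_val, but not proper GIMPLE_COND form
  if (x_2 != 0)   <-- canonical form after fix-up
```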

Richard.


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-09-02 10:37         ` Richard Biener
@ 2019-09-02 16:28           ` Ilya Leoshkevich
  2019-09-03 10:08             ` Richard Biener
  0 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-09-02 16:28 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

> Am 02.09.2019 um 12:37 schrieb Richard Biener <richard.guenther@gmail.com>:
> 
> On Fri, Aug 30, 2019 at 5:25 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>> 
>>> Am 30.08.2019 um 16:40 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
>>> 
>>>> Am 30.08.2019 um 09:12 schrieb Richard Biener <richard.guenther@gmail.com>:
>>>> 
>>>> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>>> 
>>>>>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
>>>>>> 
>>>>>> Bootstrap and regtest running on x86_64-redhat-linux and
>>>>>> s390x-redhat-linux.
>>>>>> 
>>>>>> This patch series adds signaling FP comparison support (both scalar and
>>>>>> vector) to s390 backend.
>>>>> 
>>>>> I'm running into a problem on ppc64 with this patch, and it would be
>>>>> great if someone could help me figure out the best way to resolve it.
>>>>> 
>>>>> The vector36.C test is failing because the gimplifier produces the following:
>>>>> 
>>>>> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>>>>> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>>>>> 
>>>>> from
>>>>> 
>>>>> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>>>>>                { -1, -1, -1, -1 } ,
>>>>>                { 0, 0, 0, 0 } >
>>>>> 
>>>>> Since the comparison tree code is now hidden behind a temporary, my code
>>>>> does not have anything to pass to the backend.  The reason for creating
>>>>> a temporary is that the comparison can trap, and so the following check
>>>>> in gimplify_expr fails:
>>>>> 
>>>>> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
>>>>>  goto out;
>>>>> 
>>>>> gimple_test_f is is_gimple_condexpr, and it eventually calls
>>>>> operation_could_trap_p (GT).
>>>>> 
>>>>> My current solution is to simply state that the backend does not support
>>>>> SSA_NAME in vector comparisons; however, I don't like it, since it may
>>>>> cause performance regressions due to having to fall back to scalar
>>>>> comparisons.
>>>>> 
>>>>> I was thinking about two other possible solutions:
>>>>> 
>>>>> 1. Change the gimplifier to allow trapping vector comparisons.  That's
>>>>> a bit complicated, because tree_could_throw_p checks not only for
>>>>> floating point traps, but also e.g. for array index out of bounds
>>>>> traps.  So I would have to create a tree_could_throw_p version which
>>>>> disregards specific kinds of traps.
>>>>> 
>>>>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
>>>>> its tree_code instead of SSA_NAME.  The potential problem I see with
>>>>> this is that there appears to be no guarantee that _5 will be inlined
>>>>> into _6 at a later point.  So if we say that we don't need to fall
>>>>> back to scalar comparisons based on availability of vector >
>>>>> instruction and inlining does not happen, then what will actually
>>>>> be required is vector selection (vsel on S/390), which might not be
>>>>> available in the general case.
>>>>> 
>>>>> What would be a better way to proceed here?
>>>> 
>>>> On GIMPLE there isn't a good reason to split out trapping comparisons
>>>> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
>>>> where it is important because we'd have no way to represent EH info
>>>> when not done.  It might be a bit awkward to preserve EH across RTL
>>>> expansion though in case the [VEC_]COND_EXPR are not expanded
>>>> as a single pattern, but I'm not sure.
>>> 
>>> Ok, so I'm testing the following now - for the problematic test that
>>> helped:
>>> 
>>> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
>>> index b0c9f9b671a..940aa394769 100644
>>> --- a/gcc/gimple-expr.c
>>> +++ b/gcc/gimple-expr.c
>>> @@ -602,17 +602,33 @@ is_gimple_lvalue (tree t)
>>>        || TREE_CODE (t) == BIT_FIELD_REF);
>>> }
>>> 
>>> -/*  Return true if T is a GIMPLE condition.  */
>>> +/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr.  */
>>> 
>>> -bool
>>> -is_gimple_condexpr (tree t)
>>> +static bool
>>> +is_gimple_condexpr_1 (tree t, bool allow_traps)
>>> {
>>>  return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
>>> -                             && !tree_could_throw_p (t)
>>> +                             && (allow_traps || !tree_could_throw_p (t))
>>>                              && is_gimple_val (TREE_OPERAND (t, 0))
>>>                              && is_gimple_val (TREE_OPERAND (t, 1))));
>>> }
>>> 
>>> +/*  Return true if T is a GIMPLE condition.  */
>>> +
>>> +bool
>>> +is_gimple_condexpr (tree t)
>>> +{
>>> +  return is_gimple_condexpr_1 (t, false);
>>> +}
>>> +
>>> +/* Like is_gimple_condexpr, but allow the T to trap.  */
>>> +
>>> +bool
>>> +is_possibly_trapping_gimple_condexpr (tree t)
>>> +{
>>> +  return is_gimple_condexpr_1 (t, true);
>>> +}
>>> +
>>> /* Return true if T is a gimple address.  */
>>> 
>>> bool
>>> diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
>>> index 1ad1432bd17..20546ca5b99 100644
>>> --- a/gcc/gimple-expr.h
>>> +++ b/gcc/gimple-expr.h
>>> @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum tree_code *, tree *,
>>>                                         tree *);
>>> extern bool is_gimple_lvalue (tree);
>>> extern bool is_gimple_condexpr (tree);
>>> +extern bool is_possibly_trapping_gimple_condexpr (tree);
>>> extern bool is_gimple_address (const_tree);
>>> extern bool is_gimple_invariant_address (const_tree);
>>> extern bool is_gimple_ip_invariant_address (const_tree);
>>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
>>> index daa0b71c191..4e6256390c0 100644
>>> --- a/gcc/gimplify.c
>>> +++ b/gcc/gimplify.c
>>> @@ -12973,6 +12973,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>>>  else if (gimple_test_f == is_gimple_val
>>>           || gimple_test_f == is_gimple_call_addr
>>>           || gimple_test_f == is_gimple_condexpr
>>> +        || gimple_test_f == is_possibly_trapping_gimple_condexpr
>>>           || gimple_test_f == is_gimple_mem_rhs
>>>           || gimple_test_f == is_gimple_mem_rhs_or_call
>>>           || gimple_test_f == is_gimple_reg_rhs
>>> @@ -13814,7 +13815,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>>>          enum gimplify_status r0, r1, r2;
>>> 
>>>          r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
>>> -                             post_p, is_gimple_condexpr, fb_rvalue);
>>> +                             post_p, is_possibly_trapping_gimple_condexpr, fb_rvalue);
>>>          r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
>>>                              post_p, is_gimple_val, fb_rvalue);
>>>          r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
>>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
>>> index b75fdb2e63f..175b858f56b 100644
>>> --- a/gcc/tree-cfg.c
>>> +++ b/gcc/tree-cfg.c
>>> @@ -4121,8 +4121,11 @@ verify_gimple_assign_ternary (gassign *stmt)
>>>      return true;
>>>    }
>>> 
>>> -  if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
>>> -       ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
>>> +  if ((rhs_code == VEC_COND_EXPR
>>> +       ? !is_possibly_trapping_gimple_condexpr (rhs1)
>>> +       : (rhs_code == COND_EXPR
>>> +       ? !is_gimple_condexpr (rhs1)
>>> +       : !is_gimple_val (rhs1)))
>>>      || !is_gimple_val (rhs2)
>>>      || !is_gimple_val (rhs3))
>>>    {
>>> 
>>>> 
>>>> To go this route you'd have to split the is_gimple_condexpr check
>>>> I guess and eventually users turning [VEC_]COND_EXPR into conditional
>>>> code (do we have any?) have to be extra careful then.
>>>> 
>>> 
>>> We have expand_vector_condition, which turns VEC_COND_EXPR into
>>> COND_EXPR - but this should be harmless, right?  I could not find
>>> anything else.
>> 
>> Ugh, I've realized I need to check not only VEC_COND_EXPR, but also
>> COND_EXPR usages.  There is, of course, a great deal more code, so I'm
>> not sure whether I looked exhaustively through it, but there are at
>> least store_expr and do_jump which do exactly this during expansion.
>> Should we worry about EH edges at this point?
> 
> Well, the EH edge needs to persist (and be rooted off the comparison,
> not the selection).

Ok, I'm trying to create some samples that may reveal problems with EH
edges in these two cases.  So far with these experiments I only managed
to find an unrelated S/390 bug :-)
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00065.html

> That said, I'd simply stop using is_gimple_condexpr for GIMPLE_CONDs
> (it allows is_gimple_val which isn't proper form for GIMPLE_COND).  Of course
> there's code using it for GIMPLE_CONDs which would need to be adjusted.

I'm sorry, I don't quite get this: what would that buy us, and what
would you use instead?  Right now we fix up non-conforming values
accepted by is_gimple_val using gimple_cond_get_ops_from_tree - is
there a problem with this approach?

What I have in mind right now is to:
- allow trapping conditions for COND_EXPR and VEC_COND_EXPR;
- report them as trapping in operation_could_trap_p and
  tree_could_trap_p iff their condition is trapping;
- find and adjust all places where this messes up EH edges.

GIMPLE_COND logic appears to be already covered precisely because it
uses is_gimple_condexpr.

Am I missing something?
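For the record, the kind of source that produces the quoted dump can be
sketched with GCC's vector extensions (the type and function names here
are mine, and I'm assuming this mirrors what vector36.C exercises):

```c
typedef double v4df __attribute__ ((vector_size (32)));
typedef long long v4di __attribute__ ((vector_size (32)));

/* A vector comparison like this gimplifies to
     _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
     _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
   once the potentially trapping comparison is split out into a
   temporary - which is exactly what hides the comparison tree code
   from the backend.  True lanes are all-ones (-1), false lanes 0.  */
v4di
gt_two (v4df b)
{
  return (v4di) (b > (v4df) { 2.0, 2.0, 2.0, 2.0 });
}
```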


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-09-02 16:28           ` Ilya Leoshkevich
@ 2019-09-03 10:08             ` Richard Biener
  2019-09-03 10:34               ` Ilya Leoshkevich
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Biener @ 2019-09-03 10:08 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

On Mon, Sep 2, 2019 at 6:28 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 02.09.2019 um 12:37 schrieb Richard Biener <richard.guenther@gmail.com>:
> >
> > On Fri, Aug 30, 2019 at 5:25 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>
> >>> Am 30.08.2019 um 16:40 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> >>>
> >>>> Am 30.08.2019 um 09:12 schrieb Richard Biener <richard.guenther@gmail.com>:
> >>>>
> >>>> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>>>>
> >>>>>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> >>>>>>
> >>>>>> Bootstrap and regtest running on x86_64-redhat-linux and
> >>>>>> s390x-redhat-linux.
> >>>>>>
> >>>>>> This patch series adds signaling FP comparison support (both scalar and
> >>>>>> vector) to s390 backend.
> >>>>>
> >>>>> I'm running into a problem on ppc64 with this patch, and it would be
> >>>>> great if someone could help me figure out the best way to resolve it.
> >>>>>
> >>>>> The vector36.C test is failing because the gimplifier produces the following:
> >>>>>
> >>>>> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
> >>>>> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
> >>>>>
> >>>>> from
> >>>>>
> >>>>> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
> >>>>>                { -1, -1, -1, -1 } ,
> >>>>>                { 0, 0, 0, 0 } >
> >>>>>
> >>>>> Since the comparison tree code is now hidden behind a temporary, my code
> >>>>> does not have anything to pass to the backend.  The reason for creating
> >>>>> a temporary is that the comparison can trap, and so the following check
> >>>>> in gimplify_expr fails:
> >>>>>
> >>>>> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
> >>>>>  goto out;
> >>>>>
> >>>>> gimple_test_f is is_gimple_condexpr, and it eventually calls
> >>>>> operation_could_trap_p (GT).
> >>>>>
> >>>>> My current solution is to simply state that the backend does not support
> >>>>> SSA_NAME in vector comparisons; however, I don't like it, since it may
> >>>>> cause performance regressions due to having to fall back to scalar
> >>>>> comparisons.
> >>>>>
> >>>>> I was thinking about two other possible solutions:
> >>>>>
> >>>>> 1. Change the gimplifier to allow trapping vector comparisons.  That's
> >>>>> a bit complicated, because tree_could_throw_p checks not only for
> >>>>> floating point traps, but also e.g. for array index out of bounds
> >>>>> traps.  So I would have to create a tree_could_throw_p version which
> >>>>> disregards specific kinds of traps.
> >>>>>
> >>>>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
> >>>>> its tree_code instead of SSA_NAME.  The potential problem I see with
> >>>>> this is that there appears to be no guarantee that _5 will be inlined
> >>>>> into _6 at a later point.  So if we say that we don't need to fall
> >>>>> back to scalar comparisons based on availability of vector >
> >>>>> instruction and inlining does not happen, then what will actually
> >>>>> be required is vector selection (vsel on S/390), which might not be
> >>>>> available in the general case.
> >>>>>
> >>>>> What would be a better way to proceed here?
> >>>>
> >>>> On GIMPLE there isn't a good reason to split out trapping comparisons
> >>>> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
> >>>> where it is important because we'd have no way to represent EH info
> >>>> when not done.  It might be a bit awkward to preserve EH across RTL
> >>>> expansion though in case the [VEC_]COND_EXPR are not expanded
> >>>> as a single pattern, but I'm not sure.
> >>>
> >>> Ok, so I'm testing the following now - for the problematic test that
> >>> helped:
> >>>
> >>> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> >>> index b0c9f9b671a..940aa394769 100644
> >>> --- a/gcc/gimple-expr.c
> >>> +++ b/gcc/gimple-expr.c
> >>> @@ -602,17 +602,33 @@ is_gimple_lvalue (tree t)
> >>>        || TREE_CODE (t) == BIT_FIELD_REF);
> >>> }
> >>>
> >>> -/*  Return true if T is a GIMPLE condition.  */
> >>> +/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr.  */
> >>>
> >>> -bool
> >>> -is_gimple_condexpr (tree t)
> >>> +static bool
> >>> +is_gimple_condexpr_1 (tree t, bool allow_traps)
> >>> {
> >>>  return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
> >>> -                             && !tree_could_throw_p (t)
> >>> +                             && (allow_traps || !tree_could_throw_p (t))
> >>>                              && is_gimple_val (TREE_OPERAND (t, 0))
> >>>                              && is_gimple_val (TREE_OPERAND (t, 1))));
> >>> }
> >>>
> >>> +/*  Return true if T is a GIMPLE condition.  */
> >>> +
> >>> +bool
> >>> +is_gimple_condexpr (tree t)
> >>> +{
> >>> +  return is_gimple_condexpr_1 (t, false);
> >>> +}
> >>> +
> >>> +/* Like is_gimple_condexpr, but allow the T to trap.  */
> >>> +
> >>> +bool
> >>> +is_possibly_trapping_gimple_condexpr (tree t)
> >>> +{
> >>> +  return is_gimple_condexpr_1 (t, true);
> >>> +}
> >>> +
> >>> /* Return true if T is a gimple address.  */
> >>>
> >>> bool
> >>> diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
> >>> index 1ad1432bd17..20546ca5b99 100644
> >>> --- a/gcc/gimple-expr.h
> >>> +++ b/gcc/gimple-expr.h
> >>> @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum tree_code *, tree *,
> >>>                                         tree *);
> >>> extern bool is_gimple_lvalue (tree);
> >>> extern bool is_gimple_condexpr (tree);
> >>> +extern bool is_possibly_trapping_gimple_condexpr (tree);
> >>> extern bool is_gimple_address (const_tree);
> >>> extern bool is_gimple_invariant_address (const_tree);
> >>> extern bool is_gimple_ip_invariant_address (const_tree);
> >>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> >>> index daa0b71c191..4e6256390c0 100644
> >>> --- a/gcc/gimplify.c
> >>> +++ b/gcc/gimplify.c
> >>> @@ -12973,6 +12973,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> >>>  else if (gimple_test_f == is_gimple_val
> >>>           || gimple_test_f == is_gimple_call_addr
> >>>           || gimple_test_f == is_gimple_condexpr
> >>> +        || gimple_test_f == is_possibly_trapping_gimple_condexpr
> >>>           || gimple_test_f == is_gimple_mem_rhs
> >>>           || gimple_test_f == is_gimple_mem_rhs_or_call
> >>>           || gimple_test_f == is_gimple_reg_rhs
> >>> @@ -13814,7 +13815,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> >>>          enum gimplify_status r0, r1, r2;
> >>>
> >>>          r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
> >>> -                             post_p, is_gimple_condexpr, fb_rvalue);
> >>> +                             post_p, is_possibly_trapping_gimple_condexpr, fb_rvalue);
> >>>          r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
> >>>                              post_p, is_gimple_val, fb_rvalue);
> >>>          r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
> >>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> >>> index b75fdb2e63f..175b858f56b 100644
> >>> --- a/gcc/tree-cfg.c
> >>> +++ b/gcc/tree-cfg.c
> >>> @@ -4121,8 +4121,11 @@ verify_gimple_assign_ternary (gassign *stmt)
> >>>      return true;
> >>>    }
> >>>
> >>> -  if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
> >>> -       ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
> >>> +  if ((rhs_code == VEC_COND_EXPR
> >>> +       ? !is_possibly_trapping_gimple_condexpr (rhs1)
> >>> +       : (rhs_code == COND_EXPR
> >>> +       ? !is_gimple_condexpr (rhs1)
> >>> +       : !is_gimple_val (rhs1)))
> >>>      || !is_gimple_val (rhs2)
> >>>      || !is_gimple_val (rhs3))
> >>>    {
> >>>
> >>>>
> >>>> To go this route you'd have to split the is_gimple_condexpr check
> >>>> I guess and eventually users turning [VEC_]COND_EXPR into conditional
> >>>> code (do we have any?) have to be extra careful then.
> >>>>
> >>>
> >>> We have expand_vector_condition, which turns VEC_COND_EXPR into
> >>> COND_EXPR - but this should be harmless, right?  I could not find
> >>> anything else.
> >>
> >> Ugh, I've realized I need to check not only VEC_COND_EXPR, but also
> >> COND_EXPR usages.  There is, of course, a great deal more code, so I'm
> >> not sure whether I looked exhaustively through it, but there are at
> >> least store_expr and do_jump which do exactly this during expansion.
> >> Should we worry about EH edges at this point?
> >
> > Well, the EH edge needs to persist (and be rooted off the comparison,
> > not the selection).
>
> Ok, I'm trying to create some samples that may reveal problems with EH
> edges in these two cases.  So far with these experiments I only managed
> to find an unrelated S/390 bug :-)
> https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00065.html
>
> > That said, I'd simply stop using is_gimple_condexpr for GIMPLE_CONDs
> > (it allows is_gimple_val which isn't proper form for GIMPLE_COND).  Of course
> > there's code using it for GIMPLE_CONDs which would need to be adjusted.
>
> I'm sorry, I don't quite get this - what would that buy us?  and what
> would you use instead?  Right now we fix up non-conforming values
> accepted by is_gimple_val using gimple_cond_get_ops_from_tree - is
> there a problem with this approach?
>
> What I have in mind right now is to:
> - allow trapping conditions for COND_EXPR and VEC_COND_EXPR;
> - report them as trapping in operation_could_trap_p and
>   tree_could_trap_p iff their condition is trapping;
> - find and adjust all places where this messes up EH edges.
>
> GIMPLE_COND logic appears to be already covered precisely because it
> uses is_gimple_condexpr.
>
> Am I missing something?

Not really - all I'm saying is that currently we use is_gimple_condexpr
to check whether a GENERIC tree is suitable for [VEC_]COND_EXPR
during, for example, forward propagation.

And GIMPLE_COND already uses its own logic (as you say) but
passes still use is_gimple_condexpr for it.

So my proposal would be to change is_gimple_condexpr to
allow trapping [VEC_]COND_EXPR and stop using is_gimple_condexpr
checks on conditions to be used for GIMPLE_CONDs (and substitute
another predicate there).  For this to work and catch wrongdoings
we should amend gimple_cond_get_ops_from_tree to assert
that the extracted condition cannot trap.

Richard.


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-09-03 10:08             ` Richard Biener
@ 2019-09-03 10:34               ` Ilya Leoshkevich
  2019-09-03 11:29                 ` Richard Biener
  0 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2019-09-03 10:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

> Am 03.09.2019 um 12:07 schrieb Richard Biener <richard.guenther@gmail.com>:
> 
> On Mon, Sep 2, 2019 at 6:28 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>> 
>>> Am 02.09.2019 um 12:37 schrieb Richard Biener <richard.guenther@gmail.com>:
>>> 
>>> On Fri, Aug 30, 2019 at 5:25 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>> 
>>>>> Am 30.08.2019 um 16:40 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
>>>>> 
>>>>>> Am 30.08.2019 um 09:12 schrieb Richard Biener <richard.guenther@gmail.com>:
>>>>>> 
>>>>>> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>>>>> 
>>>>>>>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
>>>>>>>> 
>>>>>>>> Bootstrap and regtest running on x86_64-redhat-linux and
>>>>>>>> s390x-redhat-linux.
>>>>>>>> 
>>>>>>>> This patch series adds signaling FP comparison support (both scalar and
>>>>>>>> vector) to s390 backend.
>>>>>>> 
>>>>>>> I'm running into a problem on ppc64 with this patch, and it would be
>>>>>>> great if someone could help me figure out the best way to resolve it.
>>>>>>> 
> >>>>>>> vector36.C test is failing because the gimplifier produces the following
>>>>>>> 
>>>>>>> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>>>>>>> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>>>>>>> 
>>>>>>> from
>>>>>>> 
>>>>>>> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>>>>>>>               { -1, -1, -1, -1 } ,
>>>>>>>               { 0, 0, 0, 0 } >
>>>>>>> 
>>>>>>> Since the comparison tree code is now hidden behind a temporary, my code
>>>>>>> does not have anything to pass to the backend.  The reason for creating
>>>>>>> a temporary is that the comparison can trap, and so the following check
>>>>>>> in gimplify_expr fails:
>>>>>>> 
>>>>>>> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
>>>>>>> goto out;
>>>>>>> 
>>>>>>> gimple_test_f is is_gimple_condexpr, and it eventually calls
>>>>>>> operation_could_trap_p (GT).
>>>>>>> 
>>>>>>> My current solution is to simply state that backend does not support
>>>>>>> SSA_NAME in vector comparisons, however, I don't like it, since it may
>>>>>>> cause performance regressions due to having to fall back to scalar
>>>>>>> comparisons.
>>>>>>> 
>>>>>>> I was thinking about two other possible solutions:
>>>>>>> 
>>>>>>> 1. Change the gimplifier to allow trapping vector comparisons.  That's
>>>>>>> a bit complicated, because tree_could_throw_p checks not only for
>>>>>>> floating point traps, but also e.g. for array index out of bounds
>>>>>>> traps.  So I would have to create a tree_could_throw_p version which
>>>>>>> disregards specific kinds of traps.
>>>>>>> 
>>>>>>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
>>>>>>> its tree_code instead of SSA_NAME.  The potential problem I see with
>>>>>>> this is that there appears to be no guarantee that _5 will be inlined
>>>>>>> into _6 at a later point.  So if we say that we don't need to fall
>>>>>>> back to scalar comparisons based on availability of vector >
> >>>>>>> instruction and inlining does not happen, then what actually will
>>>>>>> be required is vector selection (vsel on S/390), which might not be
>>>>>>> available in general case.
>>>>>>> 
>>>>>>> What would be a better way to proceed here?
>>>>>> 
>>>>>> On GIMPLE there isn't a good reason to split out trapping comparisons
>>>>>> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
>>>>>> where it is important because we'd have no way to represent EH info
>>>>>> when not done.  It might be a bit awkward to preserve EH across RTL
>>>>>> expansion though in case the [VEC_]COND_EXPR are not expanded
>>>>>> as a single pattern, but I'm not sure.
>>>>> 
>>>>> Ok, so I'm testing the following now - for the problematic test that
>>>>> helped:
>>>>> 
>>>>> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
>>>>> index b0c9f9b671a..940aa394769 100644
>>>>> --- a/gcc/gimple-expr.c
>>>>> +++ b/gcc/gimple-expr.c
>>>>> @@ -602,17 +602,33 @@ is_gimple_lvalue (tree t)
>>>>>       || TREE_CODE (t) == BIT_FIELD_REF);
>>>>> }
>>>>> 
>>>>> -/*  Return true if T is a GIMPLE condition.  */
>>>>> +/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr.  */
>>>>> 
>>>>> -bool
>>>>> -is_gimple_condexpr (tree t)
>>>>> +static bool
>>>>> +is_gimple_condexpr_1 (tree t, bool allow_traps)
>>>>> {
>>>>> return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
>>>>> -                             && !tree_could_throw_p (t)
>>>>> +                             && (allow_traps || !tree_could_throw_p (t))
>>>>>                             && is_gimple_val (TREE_OPERAND (t, 0))
>>>>>                             && is_gimple_val (TREE_OPERAND (t, 1))));
>>>>> }
>>>>> 
>>>>> +/*  Return true if T is a GIMPLE condition.  */
>>>>> +
>>>>> +bool
>>>>> +is_gimple_condexpr (tree t)
>>>>> +{
>>>>> +  return is_gimple_condexpr_1 (t, false);
>>>>> +}
>>>>> +
> >>>>> +/* Like is_gimple_condexpr, but allow T to trap.  */
>>>>> +
>>>>> +bool
>>>>> +is_possibly_trapping_gimple_condexpr (tree t)
>>>>> +{
>>>>> +  return is_gimple_condexpr_1 (t, true);
>>>>> +}
>>>>> +
>>>>> /* Return true if T is a gimple address.  */
>>>>> 
>>>>> bool
>>>>> diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
>>>>> index 1ad1432bd17..20546ca5b99 100644
>>>>> --- a/gcc/gimple-expr.h
>>>>> +++ b/gcc/gimple-expr.h
>>>>> @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum tree_code *, tree *,
>>>>>                                        tree *);
>>>>> extern bool is_gimple_lvalue (tree);
>>>>> extern bool is_gimple_condexpr (tree);
>>>>> +extern bool is_possibly_trapping_gimple_condexpr (tree);
>>>>> extern bool is_gimple_address (const_tree);
>>>>> extern bool is_gimple_invariant_address (const_tree);
>>>>> extern bool is_gimple_ip_invariant_address (const_tree);
>>>>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
>>>>> index daa0b71c191..4e6256390c0 100644
>>>>> --- a/gcc/gimplify.c
>>>>> +++ b/gcc/gimplify.c
>>>>> @@ -12973,6 +12973,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>>>>> else if (gimple_test_f == is_gimple_val
>>>>>          || gimple_test_f == is_gimple_call_addr
>>>>>          || gimple_test_f == is_gimple_condexpr
>>>>> +        || gimple_test_f == is_possibly_trapping_gimple_condexpr
>>>>>          || gimple_test_f == is_gimple_mem_rhs
>>>>>          || gimple_test_f == is_gimple_mem_rhs_or_call
>>>>>          || gimple_test_f == is_gimple_reg_rhs
>>>>> @@ -13814,7 +13815,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>>>>>         enum gimplify_status r0, r1, r2;
>>>>> 
>>>>>         r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
>>>>> -                             post_p, is_gimple_condexpr, fb_rvalue);
>>>>> +                             post_p, is_possibly_trapping_gimple_condexpr, fb_rvalue);
>>>>>         r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
>>>>>                             post_p, is_gimple_val, fb_rvalue);
>>>>>         r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
>>>>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
>>>>> index b75fdb2e63f..175b858f56b 100644
>>>>> --- a/gcc/tree-cfg.c
>>>>> +++ b/gcc/tree-cfg.c
>>>>> @@ -4121,8 +4121,11 @@ verify_gimple_assign_ternary (gassign *stmt)
>>>>>     return true;
>>>>>   }
>>>>> 
>>>>> -  if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
>>>>> -       ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
>>>>> +  if ((rhs_code == VEC_COND_EXPR
>>>>> +       ? !is_possibly_trapping_gimple_condexpr (rhs1)
>>>>> +       : (rhs_code == COND_EXPR
>>>>> +       ? !is_gimple_condexpr (rhs1)
>>>>> +       : !is_gimple_val (rhs1)))
>>>>>     || !is_gimple_val (rhs2)
>>>>>     || !is_gimple_val (rhs3))
>>>>>   {
>>>>> 
>>>>>> 
>>>>>> To go this route you'd have to split the is_gimple_condexpr check
>>>>>> I guess and eventually users turning [VEC_]COND_EXPR into conditional
>>>>>> code (do we have any?) have to be extra careful then.
>>>>>> 
>>>>> 
>>>>> We have expand_vector_condition, which turns VEC_COND_EXPR into
>>>>> COND_EXPR - but this should be harmless, right?  I could not find
>>>>> anything else.
>>>> 
>>>> Ugh, I've realized I need to check not only VEC_COND_EXPR, but also
>>>> COND_EXPR usages.  There is, of course, a great deal more code, so I'm
>>>> not sure whether I looked exhaustively through it, but there are at
>>>> least store_expr and do_jump which do exactly this during expansion.
>>>> Should we worry about EH edges at this point?
>>> 
>>> Well, the EH edge needs to persist (and be rooted off the comparison,
>>> not the selection).
>> 
>> Ok, I'm trying to create some samples that may reveal problems with EH
>> edges in these two cases.  So far with these experiments I only managed
>> to find an unrelated S/390 bug :-)
>> https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00065.html
>> 
>>> That said, I'd simply stop using is_gimple_condexpr for GIMPLE_CONDs
>>> (it allows is_gimple_val which isn't proper form for GIMPLE_COND).  Of course
>>> there's code using it for GIMPLE_CONDs which would need to be adjusted.
>> 
>> I'm sorry, I don't quite get this - what would that buy us?  and what
>> would you use instead?  Right now we fix up non-conforming values
>> accepted by is_gimple_val using gimple_cond_get_ops_from_tree - is
>> there a problem with this approach?
>> 
>> What I have in mind right now is to:
>> - allow trapping conditions for COND_EXPR and VEC_COND_EXPR;
>> - report them as trapping in operation_could_trap_p and
>>  tree_could_trap_p iff their condition is trapping;
>> - find and adjust all places where this messes up EH edges.
>> 
>> GIMPLE_COND logic appears to be already covered precisely because it
>> uses is_gimple_condexpr.
>> 
>> Am I missing something?
> 
> Not really - all I'm saying is that currently we use is_gimple_condexpr
> to check whether a GENERIC tree is suitable for [VEC_]COND_EXPR
> > during, for example, forward propagation.
> 
> And GIMPLE_COND already uses its own logic (as you say) but
> > passes still use is_gimple_condexpr for it.
> 
> So my proposal would be to change is_gimple_condexpr to
> allow trapping [VEC_]COND_EXPR and stop using is_gimple_condexpr
> checks on conditions to be used for GIMPLE_CONDs (and substitute
> > another predicate there).  For this to work and catch wrongdoings
> we should amend gimple_cond_get_ops_from_tree to assert
> that the extracted condition cannot trap.

Ah, I think now I understand.  While I wanted to keep is_gimple_condexpr
as is and introduce is_possibly_trapping_gimple_condexpr, you're saying
we rather need to change is_gimple_condexpr and introduce, say,
is_non_trapping_gimple_condexpr.

This makes sense, thanks for the explanation!


* Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
  2019-09-03 10:34               ` Ilya Leoshkevich
@ 2019-09-03 11:29                 ` Richard Biener
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Biener @ 2019-09-03 11:29 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

On Tue, Sep 3, 2019 at 12:34 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 03.09.2019 um 12:07 schrieb Richard Biener <richard.guenther@gmail.com>:
> >
> > On Mon, Sep 2, 2019 at 6:28 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>
> >>> Am 02.09.2019 um 12:37 schrieb Richard Biener <richard.guenther@gmail.com>:
> >>>
> >>> On Fri, Aug 30, 2019 at 5:25 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>>>
> >>>>> Am 30.08.2019 um 16:40 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> >>>>>
> >>>>>> Am 30.08.2019 um 09:12 schrieb Richard Biener <richard.guenther@gmail.com>:
> >>>>>>
> >>>>>> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>>>>>>
> >>>>>>>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> >>>>>>>>
> >>>>>>>> Bootstrap and regtest running on x86_64-redhat-linux and
> >>>>>>>> s390x-redhat-linux.
> >>>>>>>>
> >>>>>>>> This patch series adds signaling FP comparison support (both scalar and
> >>>>>>>> vector) to s390 backend.
> >>>>>>>
> >>>>>>> I'm running into a problem on ppc64 with this patch, and it would be
> >>>>>>> great if someone could help me figure out the best way to resolve it.
> >>>>>>>
> >>>>>>> vector36.C test is failing because the gimplifier produces the following
> >>>>>>>
> >>>>>>> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
> >>>>>>> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
> >>>>>>>
> >>>>>>> from
> >>>>>>>
> >>>>>>> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
> >>>>>>>               { -1, -1, -1, -1 } ,
> >>>>>>>               { 0, 0, 0, 0 } >
> >>>>>>>
> >>>>>>> Since the comparison tree code is now hidden behind a temporary, my code
> >>>>>>> does not have anything to pass to the backend.  The reason for creating
> >>>>>>> a temporary is that the comparison can trap, and so the following check
> >>>>>>> in gimplify_expr fails:
> >>>>>>>
> >>>>>>> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
> >>>>>>> goto out;
> >>>>>>>
> >>>>>>> gimple_test_f is is_gimple_condexpr, and it eventually calls
> >>>>>>> operation_could_trap_p (GT).
> >>>>>>>
> >>>>>>> My current solution is to simply state that backend does not support
> >>>>>>> SSA_NAME in vector comparisons, however, I don't like it, since it may
> >>>>>>> cause performance regressions due to having to fall back to scalar
> >>>>>>> comparisons.
> >>>>>>>
> >>>>>>> I was thinking about two other possible solutions:
> >>>>>>>
> >>>>>>> 1. Change the gimplifier to allow trapping vector comparisons.  That's
> >>>>>>> a bit complicated, because tree_could_throw_p checks not only for
> >>>>>>> floating point traps, but also e.g. for array index out of bounds
> >>>>>>> traps.  So I would have to create a tree_could_throw_p version which
> >>>>>>> disregards specific kinds of traps.
> >>>>>>>
> >>>>>>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
> >>>>>>> its tree_code instead of SSA_NAME.  The potential problem I see with
> >>>>>>> this is that there appears to be no guarantee that _5 will be inlined
> >>>>>>> into _6 at a later point.  So if we say that we don't need to fall
> >>>>>>> back to scalar comparisons based on availability of vector >
> >>>>>>> instruction and inlining does not happen, then what actually will
> >>>>>>> be required is vector selection (vsel on S/390), which might not be
> >>>>>>> available in general case.
> >>>>>>>
> >>>>>>> What would be a better way to proceed here?
> >>>>>>
> >>>>>> On GIMPLE there isn't a good reason to split out trapping comparisons
> >>>>>> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
> >>>>>> where it is important because we'd have no way to represent EH info
> >>>>>> when not done.  It might be a bit awkward to preserve EH across RTL
> >>>>>> expansion though in case the [VEC_]COND_EXPR are not expanded
> >>>>>> as a single pattern, but I'm not sure.
> >>>>>
> >>>>> Ok, so I'm testing the following now - for the problematic test that
> >>>>> helped:
> >>>>>
> >>>>> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> >>>>> index b0c9f9b671a..940aa394769 100644
> >>>>> --- a/gcc/gimple-expr.c
> >>>>> +++ b/gcc/gimple-expr.c
> >>>>> @@ -602,17 +602,33 @@ is_gimple_lvalue (tree t)
> >>>>>       || TREE_CODE (t) == BIT_FIELD_REF);
> >>>>> }
> >>>>>
> >>>>> -/*  Return true if T is a GIMPLE condition.  */
> >>>>> +/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr.  */
> >>>>>
> >>>>> -bool
> >>>>> -is_gimple_condexpr (tree t)
> >>>>> +static bool
> >>>>> +is_gimple_condexpr_1 (tree t, bool allow_traps)
> >>>>> {
> >>>>> return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
> >>>>> -                             && !tree_could_throw_p (t)
> >>>>> +                             && (allow_traps || !tree_could_throw_p (t))
> >>>>>                             && is_gimple_val (TREE_OPERAND (t, 0))
> >>>>>                             && is_gimple_val (TREE_OPERAND (t, 1))));
> >>>>> }
> >>>>>
> >>>>> +/*  Return true if T is a GIMPLE condition.  */
> >>>>> +
> >>>>> +bool
> >>>>> +is_gimple_condexpr (tree t)
> >>>>> +{
> >>>>> +  return is_gimple_condexpr_1 (t, false);
> >>>>> +}
> >>>>> +
> >>>>> +/* Like is_gimple_condexpr, but allow T to trap.  */
> >>>>> +
> >>>>> +bool
> >>>>> +is_possibly_trapping_gimple_condexpr (tree t)
> >>>>> +{
> >>>>> +  return is_gimple_condexpr_1 (t, true);
> >>>>> +}
> >>>>> +
> >>>>> /* Return true if T is a gimple address.  */
> >>>>>
> >>>>> bool
> >>>>> diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
> >>>>> index 1ad1432bd17..20546ca5b99 100644
> >>>>> --- a/gcc/gimple-expr.h
> >>>>> +++ b/gcc/gimple-expr.h
> >>>>> @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum tree_code *, tree *,
> >>>>>                                        tree *);
> >>>>> extern bool is_gimple_lvalue (tree);
> >>>>> extern bool is_gimple_condexpr (tree);
> >>>>> +extern bool is_possibly_trapping_gimple_condexpr (tree);
> >>>>> extern bool is_gimple_address (const_tree);
> >>>>> extern bool is_gimple_invariant_address (const_tree);
> >>>>> extern bool is_gimple_ip_invariant_address (const_tree);
> >>>>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> >>>>> index daa0b71c191..4e6256390c0 100644
> >>>>> --- a/gcc/gimplify.c
> >>>>> +++ b/gcc/gimplify.c
> >>>>> @@ -12973,6 +12973,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> >>>>> else if (gimple_test_f == is_gimple_val
> >>>>>          || gimple_test_f == is_gimple_call_addr
> >>>>>          || gimple_test_f == is_gimple_condexpr
> >>>>> +        || gimple_test_f == is_possibly_trapping_gimple_condexpr
> >>>>>          || gimple_test_f == is_gimple_mem_rhs
> >>>>>          || gimple_test_f == is_gimple_mem_rhs_or_call
> >>>>>          || gimple_test_f == is_gimple_reg_rhs
> >>>>> @@ -13814,7 +13815,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> >>>>>         enum gimplify_status r0, r1, r2;
> >>>>>
> >>>>>         r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
> >>>>> -                             post_p, is_gimple_condexpr, fb_rvalue);
> >>>>> +                             post_p, is_possibly_trapping_gimple_condexpr, fb_rvalue);
> >>>>>         r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
> >>>>>                             post_p, is_gimple_val, fb_rvalue);
> >>>>>         r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
> >>>>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> >>>>> index b75fdb2e63f..175b858f56b 100644
> >>>>> --- a/gcc/tree-cfg.c
> >>>>> +++ b/gcc/tree-cfg.c
> >>>>> @@ -4121,8 +4121,11 @@ verify_gimple_assign_ternary (gassign *stmt)
> >>>>>     return true;
> >>>>>   }
> >>>>>
> >>>>> -  if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
> >>>>> -       ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
> >>>>> +  if ((rhs_code == VEC_COND_EXPR
> >>>>> +       ? !is_possibly_trapping_gimple_condexpr (rhs1)
> >>>>> +       : (rhs_code == COND_EXPR
> >>>>> +       ? !is_gimple_condexpr (rhs1)
> >>>>> +       : !is_gimple_val (rhs1)))
> >>>>>     || !is_gimple_val (rhs2)
> >>>>>     || !is_gimple_val (rhs3))
> >>>>>   {
> >>>>>
> >>>>>>
> >>>>>> To go this route you'd have to split the is_gimple_condexpr check
> >>>>>> I guess and eventually users turning [VEC_]COND_EXPR into conditional
> >>>>>> code (do we have any?) have to be extra careful then.
> >>>>>>
> >>>>>
> >>>>> We have expand_vector_condition, which turns VEC_COND_EXPR into
> >>>>> COND_EXPR - but this should be harmless, right?  I could not find
> >>>>> anything else.
> >>>>
> >>>> Ugh, I've realized I need to check not only VEC_COND_EXPR, but also
> >>>> COND_EXPR usages.  There is, of course, a great deal more code, so I'm
> >>>> not sure whether I looked exhaustively through it, but there are at
> >>>> least store_expr and do_jump which do exactly this during expansion.
> >>>> Should we worry about EH edges at this point?
> >>>
> >>> Well, the EH edge needs to persist (and be rooted off the comparison,
> >>> not the selection).
> >>
> >> Ok, I'm trying to create some samples that may reveal problems with EH
> >> edges in these two cases.  So far with these experiments I only managed
> >> to find an unrelated S/390 bug :-)
> >> https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00065.html
> >>
> >>> That said, I'd simply stop using is_gimple_condexpr for GIMPLE_CONDs
> >>> (it allows is_gimple_val which isn't proper form for GIMPLE_COND).  Of course
> >>> there's code using it for GIMPLE_CONDs which would need to be adjusted.
> >>
> >> I'm sorry, I don't quite get this - what would that buy us?  and what
> >> would you use instead?  Right now we fix up non-conforming values
> >> accepted by is_gimple_val using gimple_cond_get_ops_from_tree - is
> >> there a problem with this approach?
> >>
> >> What I have in mind right now is to:
> >> - allow trapping conditions for COND_EXPR and VEC_COND_EXPR;
> >> - report them as trapping in operation_could_trap_p and
> >>  tree_could_trap_p iff their condition is trapping;
> >> - find and adjust all places where this messes up EH edges.
> >>
> >> GIMPLE_COND logic appears to be already covered precisely because it
> >> uses is_gimple_condexpr.
> >>
> >> Am I missing something?
> >
> > Not really - all I'm saying is that currently we use is_gimple_condexpr
> > to check whether a GENERIC tree is suitable for [VEC_]COND_EXPR
> > during, for example, forward propagation.
> >
> > And GIMPLE_COND already uses its own logic (as you say) but
> > passes still use is_gimple_condexpr for it.
> >
> > So my proposal would be to change is_gimple_condexpr to
> > allow trapping [VEC_]COND_EXPR and stop using is_gimple_condexpr
> > checks on conditions to be used for GIMPLE_CONDs (and substitute
> > another predicate there).  For this to work and catch wrongdoings
> > we should amend gimple_cond_get_ops_from_tree to assert
> > that the extracted condition cannot trap.
>
> Ah, I think now I understand.  While I wanted to keep is_gimple_condexpr
> as is and introduce is_possibly_trapping_gimple_condexpr, you're saying
> we rather need to change is_gimple_condexpr and introduce, say,
> is_non_trapping_gimple_condexpr.
>
> This makes sense, thanks for the explanation!

I'd say is_gimple_cond_expr - bah! stupid clashing names ;)
OK, so is_gimple_cond_condexpr vs. is_gimple_condexpr_cond?
Hmm, no.

I don't like to explicitly spell "non_trapping" here but instead
find names that tell one is for GIMPLE_COND while the
other is for GIMPLE_ASSIGN with a [VEC_]COND_EXPR operation.

Ah, maybe is_gimple_cond_condition () vs. is_gimple_assign_condition ()?
Or is_gimple_condexpr_for_cond and is_gimple_condexpr_for_assign?

Richard.


end of thread, other threads:[~2019-09-03 11:29 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-22 13:47 [PATCH v2 0/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
2019-08-22 13:47 ` [PATCH v2 3/9] Introduce can_vector_compare_p function Ilya Leoshkevich
2019-08-23 11:08   ` Richard Sandiford
2019-08-23 11:39     ` Richard Biener
2019-08-23 11:43       ` Ilya Leoshkevich
2019-08-26 10:04         ` Richard Biener
2019-08-26 12:18           ` Ilya Leoshkevich
2019-08-26 13:45             ` Richard Biener
2019-08-26 13:46               ` Ilya Leoshkevich
2019-08-26 17:15                 ` Ilya Leoshkevich
2019-08-27  8:07                   ` Richard Sandiford
2019-08-27 12:27                     ` Richard Biener
2019-08-23 11:40     ` Ilya Leoshkevich
2019-08-23 14:23       ` Richard Sandiford
2019-08-22 13:47 ` [PATCH v2 2/9] hash_traits: split pointer_hash_mark from pointer_hash Ilya Leoshkevich
2019-08-22 13:47 ` [PATCH v2 1/9] Document signaling for min, max and ltgt operations Ilya Leoshkevich
2019-08-22 13:48 ` [PATCH v2 4/9] S/390: Do not use signaling vector comparisons on z13 Ilya Leoshkevich
2019-08-22 13:48 ` [PATCH v2 5/9] S/390: Implement vcond expander for V1TI,V1TF Ilya Leoshkevich
2019-08-22 13:54 ` [PATCH v2 6/9] S/390: Remove code duplication in vec_unordered<mode> Ilya Leoshkevich
2019-08-22 14:13 ` [PATCH v2 7/9] S/390: Remove code duplication in vec_* comparison expanders Ilya Leoshkevich
2019-08-22 14:17 ` [PATCH v2 8/9] S/390: Use signaling FP comparison instructions Ilya Leoshkevich
2019-08-22 14:26 ` [PATCH v2 9/9] S/390: Test " Ilya Leoshkevich
2019-08-29 16:08 ` [PATCH v2 0/9] S/390: Use " Ilya Leoshkevich
2019-08-30  7:27   ` Richard Biener
2019-08-30  7:28     ` Richard Biener
2019-08-30 14:32       ` Ilya Leoshkevich
2019-08-30 10:14     ` Segher Boessenkool
2019-08-30 11:49       ` Richard Biener
2019-08-30 15:05     ` Ilya Leoshkevich
2019-08-30 16:09       ` Ilya Leoshkevich
2019-09-02 10:37         ` Richard Biener
2019-09-02 16:28           ` Ilya Leoshkevich
2019-09-03 10:08             ` Richard Biener
2019-09-03 10:34               ` Ilya Leoshkevich
2019-09-03 11:29                 ` Richard Biener
