public inbox for gcc-patches@gcc.gnu.org
* [x86 PATCH] PR target/106933: Limit TImode STV to SSA-like def-use chains.
@ 2022-12-22 23:09 Roger Sayle
  2022-12-23  7:23 ` Uros Bizjak
  2023-01-09  7:24 ` Richard Biener
From: Roger Sayle @ 2022-12-22 23:09 UTC
  To: 'GCC Patches'; +Cc: 'Uros Bizjak', 'H.J. Lu'

[-- Attachment #1: Type: text/plain, Size: 2919 bytes --]


With many thanks to H.J. for doing all the hard work, this patch resolves
two P1 regressions: PR target/106933 and PR target/106959.

Although superficially similar, the i386 backend's two scalar-to-vector
(STV) passes perform their transformations in importantly different ways.
The original pass, which converts SImode and DImode operations to V4SImode
or V2DImode operations, is "soft", allowing values to be maintained in
both integer and vector hard registers.  The newer pass, which converts
TImode operations to V1TImode, is "hard" (all or nothing): it converts
all uses of a pseudo to vector form.  To implement this it invokes
powerful ju-ju, calling SET_MODE on a REG_rtx, which, due to RTL sharing,
often updates this pseudo's mode everywhere in the RTL chain.  Hence,
TImode STV can only be performed when all uses of a pseudo are
convertible to V1TImode form.  To ensure this, the STV passes currently
use data-flow analysis to inspect all DEFs and USEs in a chain.  This
works fine for chains that are in the usual single assignment form, but
uninitialized variables, or multiple assignments that split a pseudo's
usage into several independent chains (lifetimes), can lead to
situations where some, but not all, of a pseudo's occurrences need to be
updated.  This is safe for the SImode/DImode pass, but leads to the
above bugs during the TImode pass.
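The hazard can be illustrated with a deliberately tiny C model.  All
`toy_` names are hypothetical and are not GCC's real rtx or insn
structures: two insns from different def-use chains point at one shared
register object, so an in-place mode change made while converting one
chain is also visible through the other.  It is only a sketch of why an
in-place mode change must cover every occurrence of the pseudo, not a
model of the actual STV code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, greatly simplified stand-ins for a REG rtx and the insns
   that refer to it; these are NOT GCC's real data structures.  */
enum toy_mode { TOY_TImode, TOY_V1TImode };

struct toy_reg
{
  unsigned regno;
  enum toy_mode mode;
};

struct toy_insn
{
  struct toy_reg *op;   /* operand pointer; may be shared with other insns */
};

/* "Hard" conversion: change the mode of the register object in place.
   Because the object is shared, every insn that points at it now sees
   V1TImode, including insns in other def-use chains that were never
   examined or converted.  */
static void
toy_convert_chain_hard (struct toy_insn **chain, size_t n)
{
  for (size_t i = 0; i < n; i++)
    chain[i]->op->mode = TOY_V1TImode;
}

/* Count insns outside the converted chain whose operand nevertheless
   ended up in vector mode.  */
static size_t
toy_count_stray_conversions (struct toy_insn **others, size_t n)
{
  size_t stray = 0;
  for (size_t i = 0; i < n; i++)
    if (others[i]->op->mode == TOY_V1TImode)
      stray++;
  return stray;
}
```

For example, if one register is shared by an insn in the converted chain
and an insn in another chain, `toy_convert_chain_hard` leaves
`toy_count_stray_conversions` reporting one stray conversion.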

My one minor tweak to HJ's patch from comment #4 of bugzilla PR106959
is to perform the new single_def_chain_p check only for TImode STV; it
turns out that STV of SImode/DImode min/max operates safely on multiple-def
chains, and prohibiting this leads to testsuite regressions.  We don't
(yet) support V1TImode min/max, so this idiom isn't an issue during the
TImode STV pass.

For the record, two alternative fixes would be (i) to make the TImode
STV pass "soft" by eliminating the use of SET_MODE and instead using
replace_rtx with a new pseudo, or (ii) to merge "chains" so that multiple
DFA chains/lifetimes are considered a single STV chain.
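Alternative (i) could look roughly like the following sketch, in which
hypothetical `toy_` types stand in for GCC's rtx structures: the chain's
operands are redirected to a fresh vector-mode pseudo, so insns outside
the chain keep the original TImode register.  A real implementation
would presumably also need to connect the two registers where the
converted chain meets unconverted insns, in the spirit of the "soft"
SImode/DImode pass that keeps values in both integer and vector
registers; that part is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT GCC's real rtx/insn structures.  */
enum toy_mode { TOY_TImode, TOY_V1TImode };

struct toy_reg
{
  unsigned regno;
  enum toy_mode mode;
};

struct toy_insn
{
  struct toy_reg *op;   /* operand pointer; may be shared */
};

/* "Soft" conversion sketch: instead of mutating the shared register
   object, initialise FRESH as a new V1TImode pseudo and point only the
   chain's operands at it (loosely analogous to replace_rtx with a new
   pseudo).  Insns outside the chain still see the original object.  */
static void
toy_convert_chain_soft (struct toy_insn **chain, size_t n,
                        struct toy_reg *fresh, unsigned new_regno)
{
  fresh->regno = new_regno;
  fresh->mode = TOY_V1TImode;
  for (size_t i = 0; i < n; i++)
    chain[i]->op = fresh;
}
```

Because the original object is never modified, the correctness of the
conversion no longer depends on having found every occurrence of the
pseudo; the cost is managing a second register for the same value.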

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-12-22  H.J. Lu  <hjl.tools@gmail.com>
            Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR target/106933
	PR target/106959
	* config/i386/i386-features.cc (single_def_chain_p): New predicate
	function to check that a pseudo's use-def chain is in SSA form.
	(timode_scalar_to_vector_candidate_p): Check that TImode regs that
	are the SET_DEST or SET_SRC of an insn satisfy single_def_chain_p.

gcc/testsuite/ChangeLog
	PR target/106933
	PR target/106959
	* gcc.target/i386/pr106933-1.c: New test case.
	* gcc.target/i386/pr106933-2.c: Likewise.
	* gcc.target/i386/pr106959-1.c: Likewise.
	* gcc.target/i386/pr106959-2.c: Likewise.
	* gcc.target/i386/pr106959-3.c: Likewise.

Thanks in advance,
Roger
--


[-- Attachment #2: patchhj2.txt --]
[-- Type: text/plain, Size: 3960 bytes --]

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index fd212262..4bf8bb3 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1756,6 +1756,19 @@ pseudo_reg_set (rtx_insn *insn)
   return set;
 }
 
+/* Return true if the register REG is defined in a single DEF chain.
+   If it is defined in more than one DEF chain, we may not be able
+   to convert it in all chains.  */
+
+static bool
+single_def_chain_p (rtx reg)
+{
+  df_ref ref = DF_REG_DEF_CHAIN (REGNO (reg));
+  if (!ref)
+    return false;
+  return DF_REF_NEXT_REG (ref) == nullptr;
+}
+
 /* Check if comparison INSN may be transformed into vector comparison.
    Currently we transform equality/inequality checks which look like:
    (set (reg:CCZ 17 flags) (compare:CCZ (reg:TI x) (reg:TI y)))  */
@@ -1972,9 +1985,14 @@ timode_scalar_to_vector_candidate_p (rtx_insn *insn)
       && !TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
     return false;
 
+  if (REG_P (dst) && !single_def_chain_p (dst))
+    return false;
+
   switch (GET_CODE (src))
     {
     case REG:
+      return single_def_chain_p (src);
+
     case CONST_WIDE_INT:
       return true;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr106933-1.c b/gcc/testsuite/gcc.target/i386/pr106933-1.c
new file mode 100644
index 0000000..bcd9576
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106933-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+short int
+bar (void);
+
+__int128
+empty (void)
+{
+}
+
+__attribute__ ((simd)) int
+foo (__int128 *p)
+{
+  int a = 0x80000000;
+
+  *p = empty ();
+
+  return *p == (a < bar ());
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr106933-2.c b/gcc/testsuite/gcc.target/i386/pr106933-2.c
new file mode 100644
index 0000000..ac7d07e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106933-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-msse4 -Os" } */
+
+__int128 n;
+
+__int128
+empty (void)
+{
+}
+
+int
+foo (void)
+{
+  n = empty ();
+
+  return n == 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr106959-1.c b/gcc/testsuite/gcc.target/i386/pr106959-1.c
new file mode 100644
index 0000000..4bac2a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106959-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-msse4 -O2 -fno-tree-loop-im --param max-combine-insns=2 -Wno-shift-count-overflow" } */
+
+unsigned __int128 n;
+
+int
+foo (int x)
+{
+  __int128 a = 0;
+  int b = !!(n * 2);
+
+  while (x < 2)
+    {
+      if (a)
+        {
+          if (n)
+            n ^= 1;
+          else
+            x <<= 32;
+        }
+
+      a = 1;
+    }
+
+  return b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr106959-2.c b/gcc/testsuite/gcc.target/i386/pr106959-2.c
new file mode 100644
index 0000000..29f0c47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106959-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-msse4 -O2 -fno-tree-loop-im -Wno-shift-count-overflow" } */
+
+unsigned __int128 n;
+
+int
+foo (int x)
+{
+  __int128 a = 0;
+  int b = !!(n * 2);
+
+  while (x < 2)
+    {
+      if (a)
+        {
+          if (n)
+            n ^= 1;
+          else
+            x <<= 32;
+        }
+
+      a = 1;
+    }
+
+  return b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr106959-3.c b/gcc/testsuite/gcc.target/i386/pr106959-3.c
new file mode 100644
index 0000000..0f58f13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106959-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -fpeel-loops" } */
+
+unsigned __int128 m;
+int n;
+
+__attribute__ ((simd)) void
+foo (int x)
+{
+  x = n ? n : (short int) x;
+  if (x)
+    m /= 2;
+}
+


* Re: [x86 PATCH] PR target/106933: Limit TImode STV to SSA-like def-use chains.
  2022-12-22 23:09 [x86 PATCH] PR target/106933: Limit TImode STV to SSA-like def-use chains Roger Sayle
@ 2022-12-23  7:23 ` Uros Bizjak
  2023-01-09  7:24 ` Richard Biener
From: Uros Bizjak @ 2022-12-23  7:23 UTC
  To: Roger Sayle; +Cc: GCC Patches, H.J. Lu

On Fri, Dec 23, 2022 at 12:09 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> For the record, the two alternate possible fixes are (i) make the TImode
> STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx
> with a new pseudo, or (ii) merging "chains" so that multiple DFA
> chains/lifetimes are considered a single STV chain.

I assume these two alternatives would result in much more invasive
surgery, so let's consider these "for the future".

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?

OK.

Thanks,
Uros.


* Re: [x86 PATCH] PR target/106933: Limit TImode STV to SSA-like def-use chains.
  2022-12-22 23:09 [x86 PATCH] PR target/106933: Limit TImode STV to SSA-like def-use chains Roger Sayle
  2022-12-23  7:23 ` Uros Bizjak
@ 2023-01-09  7:24 ` Richard Biener
From: Richard Biener @ 2023-01-09  7:24 UTC
  To: Roger Sayle; +Cc: GCC Patches, Uros Bizjak, H.J. Lu

On Fri, Dec 23, 2022 at 12:10 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> For the record, the two alternate possible fixes are (i) make the TImode
> STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx
> with a new pseudo, or (ii) merging "chains" so that multiple DFA
> chains/lifetimes are considered a single STV chain.

Similar to (ii), one could also, as (iii), split non-overlapping def-use
chains to use different pseudos (run web before stv?)
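Option (iii) amounts to web construction: group a pseudo's defs and uses
into connected components via the reaching-definition edges, then give
each component its own pseudo.  The sketch below does this with a small
union-find over hypothetical `toy_` structures; it is only loosely
analogous to what GCC's web pass (`-fweb`) does and is not its code.
Uses with no reaching def end up in a web of their own.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_REFS 64

/* One reaching-definition edge: use USE may be reached by def DEF.  */
struct toy_du_edge
{
  unsigned use, def;
};

/* Union-find "find" with path halving.  */
static unsigned
toy_find (unsigned *parent, unsigned x)
{
  while (parent[x] != x)
    {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
  return x;
}

static void
toy_union (unsigned *parent, unsigned a, unsigned b)
{
  unsigned ra = toy_find (parent, a), rb = toy_find (parent, b);
  if (ra != rb)
    parent[ra] = rb;
}

/* Partition the refs of one pseudo into webs.  Refs are numbered with
   defs first (0 .. n_defs-1), then uses.  On return WEB[i] holds a dense
   web number for ref i; the result is the number of webs, i.e. how many
   distinct pseudos the register could be split into.  */
static unsigned
toy_split_webs (unsigned n_defs, unsigned n_uses,
                const struct toy_du_edge *edges, size_t n_edges,
                unsigned *web)
{
  unsigned n = n_defs + n_uses;
  unsigned parent[TOY_MAX_REFS];
  unsigned root_to_web[TOY_MAX_REFS];
  unsigned n_webs = 0;

  assert (n <= TOY_MAX_REFS);
  for (unsigned i = 0; i < n; i++)
    {
      parent[i] = i;
      root_to_web[i] = (unsigned) -1;
    }
  /* A use and every def reaching it must share one pseudo.  */
  for (size_t e = 0; e < n_edges; e++)
    {
      assert (edges[e].use < n_uses && edges[e].def < n_defs);
      toy_union (parent, n_defs + edges[e].use, edges[e].def);
    }
  /* Number the components densely in order of first appearance.  */
  for (unsigned i = 0; i < n; i++)
    {
      unsigned r = toy_find (parent, i);
      if (root_to_web[r] == (unsigned) -1)
        root_to_web[r] = n_webs++;
      web[i] = root_to_web[r];
    }
  return n_webs;
}
```

With two defs whose uses never meet, it reports two webs.  If a single
use is reached by both defs (for instance at a control-flow merge),
everything collapses into one web and no split is possible.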

In general, making the pass "soft" sounds like the right thing to do in
the end; if there are cost reasons to go all-or-nothing, then costing
should already ensure that.

I agree with Uros that the patch as proposed is best at this stage.

Note the single_def_chain_p helper looks like it belongs in df*.{cc,h}.
There's the related(?) df_find_single_def_src, so a better name
could be `df_ref df_single_def (rtx)'?  I'll note that
df_find_single_def_src does

      df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg));
      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
          || DF_REF_IS_ARTIFICIAL (adef)
          || (DF_REF_FLAGS (adef)
              & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
        return NULL_RTX;

In particular, it rejects DF_REF_PARTIAL/CONDITIONAL and artificial
defs as "single defs", whereas your function accepts them.
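The difference between the two checks can be made concrete with a small
C model.  The `toy_def` structure and its flag bits are hypothetical
stand-ins for a `df_ref` and the `DF_REF_*` accessors; only the shape of
the tests follows the patch's single_def_chain_p and the snippet quoted
from df_find_single_def_src.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; NOT GCC's actual DF_REF_* values.  */
enum { TOY_REF_PARTIAL = 1u << 0, TOY_REF_CONDITIONAL = 1u << 1 };

/* Hypothetical stand-in for a df_ref, keeping only what the checks use.  */
struct toy_def
{
  bool artificial;                 /* like DF_REF_IS_ARTIFICIAL */
  unsigned flags;                  /* like DF_REF_FLAGS */
  const struct toy_def *next_reg;  /* like DF_REF_NEXT_REG */
};

/* Shape of the patch's check: exactly one def, of any kind.  */
static bool
toy_single_def_chain_p (const struct toy_def *chain)
{
  return chain != NULL && chain->next_reg == NULL;
}

/* Shape of the stricter quoted check: exactly one def, and it must be a
   real, unconditional, full definition.  */
static bool
toy_strict_single_def_p (const struct toy_def *chain)
{
  return chain != NULL
         && chain->next_reg == NULL
         && !chain->artificial
         && (chain->flags & (TOY_REF_PARTIAL | TOY_REF_CONDITIONAL)) == 0;
}
```

With a single conditional (or partial, or artificial) def, the first
predicate answers true and the stricter one false, which is exactly the
gap described above.  Both reject a register with no def at all or with
more than one def.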

Richard.


