* [PR80693] drop value of parallel SETs dropped by combine
@ 2017-05-18 10:34 Alexandre Oliva
2017-06-07 2:14 ` Alexandre Oliva
2017-06-08 19:50 ` Segher Boessenkool
0 siblings, 2 replies; 13+ messages in thread
From: Alexandre Oliva @ 2017-05-18 10:34 UTC (permalink / raw)
To: gcc-patches
When an insn used by combine has multiple SETs, only the non-REG_UNUSED
set is used: others will end up dropped on the floor. We have to take
note of the dropped REG_UNUSED SETs, clearing their cached values, so
that, even if the REGs remain used (e.g. because they were referenced
in the used SET_SRC), we will not use properties of the latest value
as if they applied to the earlier one.
Regstrapped on x86_64-linux-gnu. Ok to install?
for gcc/ChangeLog
PR rtl-optimization/80693
* combine.c (distribute_notes): Add IDEST parameter. Reset any
REG_UNUSED REGs that are not IDEST, if IDEST is given. Adjust
all callers.
for gcc/testsuite/ChangeLog
PR rtl-optimization/80693
* gcc.dg/pr80693.c: New.
---
gcc/combine.c | 80 ++++++++++++++++++++++++++--------------
gcc/testsuite/gcc.dg/pr80693.c | 26 +++++++++++++
2 files changed, 78 insertions(+), 28 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/pr80693.c
diff --git a/gcc/combine.c b/gcc/combine.c
index 39ef3c6..6954f92 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -483,7 +483,7 @@ static void reg_dead_at_p_1 (rtx, const_rtx, void *);
static int reg_dead_at_p (rtx, rtx_insn *);
static void move_deaths (rtx, rtx, int, rtx_insn *, rtx *);
static int reg_bitfield_target_p (rtx, rtx);
-static void distribute_notes (rtx, rtx_insn *, rtx_insn *, rtx_insn *, rtx, rtx, rtx);
+static void distribute_notes (rtx, rtx_insn *, rtx, rtx_insn *, rtx_insn *, rtx, rtx, rtx);
static void distribute_links (struct insn_link *);
static void mark_used_regs_combine (rtx);
static void record_promoted_value (rtx_insn *, rtx);
@@ -4170,7 +4170,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
remove_note (undobuf.other_insn, note);
}
- distribute_notes (new_other_notes, undobuf.other_insn,
+ distribute_notes (new_other_notes, undobuf.other_insn, NULL_RTX,
undobuf.other_insn, NULL, NULL_RTX, NULL_RTX,
NULL_RTX);
}
@@ -4424,19 +4424,19 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
/* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3. */
if (i3notes)
- distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL,
+ distribute_notes (i3notes, i3, NULL_RTX, i3, newi2pat ? i2 : NULL,
elim_i2, elim_i1, elim_i0);
if (i2notes)
- distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
+ distribute_notes (i2notes, i2, i2dest, i3, newi2pat ? i2 : NULL,
elim_i2, elim_i1, elim_i0);
if (i1notes)
- distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
+ distribute_notes (i1notes, i1, i1dest, i3, newi2pat ? i2 : NULL,
elim_i2, local_elim_i1, local_elim_i0);
if (i0notes)
- distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
+ distribute_notes (i0notes, i0, i0dest, i3, newi2pat ? i2 : NULL,
elim_i2, elim_i1, local_elim_i0);
if (midnotes)
- distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
+ distribute_notes (midnotes, NULL, NULL_RTX, i3, newi2pat ? i2 : NULL,
elim_i2, elim_i1, elim_i0);
/* Distribute any notes added to I2 or I3 by recog_for_combine. We
@@ -4444,12 +4444,12 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
so we always pass it as i3. */
if (newi2pat && new_i2_notes)
- distribute_notes (new_i2_notes, i2, i2, NULL, NULL_RTX, NULL_RTX,
- NULL_RTX);
+ distribute_notes (new_i2_notes, i2, NULL_RTX, i2, NULL,
+ NULL_RTX, NULL_RTX, NULL_RTX);
if (new_i3_notes)
- distribute_notes (new_i3_notes, i3, i3, NULL, NULL_RTX, NULL_RTX,
- NULL_RTX);
+ distribute_notes (new_i3_notes, i3, NULL_RTX, i3, NULL,
+ NULL_RTX, NULL_RTX, NULL_RTX);
/* If I3DEST was used in I3SRC, it really died in I3. We may need to
put a REG_DEAD note for it somewhere. If NEWI2PAT exists and sets
@@ -4462,10 +4462,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
{
rtx new_note = alloc_reg_note (REG_DEAD, i3dest_killed, NULL_RTX);
if (newi2pat && reg_set_p (i3dest_killed, newi2pat))
- distribute_notes (new_note, NULL, i2, NULL, elim_i2,
+ distribute_notes (new_note, NULL, NULL_RTX, i2, NULL, elim_i2,
elim_i1, elim_i0);
else
- distribute_notes (new_note, NULL, i3, newi2pat ? i2 : NULL,
+ distribute_notes (new_note, NULL, NULL_RTX, i3, newi2pat ? i2 : NULL,
elim_i2, elim_i1, elim_i0);
}
@@ -4473,10 +4473,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
{
rtx new_note = alloc_reg_note (REG_DEAD, i2dest, NULL_RTX);
if (newi2pat && reg_set_p (i2dest, newi2pat))
- distribute_notes (new_note, NULL, i2, NULL, NULL_RTX,
+ distribute_notes (new_note, NULL, NULL_RTX, i2, NULL, NULL_RTX,
NULL_RTX, NULL_RTX);
else
- distribute_notes (new_note, NULL, i3, newi2pat ? i2 : NULL,
+ distribute_notes (new_note, NULL, NULL_RTX, i3, newi2pat ? i2 : NULL,
NULL_RTX, NULL_RTX, NULL_RTX);
}
@@ -4484,10 +4484,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
{
rtx new_note = alloc_reg_note (REG_DEAD, i1dest, NULL_RTX);
if (newi2pat && reg_set_p (i1dest, newi2pat))
- distribute_notes (new_note, NULL, i2, NULL, NULL_RTX,
+ distribute_notes (new_note, NULL, NULL_RTX, i2, NULL, NULL_RTX,
NULL_RTX, NULL_RTX);
else
- distribute_notes (new_note, NULL, i3, newi2pat ? i2 : NULL,
+ distribute_notes (new_note, NULL, NULL_RTX, i3, newi2pat ? i2 : NULL,
NULL_RTX, NULL_RTX, NULL_RTX);
}
@@ -4495,10 +4495,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
{
rtx new_note = alloc_reg_note (REG_DEAD, i0dest, NULL_RTX);
if (newi2pat && reg_set_p (i0dest, newi2pat))
- distribute_notes (new_note, NULL, i2, NULL, NULL_RTX,
+ distribute_notes (new_note, NULL, NULL_RTX, i2, NULL, NULL_RTX,
NULL_RTX, NULL_RTX);
else
- distribute_notes (new_note, NULL, i3, newi2pat ? i2 : NULL,
+ distribute_notes (new_note, NULL, NULL_RTX, i3, newi2pat ? i2 : NULL,
NULL_RTX, NULL_RTX, NULL_RTX);
}
@@ -13938,9 +13938,11 @@ reg_bitfield_target_p (rtx x, rtx body)
return 0;
}
\f
-/* Given a chain of REG_NOTES originally from FROM_INSN, try to place them
- as appropriate. I3 and I2 are the insns resulting from the combination
- insns including FROM (I2 may be zero).
+/* Given a chain of REG_NOTES originally from FROM_INSN, try to place
+ them as appropriate. IDEST is the dest in FROM_INSN used for
+ substitution (other dests in it are just dropped on the floor). I3
+ and I2 are the insns resulting from the combination insns including
+ FROM (I2 may be zero).
ELIM_I2 and ELIM_I1 are either zero or registers that we know will
not need REG_DEAD notes because they are being substituted for. This
@@ -13950,7 +13952,8 @@ reg_bitfield_target_p (rtx x, rtx body)
on the type of note. */
static void
-distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
+distribute_notes (rtx notes, rtx_insn *from_insn, rtx idest,
+ rtx_insn *i3, rtx_insn *i2,
rtx elim_i2, rtx elim_i1, rtx elim_i0)
{
rtx note, next_note;
@@ -14087,6 +14090,26 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
PUT_REG_NOTE_KIND (note, REG_DEAD);
place = i3;
}
+
+ /* If there were any parallel sets in FROM_INSN other than
+ the one setting IDEST, it must be REG_UNUSED, otherwise
+ we could not have used FROM_INSN in combine. Since this
+ combine attempt succeeded, we know this unused SET was
+ dropped on the floor, because the insn was either deleted
+ or created from a new pattern that does not use its
+ SET_DEST. We must forget whatever we knew about the
+ value that was stored by that SET, since the prior value
+ may still be present in IDEST's src expression or
+ elsewhere, and we do not want to use properties of the
+ dropped value as if they applied to the prior one when
+ simplifying e.g. subsequent combine attempts. */
+ if (idest && XEXP (note, 0) != idest)
+ {
+ gcc_assert (REG_P (XEXP (note, 0)));
+ record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
+ INC_REG_N_SETS (REGNO (XEXP (note, 0)), -1);
+ }
+
break;
case REG_EQUAL:
@@ -14295,7 +14318,8 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
PATTERN (tem_insn) = pc_rtx;
REG_NOTES (tem_insn) = NULL;
- distribute_notes (old_notes, tem_insn, tem_insn, NULL,
+ distribute_notes (old_notes, tem_insn, NULL_RTX,
+ tem_insn, NULL,
NULL_RTX, NULL_RTX, NULL_RTX);
distribute_links (LOG_LINKS (tem_insn));
@@ -14316,7 +14340,7 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
REG_NOTES (cc0_setter) = NULL;
distribute_notes (old_notes, cc0_setter,
- cc0_setter, NULL,
+ NULL_RTX, cc0_setter, NULL,
NULL_RTX, NULL_RTX, NULL_RTX);
distribute_links (LOG_LINKS (cc0_setter));
@@ -14437,9 +14461,9 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
rtx new_note = alloc_reg_note (REG_DEAD, piece,
NULL_RTX);
- distribute_notes (new_note, place, place,
- NULL, NULL_RTX, NULL_RTX,
- NULL_RTX);
+ distribute_notes (new_note, place, NULL_RTX,
+ place, NULL, NULL_RTX,
+ NULL_RTX, NULL_RTX);
}
else if (! refers_to_regno_p (i, PATTERN (place))
&& ! find_regno_fusage (place, USE, i))
diff --git a/gcc/testsuite/gcc.dg/pr80693.c b/gcc/testsuite/gcc.dg/pr80693.c
new file mode 100644
index 0000000..aecddd0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80693.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-O -fno-tree-coalesce-vars" } */
+typedef unsigned char u8;
+typedef unsigned short u16;
+typedef unsigned u32;
+typedef unsigned long u64;
+
+static u64 __attribute__((noinline, noclone))
+foo(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u16 u16_1)
+{
+ u16_1 += 0x1051;
+ u16_1 &= 1;
+ u8_0 <<= u32_0 & 7;
+ u16_0 -= !u16_1;
+ u16_1 >>= ((u16)-u8_0 != 0xff);
+ return u8_0 + u16_0 + u64_0 + u16_1;
+}
+
+int
+main (void)
+{
+ u64 x = foo(1, 1, 0xffff, 0, 1);
+ if (x != 0x80)
+ __builtin_abort();
+ return 0;
+}
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-05-18 10:34 [PR80693] drop value of parallel SETs dropped by combine Alexandre Oliva
@ 2017-06-07 2:14 ` Alexandre Oliva
2017-06-08 19:50 ` Segher Boessenkool
1 sibling, 0 replies; 13+ messages in thread
From: Alexandre Oliva @ 2017-06-07 2:14 UTC (permalink / raw)
To: gcc-patches
On May 18, 2017, Alexandre Oliva <aoliva@redhat.com> wrote:
> When an insn used by combine has multiple SETs, only the non-REG_UNUSED
> set is used: others will end up dropped on the floor. We have to take
> note of the dropped REG_UNUSED SETs, clearing their cached values, so
> that, even if the REGs remain used (e.g. because they were referenced
> in the used SET_SRC), we will not use properties of the latest value
> as if they applied to the earlier one.
> Regstrapped on x86_64-linux-gnu. Ok to install?
> for gcc/ChangeLog
> PR rtl-optimization/80693
> * combine.c (distribute_notes): Add IDEST parameter. Reset any
> REG_UNUSED REGs that are not IDEST, if IDEST is given. Adjust
> all callers.
> for gcc/testsuite/ChangeLog
> PR rtl-optimization/80693
> * gcc.dg/pr80693.c: New.
Ping?
https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01444.html
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-05-18 10:34 [PR80693] drop value of parallel SETs dropped by combine Alexandre Oliva
2017-06-07 2:14 ` Alexandre Oliva
@ 2017-06-08 19:50 ` Segher Boessenkool
2017-06-22 6:21 ` Alexandre Oliva
2017-06-22 12:26 ` Alexandre Oliva
1 sibling, 2 replies; 13+ messages in thread
From: Segher Boessenkool @ 2017-06-08 19:50 UTC (permalink / raw)
To: Alexandre Oliva; +Cc: gcc-patches
Hi!
[ I missed this patch the first time around; please cc: me to prevent this ]
On Thu, May 18, 2017 at 07:25:57AM -0300, Alexandre Oliva wrote:
> When an insn used by combine has multiple SETs, only the non-REG_UNUSED
> set is used: others will end up dropped on the floor.
Sometimes, yes; not always.
> We have to take
> note of the dropped REG_UNUSED SETs, clearing their cached values, so
> that, even if the REGs remain used (e.g. because they were referenced
> in the used SET_SRC), we will not use properties of the latest value
> as if they applied to the earlier one.
The reg_stat stuff is no end of pain, sigh.
> PR rtl-optimization/80693
> * combine.c (distribute_notes): Add IDEST parameter. Reset any
> REG_UNUSED REGs that are not IDEST, if IDEST is given. Adjust
> all callers.
Most callers use NULL_RTX for idest. It isn't obvious to me that this
is correct.
> @@ -14087,6 +14090,26 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
> PUT_REG_NOTE_KIND (note, REG_DEAD);
> place = i3;
> }
> +
> + /* If there were any parallel sets in FROM_INSN other than
> + the one setting IDEST, it must be REG_UNUSED, otherwise
> + we could not have used FROM_INSN in combine. Since this
> + combine attempt succeeded, we know this unused SET was
> + dropped on the floor, because the insn was either deleted
> + or created from a new pattern that does not use its
> + SET_DEST. We must forget whatever we knew about the
> + value that was stored by that SET, since the prior value
> + may still be present in IDEST's src expression or
> + elsewhere, and we do not want to use properties of the
> + dropped value as if they applied to the prior one when
> + simplifying e.g. subsequent combine attempts. */
> + if (idest && XEXP (note, 0) != idest)
Would it work to just have "else" instead if this "if"? Or hrm, we'll
need to kill the recorded reg_stat value in the last case before this
as well?
> + {
> + gcc_assert (REG_P (XEXP (note, 0)));
> + record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
> + INC_REG_N_SETS (REGNO (XEXP (note, 0)), -1);
> + }
> +
> break;
Could you try that out? Or I can do it, let me know what you prefer.
Thanks,
Segher
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-06-08 19:50 ` Segher Boessenkool
@ 2017-06-22 6:21 ` Alexandre Oliva
2017-06-22 16:04 ` Segher Boessenkool
2017-06-22 12:26 ` Alexandre Oliva
1 sibling, 1 reply; 13+ messages in thread
From: Alexandre Oliva @ 2017-06-22 6:21 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
On Jun 8, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> [ I missed this patch the first time around; please cc: me to prevent this ]
> On Thu, May 18, 2017 at 07:25:57AM -0300, Alexandre Oliva wrote:
>> When an insn used by combine has multiple SETs, only the non-REG_UNUSED
>> set is used: others will end up dropped on the floor.
> Sometimes, yes; not always.
You mean sets to non-REGs, I suppose. I didn't take them into account
in my statement indeed, but I think it still applies: can_combine_p will
reject parallel SETs if two or more of them don't have a REG_UNUSED note
for their respective SET_DESTs.
>> PR rtl-optimization/80693
>> * combine.c (distribute_notes): Add IDEST parameter. Reset any
>> REG_UNUSED REGs that are not IDEST, if IDEST is given. Adjust
>> all callers.
> Most callers use NULL_RTX for idest. It isn't obvious to me that this
> is correct.
My reasoning is that the block of try_combine that passes i[0-3]dest
will cover all combine-affected insns that could possibly have
REG_UNUSED notes, since these notes would have to be put back in the
insns at that point. Since it's enough to reset the reg stats once,
that would do it, and so we need not be concerned with any other cases,
so we can pass NULL_RTX for idest elsewhere.
>> @@ -14087,6 +14090,26 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
>> PUT_REG_NOTE_KIND (note, REG_DEAD);
>> place = i3;
>> }
>> +
>> + /* If there were any parallel sets in FROM_INSN other than
>> + the one setting IDEST, it must be REG_UNUSED, otherwise
>> + we could not have used FROM_INSN in combine. Since this
>> + combine attempt succeeded, we know this unused SET was
>> + dropped on the floor, because the insn was either deleted
>> + or created from a new pattern that does not use its
>> + SET_DEST. We must forget whatever we knew about the
>> + value that was stored by that SET, since the prior value
>> + may still be present in IDEST's src expression or
>> + elsewhere, and we do not want to use properties of the
>> + dropped value as if they applied to the prior one when
>> + simplifying e.g. subsequent combine attempts. */
>> + if (idest && XEXP (note, 0) != idest)
> Would it work to just have "else" instead if this "if"? Or hrm, we'll
> need to kill the recorded reg_stat value in the last case before this
> as well?
You mean instead of passing an idest to distribute_notes and testing it,
right?
We might also need to catch the case in which the first if block
breaks. I think I had missed that.
> Could you try that out? Or I can do it, let me know what you prefer.
I'll give it a shot.
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-06-08 19:50 ` Segher Boessenkool
2017-06-22 6:21 ` Alexandre Oliva
@ 2017-06-22 12:26 ` Alexandre Oliva
2017-06-22 13:14 ` Alexandre Oliva
2017-06-22 16:44 ` Segher Boessenkool
1 sibling, 2 replies; 13+ messages in thread
From: Alexandre Oliva @ 2017-06-22 12:26 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
On Jun 8, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> Would it work to just have "else" instead if this "if"? Or hrm, we'll
> need to kill the recorded reg_stat value in the last case before this
> as well?
The patch below (is this what you meant?) fixes the PR testcase, and the
new else block doesn't get exercised in an x86_64-linux-gnu bootstrap.
However, it seems to me that it might very well drop parallel SETs
without decrementing the sets count, though probably in cases in which
this won't matter.
How's this? (I haven't run regression tests yet)
[PR80693] drop value of parallel SETs dropped by combine
When an insn used by combine has multiple SETs, only the non-REG_UNUSED
set is used: others will end up dropped on the floor. We have to take
note of the dropped REG_UNUSED SETs, clearing their cached values, so
that, even if the REGs remain used (e.g. because they were referenced
in the used SET_SRC), we will not use properties of the latest value
as if they applied to the earlier one.
for gcc/ChangeLog
PR rtl-optimization/80693
* combine.c (distribute_notes): Reset any REG_UNUSED REGs that
are not set or referenced in i3.
for gcc/testsuite/ChangeLog
PR rtl-optimization/80693
* gcc.dg/pr80693.c: New.
---
gcc/combine.c | 19 +++++++++++++++++++
gcc/testsuite/gcc.dg/pr80693.c | 26 ++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/pr80693.c
diff --git a/gcc/combine.c b/gcc/combine.c
index 2d49bc2..477010d 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14087,6 +14087,25 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
PUT_REG_NOTE_KIND (note, REG_DEAD);
place = i3;
}
+
+ /* If there were any parallel sets in FROM_INSN other than
+ the one setting IDEST, it must be REG_UNUSED, otherwise
+ we could not have used FROM_INSN in combine. Since this
+ combine attempt succeeded, we know this unused SET was
+ dropped on the floor, because the insn was either deleted
+ or created from a new pattern that does not use its
+ SET_DEST. We must forget whatever we knew about the
+ value that was stored by that SET, since the prior value
+ may still be present in IDEST's src expression or
+ elsewhere, and we do not want to use properties of the
+ dropped value as if they applied to the prior one when
+ simplifying e.g. subsequent combine attempts. */
+ else
+ {
+ gcc_assert (REG_P (XEXP (note, 0)));
+ record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
+ INC_REG_N_SETS (REGNO (XEXP (note, 0)), -1);
+ }
break;
case REG_EQUAL:
diff --git a/gcc/testsuite/gcc.dg/pr80693.c b/gcc/testsuite/gcc.dg/pr80693.c
new file mode 100644
index 0000000..aecddd0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80693.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-O -fno-tree-coalesce-vars" } */
+typedef unsigned char u8;
+typedef unsigned short u16;
+typedef unsigned u32;
+typedef unsigned long u64;
+
+static u64 __attribute__((noinline, noclone))
+foo(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u16 u16_1)
+{
+ u16_1 += 0x1051;
+ u16_1 &= 1;
+ u8_0 <<= u32_0 & 7;
+ u16_0 -= !u16_1;
+ u16_1 >>= ((u16)-u8_0 != 0xff);
+ return u8_0 + u16_0 + u64_0 + u16_1;
+}
+
+int
+main (void)
+{
+ u64 x = foo(1, 1, 0xffff, 0, 1);
+ if (x != 0x80)
+ __builtin_abort();
+ return 0;
+}
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-06-22 12:26 ` Alexandre Oliva
@ 2017-06-22 13:14 ` Alexandre Oliva
2017-06-22 16:44 ` Segher Boessenkool
1 sibling, 0 replies; 13+ messages in thread
From: Alexandre Oliva @ 2017-06-22 13:14 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
On Jun 22, 2017, Alexandre Oliva <aoliva@redhat.com> wrote:
> The patch below (is this what you meant?) fixes the PR testcase, and the
> new else block doesn't get exercised in an x86_64-linux-gnu bootstrap.
Err, I misdescribed the situation. It's not that it doesn't get
exercised, it's that this new block doesn't make any noticeable
difference in the generated code. I checked that by adding a new option
in common.opt, running the new block conditionally on the new option,
building stage1 host normally, then continuing bootstrap with
GCC_COMPARE_DEBUG=-fthe-new-option set in the environment, building
target libs and stage2 build and host like that. GCC_COMPARE_DEBUG,
like -fcompare-debug, checks that the compiler dumps at final are the
same.
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-06-22 6:21 ` Alexandre Oliva
@ 2017-06-22 16:04 ` Segher Boessenkool
2017-06-24 2:01 ` Alexandre Oliva
0 siblings, 1 reply; 13+ messages in thread
From: Segher Boessenkool @ 2017-06-22 16:04 UTC (permalink / raw)
To: Alexandre Oliva; +Cc: gcc-patches
On Thu, Jun 22, 2017 at 03:21:01AM -0300, Alexandre Oliva wrote:
> On Jun 8, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>
> > [ I missed this patch the first time around; please cc: me to prevent this ]
>
> > On Thu, May 18, 2017 at 07:25:57AM -0300, Alexandre Oliva wrote:
> >> When an insn used by combine has multiple SETs, only the non-REG_UNUSED
> >> set is used: others will end up dropped on the floor.
>
> > Sometimes, yes; not always.
>
> You mean sets to non-REGs, I suppose. I didn't take them into account
> in my statement indeed, but I think it still applies: can_combine_p will
> reject parallel SETs if two or more of them don't have a REG_UNUSED note
> for their respective SET_DESTs.
can_combine_p is not called for I3; it also isn't called until after
I2 is split, if that happens.
It is fairly normal for an I3 with two SETs to have both survive in the
combined insn, even if one is unused. (And this is a good thing! The
insn with one SET removed might not exist. OTOH, if we can remove one
of the SETs we probably should; we don't try that currently, if the
"full" instruction is recognised.)
Segher
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-06-22 12:26 ` Alexandre Oliva
2017-06-22 13:14 ` Alexandre Oliva
@ 2017-06-22 16:44 ` Segher Boessenkool
1 sibling, 0 replies; 13+ messages in thread
From: Segher Boessenkool @ 2017-06-22 16:44 UTC (permalink / raw)
To: Alexandre Oliva; +Cc: gcc-patches
On Thu, Jun 22, 2017 at 09:25:21AM -0300, Alexandre Oliva wrote:
> On Jun 8, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>
> > Would it work to just have "else" instead if this "if"? Or hrm, we'll
> > need to kill the recorded reg_stat value in the last case before this
> > as well?
>
> The patch below (is this what you meant?)
Yes exactly.
> fixes the PR testcase, and the
> new else block doesn't get exercised in an x86_64-linux-gnu bootstrap.
> However, it seems to me that it might very well drop parallel SETs
> without decrementing the sets count, though probably in cases in which
> this won't matter.
>
> How's this? (I haven't run regression tests yet)
Looks a lot better digestable than the previous patch, thanks!
Things should probably be restructured a bit so we keep the sets count
correct, if that is possible?
> When an insn used by combine has multiple SETs, only the non-REG_UNUSED
> set is used: others will end up dropped on the floor.
This isn't exactly true, as I described in a previous email... You can
write something simpler like
If combine drops a REG_UNUSED SET, [...]
> We have to take
> note of the dropped REG_UNUSED SETs, clearing their cached values, so
> that, even if the REGs remain used (e.g. because they were referenced
> in the used SET_SRC), we will not use properties of the latest value
> as if they applied to the earlier one.
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -14087,6 +14087,25 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
> PUT_REG_NOTE_KIND (note, REG_DEAD);
> place = i3;
> }
> +
> + /* If there were any parallel sets in FROM_INSN other than
> + the one setting IDEST, it must be REG_UNUSED, otherwise
> + we could not have used FROM_INSN in combine. Since this
> + combine attempt succeeded, we know this unused SET was
> + dropped on the floor, because the insn was either deleted
> + or created from a new pattern that does not use its
> + SET_DEST. We must forget whatever we knew about the
> + value that was stored by that SET, since the prior value
> + may still be present in IDEST's src expression or
> + elsewhere, and we do not want to use properties of the
> + dropped value as if they applied to the prior one when
> + simplifying e.g. subsequent combine attempts. */
Similar for this comment. (It has a stray tab btw, before "We").
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr80693.c
> @@ -0,0 +1,26 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -fno-tree-coalesce-vars" } */
> +typedef unsigned char u8;
> +typedef unsigned short u16;
> +typedef unsigned u32;
> +typedef unsigned long u64;
Maybe use "long long"? Less incorrect/misleading on 32-bit targets ;-)
Thanks,
Segher
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-06-22 16:04 ` Segher Boessenkool
@ 2017-06-24 2:01 ` Alexandre Oliva
2017-07-07 12:06 ` Segher Boessenkool
0 siblings, 1 reply; 13+ messages in thread
From: Alexandre Oliva @ 2017-06-24 2:01 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
On Jun 22, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> On Thu, Jun 22, 2017 at 03:21:01AM -0300, Alexandre Oliva wrote:
>> On Jun 8, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>
>> > [ I missed this patch the first time around; please cc: me to prevent this ]
>>
>> > On Thu, May 18, 2017 at 07:25:57AM -0300, Alexandre Oliva wrote:
>> >> When an insn used by combine has multiple SETs, only the non-REG_UNUSED
>> >> set is used: others will end up dropped on the floor.
>>
>> > Sometimes, yes; not always.
>>
>> You mean sets to non-REGs, I suppose. I didn't take them into account
>> in my statement indeed, but I think it still applies: can_combine_p will
>> reject parallel SETs if two or more of them don't have a REG_UNUSED note
>> for their respective SET_DESTs.
> can_combine_p is not called for I3; it also isn't called until after
> I2 is split, if that happens.
Oh, I see what you mean now. I just don't think of I3 as "used"; other
insns are "used" in that they are substituted, in whole or in part, into
I3.
On Jun 22, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> On Thu, Jun 22, 2017 at 09:25:21AM -0300, Alexandre Oliva wrote:
>> On Jun 8, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>
>> > Would it work to just have "else" instead if this "if"? Or hrm, we'll
>> > need to kill the recorded reg_stat value in the last case before this
>> > as well?
>>
>> The patch below (is this what you meant?)
> Yes exactly.
>> How's this? (I haven't run regression tests yet)
> Looks a lot better digestable than the previous patch, thanks!
> Things should probably be restructured a bit so we keep the sets count
> correct, if that is possible?
I'll have to think a bit to figure out the exact conditions in which to
decrement the sets count, and reset the recorded value. I was thinking
the conditions were the same; am I missing something?
Or are you getting at cases in which we should do both and don't, or
vice-versa? E.g., if reg_referenced_p holds but the subsequent test
doesn't? I guess we do, but don't we have to distinguish the cases of
an original unused set remaining from that of reusing the pseudo for a
new set?
Do we have to test whether from_insn still reg_sets_p the REG_UNUSED
operand, when from_insn is not i3? (e.g., it could be something that
remains set in i1 as a side effect, but that's not used in either i2 or
i3)
Am I overdoing this? The situations I had to analyze in the patch I
posted before were much simpler, and even then I now think I missed a
number of them :-)
>> When an insn used by combine has multiple SETs, only the non-REG_UNUSED
>> set is used: others will end up dropped on the floor.
> write something simpler like
> If combine drops a REG_UNUSED SET, [...]
Nice, thanks.
> Similar for this comment. (It has a stray tab btw, before "We").
*nod*
> Maybe use "long long"? Less incorrect/misleading on 32-bit targets ;-)
Sure, thanks.
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-06-24 2:01 ` Alexandre Oliva
@ 2017-07-07 12:06 ` Segher Boessenkool
2017-12-07 21:01 ` Alexandre Oliva
0 siblings, 1 reply; 13+ messages in thread
From: Segher Boessenkool @ 2017-07-07 12:06 UTC (permalink / raw)
To: Alexandre Oliva; +Cc: gcc-patches
Hi again, sorry for the delay,
On Fri, Jun 23, 2017 at 11:01:12PM -0300, Alexandre Oliva wrote:
> > Things should probably be restructured a bit so we keep the sets count
> > correct, if that is possible?
>
> I'll have to think a bit to figure out the exact conditions in which to
> decrement the sets count, and reset the recorded value. I was thinking
> the conditions were the same; am I missing something?
>
> Or are you getting at cases in which we should do both and don't, or
> vice-versa? E.g., if reg_referenced_p holds but the subsequent test
> doesn't? I guess we do, but don't we have to distinguish the cases of
> an original unused set remaining from that of reusing the pseudo for a
> new set?
>
> Do we have to test whether from_insn still reg_sets_p the REG_UNUSED
> operand, when from_insn is not i3? (e.g., it could be something that
> remains set in i1 as a side effect, but that's not used in either i2 or
> i3)
>
> Am I overdoing this? The situations I had to analyze in the patch I
> posted before were much simpler, and even then I now think I missed a
> number of them :-)
Yeah you're overdoing it ;-) I meant, just double check if your new
code does the correct thing for the set count. It wasn't obvious to
me (this code is horribly complicated). Whether all existing code is
correct... it's probably best not to look too closely :-/
If you have a patch you feel confident in, could you post it again
please?
Segher
p.s. What I still want to do is never reuse a set pseudo, always
create a new pseudo instead. This will get rid of many existing bugs,
and a lot of the complications in existing code. Unfortunately not
everything is set up yet for creating new pseudos during combine.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-07-07 12:06 ` Segher Boessenkool
@ 2017-12-07 21:01 ` Alexandre Oliva
2017-12-08 10:56 ` Kyrill Tkachov
2017-12-08 13:55 ` Segher Boessenkool
0 siblings, 2 replies; 13+ messages in thread
From: Alexandre Oliva @ 2017-12-07 21:01 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
On Jul 7, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> I meant, just double check if your new
> code does the correct thing for the set count.
Sorry this took me so long to get back to. Even this was difficult for
me to answer for sure, then and now.
We don't (and can't?) know whether the REG_UNUSED note originally
pertained to a clobber or a set, but I think that doesn't matter: unless
we've reused the REG in i2 as a scratch, I think we should decrement the
set count, because we used to have a SET or a CLOBBER that is now gone.
Looking back at the issue, I realized we should keep/add the REG_UNUSED
note to i2, if the reg is mentioned in i2, possibly turned into REG_DEAD
if i2 is referenced but not set.
Still, I'm concerned I haven't caught all of the cases in which
adjusting REG_N_SETS would be needed: we might have dropped multiple
SETs of the same pseudo, each with its own REG_UNUSED note (say, from
all of i3, i2, i1, and i0), and the current logic will only decrement
REG_N_SETS once, and only if i2 no longer sets the register.
I'm also concerned that the logic for when the REG is set or referenced
in i3 is incomplete: references in i3 might have come from any of the
previous insns, even if intervening sets of the same register were
substituted and thus removed. Consider the following nightmarish scenario:
i0: (parallel [(set (reg CC) (op1 (reg A)))
(set (reg A) (plus (reg A) (const_int 1)))])
(REG_UNUSED (reg A))
i1: (set (reg A) (ne (reg CC) (const_int 0)))
(REG_DEAD (reg CC))
i2: (parallel [(set (reg CC) (op2 (reg A)))
(set (reg A) (plus (reg A) (const_int 1)))])
(REG_UNUSED (reg A)))
i3: (set (reg A) (eq (reg CC) (const_int 0)))
(REG_DEAD (reg CC))
we might turn that into say:
i2: (set (reg CC) (op3 (reg A)))
(REG_DEAD (reg A))
i3: (set (reg A) (op4 (reg CC)))
(REG_DEAD (reg CC))
and now we'd have to somehow figure out that we're to discount the two
unused sets of reg A, those from i0 and i2, and to turn either
REG_UNUSED note about reg A into a REG_DEAD note to be placed at i2. A
is set at i3, so combine should record its new value, but if it's
computed in terms of the scratch CC and a much-older A, will we get the
correct value? Or is the value unchanged because it's the output of the
latest insn?
Now, consider this slightly simpler scenario (trying to combine only the
first 3 insns):
i0: nil
i1: (parallel [(set (reg CC) (op1 (reg A)))
(set (reg A) (plus (reg A) (const_int 1)))])
(REG_UNUSED (reg A))
i2: (set (reg A) (ne (reg CC) (const_int 0)))
(REG_DEAD (reg CC))
i3: (parallel [(set (reg CC) (op2 (reg A)))
(set (reg A) (plus (reg A) (const_int 1)))])
(REG_UNUSED (reg A)))
this might combine into:
i2: (set (reg A) (op5 (reg A)))
i3: (set (reg CC) (op6 (reg A)))
(REG_DEAD (reg A))
and now we have removed 3 sets to A, but added 1 by splitting within
combine using A as scratch. Would we then have to figure out that for
each of the REG_UNUSED notes pertaining to A we have to drop the
REG_N_SETS count by 1, although A remains used in i3, and set and used
in i2? I don't see how.
I see that combine would record the value for reg A at i2 in this case,
but would it express it in terms of which earlier value of reg A?
Shouldn't we have reset it while placing notes in this case too?
> It wasn't obvious to me (this code is horribly complicated). Whether
> all existing code is correct... it's probably best not to look too
> closely :-/
You're right about its being horribly complicated.
Maybe I should go about it incrementally.
> If you have a patch you feel confident in, could you post it again
> please?
So let me tell you how I feel about this. It has waited long enough,
and there are at least 3 bugs known to be fixed by the first very simple
patch below. The catch is that it doesn't adjust REG_N_SETS at all (we
didn't before the patch, and that didn't seem to hurt too much). I've
regstrapped this successfully on x86_64-linux-gnu and i686-linux-gnu.
--->cut<---
When combine drops a REG_UNUSED SET in a parallel, we have to clear
cached values, so that, even if the REGs remain used (e.g. because
they were referenced in the used SET_SRC), we will not use properties
of the dropped modified value as if they applied to the preserved
original one.
We fail to adjust REG_N_SETS.
for gcc/ChangeLog
PR rtl-optimization/80693
PR rtl-optimization/81019
PR rtl-optimization/81020
* combine.c (distribute_notes): Reset any REG_UNUSED REGs that
are not mentioned in i3. Place the REG_UNUSED note on i2,
possibly modified to REG_DEAD, if it did not originate in i3.
for gcc/testsuite/ChangeLog
PR rtl-optimization/80693
PR rtl-optimization/81019
PR rtl-optimization/81020
* gcc.dg/pr80693.c: New.
* gcc.dg/pr81019.c: New.
---
gcc/combine.c | 40 ++++++++++++++++++++++++++++++++++++++++
gcc/testsuite/gcc.dg/pr80693.c | 26 ++++++++++++++++++++++++++
gcc/testsuite/gcc.dg/pr81019.c | 27 +++++++++++++++++++++++++++
3 files changed, 93 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/pr80693.c
create mode 100644 gcc/testsuite/gcc.dg/pr81019.c
diff --git a/gcc/combine.c b/gcc/combine.c
index 863abd7a282b..976b5f2e519b 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14254,6 +14254,46 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
PUT_REG_NOTE_KIND (note, REG_DEAD);
place = i3;
}
+
+ /* A SET or CLOBBER of the REG_UNUSED reg has been removed,
+ but we can't tell which at this point. We must reset any
+ expectations we had about the value that was previously
+ stored in the reg. ??? Ideally, we'd adjust REG_N_SETS
+ and, if appropriate, restore its previous value, but we
+ don't have enough information for that at this point. */
+ else
+ {
+ record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
+
+ /* Otherwise, if this register is now referenced in i2
+ then the register used to be modified in one of the
+ original insns. If it was i3 (say, in an unused
+ parallel), it's now completely gone, so the note can
+ be discarded. But if it was modified in i2, i1 or i0
+ and we still reference it in i2, then we're
+ referencing the previous value, and since the
+ register was modified and REG_UNUSED, we know that
+ the previous value is now dead. So, if we only
+ reference the register in i2, we change the note to
+ REG_DEAD, to reflect the previous value. However, if
+ we're also setting or clobbering the register as
+ scratch, we know (because the register was not
+ referenced in i3) that it's unused, just as it was
+ unused before, and we place the note in i2. */
+ if (from_insn != i3 && i2 && INSN_P (i2)
+ && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
+ {
+ if (!reg_set_p (XEXP (note, 0), PATTERN (i2)))
+ PUT_REG_NOTE_KIND (note, REG_DEAD);
+ if (! (REG_P (XEXP (note, 0))
+ ? find_regno_note (i2, REG_NOTE_KIND (note),
+ REGNO (XEXP (note, 0)))
+ : find_reg_note (i2, REG_NOTE_KIND (note),
+ XEXP (note, 0))))
+ place = i2;
+ }
+ }
+
break;
case REG_EQUAL:
diff --git a/gcc/testsuite/gcc.dg/pr80693.c b/gcc/testsuite/gcc.dg/pr80693.c
new file mode 100644
index 000000000000..507177167e58
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80693.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-O -fno-tree-coalesce-vars" } */
+typedef unsigned char u8;
+typedef unsigned short u16;
+typedef unsigned u32;
+typedef unsigned long long u64;
+
+static u64 __attribute__((noinline, noclone))
+foo(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u16 u16_1)
+{
+ u16_1 += 0x1051;
+ u16_1 &= 1;
+ u8_0 <<= u32_0 & 7;
+ u16_0 -= !u16_1;
+ u16_1 >>= ((u16)-u8_0 != 0xff);
+ return u8_0 + u16_0 + u64_0 + u16_1;
+}
+
+int
+main (void)
+{
+ u64 x = foo(1, 1, 0xffff, 0, 1);
+ if (x != 0x80)
+ __builtin_abort();
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/pr81019.c b/gcc/testsuite/gcc.dg/pr81019.c
new file mode 100644
index 000000000000..cf13bfa92758
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr81019.c
@@ -0,0 +1,27 @@
+/* PR rtl-optimization/81019 */
+/* { dg-do run } */
+/* { dg-options "-O -fno-tree-ccp" } */
+
+unsigned long long __attribute__((noinline, noclone))
+foo (unsigned char a, unsigned short b, unsigned c, unsigned long long d,
+ unsigned char e, unsigned short f, unsigned g, unsigned long long h)
+{
+ g = e;
+ c &= 0 < d;
+ b *= d;
+ g ^= -1;
+ g &= 1;
+ c |= 1;
+ a -= 0 < g;
+ g >>= 1;
+ f = b | (f >> b);
+ return a + c + d + f + g + h;
+}
+
+int
+main (void)
+{
+ if (foo (0, 0, 0, 0, 0, 0, 0, 0) != 0x100)
+ __builtin_abort ();
+ return 0;
+}
--->cut<---
But then it occurred to me that the REG_UNUSED register from i1 might
have *also* been set in i2, and we might then have used it as a scratch
register for split, while the set that the REG_UNUSED pertained to might
have been completely discarded from a parallel in i1 or i0. And then I
wasn't unsure whether or not to decrement REG_N_SETS.
diff --git a/gcc/combine.c b/gcc/combine.c
index 863abd7a282b..277492af9108 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14254,6 +14254,59 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
PUT_REG_NOTE_KIND (note, REG_DEAD);
place = i3;
}
+
+ /* A SET or CLOBBER of the REG_UNUSED reg has been removed,
+ so we we must reset any expectations we had about the
+ value that was previously stored in the reg, and adjust
+ REG_N_SETS. Ideally, we'd restore the register to its
+ previous value, but we don't have enough information for
+ that at this point. */
+ else
+ {
+ record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
+
+ bool set_p = false;
+
+ /* Otherwise, if this register is now referenced in i2
+ then the register used to be modified in one of the
+ original insns. If it was i3 (say, in an unused
+ parallel), it's now completely gone, so the note can
+ be discarded. But if it was modified in i2, i1 or i0
+ and we still reference it in i2, then we're
+ referencing the previous value, and since the
+ register was modified and REG_UNUSED, we know that
+ the previous value is now dead. So, if we only
+ reference the register in i2, we change the note to
+ REG_DEAD, to reflect the previous value. However, if
+ we're also setting or clobbering the register as
+ scratch, we know (because the register was not
+ referenced in i3) that it's unused, just as it was
+ unused before, and we place the note in i2. */
+ if (from_insn != i3 && i2 && INSN_P (i2))
+ {
+ set_p = reg_set_p (XEXP (note, 0), PATTERN (i2));
+
+ if (set_p
+ || reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
+ {
+ if (!set_p)
+ PUT_REG_NOTE_KIND (note, REG_DEAD);
+
+ if (! (REG_P (XEXP (note, 0))
+ ? find_regno_note (i2, REG_NOTE_KIND (note),
+ REGNO (XEXP (note, 0)))
+ : find_reg_note (i2, REG_NOTE_KIND (note),
+ XEXP (note, 0))))
+ place = i2;
+ }
+ }
+
+ /* If we no longer have a set, drop the set count. */
+ if (!set_p && REG_P (XEXP (note, 0))
+ && REGNO (XEXP (note, 0)) < reg_n_sets_max)
+ INC_REG_N_SETS (REGNO (XEXP (note, 0)), -1);
+ }
+
break;
case REG_EQUAL:
Then I realized we might have to also reset the cached value when it's
referenced in i3, and possibly also adjust the counts. And also some of
that when it's set in i3. Then I lost it. Help? :-)
FWIW, I'd be glad just installing the patch between --->cut<---s, or
even a simpler version thereof that just resets the recorded value and
doesn't even attempt to place notes in i2, to get the bugs fixed while
we sort out the really tricky issues of adjusting REG_N_SETS. The
(incomplete, but functional) fix has been known for a while, and we
shouldn't subject users to wrong code when we can help it, even if we
might miss optimization opportunities for that, right? Thoughts?
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-12-07 21:01 ` Alexandre Oliva
@ 2017-12-08 10:56 ` Kyrill Tkachov
2017-12-08 13:55 ` Segher Boessenkool
1 sibling, 0 replies; 13+ messages in thread
From: Kyrill Tkachov @ 2017-12-08 10:56 UTC (permalink / raw)
To: Alexandre Oliva, Segher Boessenkool; +Cc: gcc-patches
Hi Alexandre,
On 07/12/17 21:01, Alexandre Oliva wrote:
> On Jul 7, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>
> > I meant, just double check if your new
> > code does the correct thing for the set count.
>
> Sorry this took me so long to get back to. Even this was difficult for
> me to answer for sure, then and now.
>
> We don't (and can't?) know whether the REG_UNUSED note originally
> pertained to a clobber or a set, but I think that doesn't matter: unless
> we've reused the REG in i2 as a scratch, I think we should decrement the
> set count, because we used to have a SET or a CLOBBER that is now gone.
>
> Looking back at the issue, I realized we should keep/add the REG_UNUSED
> note to i2, if the reg is mentioned in i2, possibly turned into REG_DEAD
> if i2 is referenced but not set.
>
> Still, I'm concerned I haven't caught all of the cases in which
> adjusting REG_N_SETS would be needed: we might have dropped multiple
> SETs of the same pseudo, each with its own REG_UNUSED note (say, from
> all of i3, i2, i1, and i0), and the current logic will only decrement
> REG_N_SETS once, and only if i2 no longer sets the register.
>
> I'm also concerned that the logic for when the REG is set or referenced
> in i3 is incomplete: references in i3 might have come from any of the
> previous insns, even if intervening sets of the same register were
> substituted and thus removed. Consider the following nightmarish
> scenario:
>
> i0: (parallel [(set (reg CC) (op1 (reg A)))
> (set (reg A) (plus (reg A) (const_int 1)))])
> (REG_UNUSED (reg A))
> i1: (set (reg A) (ne (reg CC) (const_int 0)))
> (REG_DEAD (reg CC))
> i2: (parallel [(set (reg CC) (op2 (reg A)))
> (set (reg A) (plus (reg A) (const_int 1)))])
> (REG_UNUSED (reg A)))
> i3: (set (reg A) (eq (reg CC) (const_int 0)))
> (REG_DEAD (reg CC))
>
> we might turn that into say:
>
> i2: (set (reg CC) (op3 (reg A)))
> (REG_DEAD (reg A))
> i3: (set (reg A) (op4 (reg CC)))
> (REG_DEAD (reg CC))
>
> and now we'd have to somehow figure out that we're to discount the two
> unused sets of reg A, those from i0 and i2, and to turn either
> REG_UNUSED note about reg A into a REG_DEAD note to be placed at i2. A
> is set at i3, so combine should record its new value, but if it's
> computed in terms of the scratch CC and a much-older A, will we get the
> correct value? Or is the value unchanged because it's the output of the
> latest insn?
>
> Now, consider this slightly simpler scenario (trying to combine only the
> first 3 insns):
>
> i0: nil
> i1: (parallel [(set (reg CC) (op1 (reg A)))
> (set (reg A) (plus (reg A) (const_int 1)))])
> (REG_UNUSED (reg A))
> i2: (set (reg A) (ne (reg CC) (const_int 0)))
> (REG_DEAD (reg CC))
> i3: (parallel [(set (reg CC) (op2 (reg A)))
> (set (reg A) (plus (reg A) (const_int 1)))])
> (REG_UNUSED (reg A)))
>
> this might combine into:
>
> i2: (set (reg A) (op5 (reg A)))
> i3: (set (reg CC) (op6 (reg A)))
> (REG_DEAD (reg A))
>
> and now we have removed 3 sets to A, but added 1 by splitting within
> combine using A as scratch. Would we then have to figure out that for
> each of the REG_UNUSED notes pertaining to A we have to drop the
> REG_N_SETS count by 1, although A remains used in i3, and set and used
> in i2? I don't see how.
>
> I see that combine would record the value for reg A at i2 in this case,
> but would it express it in terms of which earlier value of reg A?
> Shouldn't we have reset it while placing notes in this case too?
>
> > It wasn't obvious to me (this code is horribly complicated). Whether
> > all existing code is correct... it's probably best not to look too
> > closely :-/
>
> You're right about its being horribly complicated.
>
> Maybe I should go about it incrementally.
>
> > If you have a patch you feel confident in, could you post it again
> > please?
>
> So let me tell you how I feel about this. It has waited long enough,
> and there are at least 3 bugs known to be fixed by the first very simple
> patch below. The catch is that it doesn't adjust REG_N_SETS at all (we
> didn't before the patch, and that didn't seem to hurt too much). I've
> regstrapped this successfully on x86_64-linux-gnu and i686-linux-gnu.
>
> --->cut<---
> When combine drops a REG_UNUSED SET in a parallel, we have to clear
> cached values, so that, even if the REGs remain used (e.g. because
> they were referenced in the used SET_SRC), we will not use properties
> of the dropped modified value as if they applied to the preserved
> original one.
>
> We fail to adjust REG_N_SETS.
>
> for gcc/ChangeLog
>
> PR rtl-optimization/80693
> PR rtl-optimization/81019
> PR rtl-optimization/81020
> * combine.c (distribute_notes): Reset any REG_UNUSED REGs that
> are not mentioned in i3. Place the REG_UNUSED note on i2,
> possibly modified to REG_DEAD, if it did not originate in i3.
>
> for gcc/testsuite/ChangeLog
>
> PR rtl-optimization/80693
> PR rtl-optimization/81019
> PR rtl-optimization/81020
> * gcc.dg/pr80693.c: New.
> * gcc.dg/pr81019.c: New.
> ---
> gcc/combine.c | 40
> ++++++++++++++++++++++++++++++++++++++++
> gcc/testsuite/gcc.dg/pr80693.c | 26 ++++++++++++++++++++++++++
> gcc/testsuite/gcc.dg/pr81019.c | 27 +++++++++++++++++++++++++++
> 3 files changed, 93 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/pr80693.c
> create mode 100644 gcc/testsuite/gcc.dg/pr81019.c
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 863abd7a282b..976b5f2e519b 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -14254,6 +14254,46 @@ distribute_notes (rtx notes, rtx_insn
> *from_insn, rtx_insn *i3, rtx_insn *i2,
> PUT_REG_NOTE_KIND (note, REG_DEAD);
> place = i3;
> }
> +
> + /* A SET or CLOBBER of the REG_UNUSED reg has been removed,
> + but we can't tell which at this point. We must reset any
> + expectations we had about the value that was previously
> + stored in the reg. ??? Ideally, we'd adjust REG_N_SETS
> + and, if appropriate, restore its previous value, but we
> + don't have enough information for that at this point. */
> + else
> + {
> + record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
> +
> + /* Otherwise, if this register is now referenced in i2
> + then the register used to be modified in one of the
> + original insns. If it was i3 (say, in an unused
> + parallel), it's now completely gone, so the note can
> + be discarded. But if it was modified in i2, i1 or i0
> + and we still reference it in i2, then we're
> + referencing the previous value, and since the
> + register was modified and REG_UNUSED, we know that
> + the previous value is now dead. So, if we only
> + reference the register in i2, we change the note to
> + REG_DEAD, to reflect the previous value. However, if
> + we're also setting or clobbering the register as
> + scratch, we know (because the register was not
> + referenced in i3) that it's unused, just as it was
> + unused before, and we place the note in i2. */
> + if (from_insn != i3 && i2 && INSN_P (i2)
> + && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
> + {
> + if (!reg_set_p (XEXP (note, 0), PATTERN (i2)))
> + PUT_REG_NOTE_KIND (note, REG_DEAD);
> + if (! (REG_P (XEXP (note, 0))
> + ? find_regno_note (i2, REG_NOTE_KIND (note),
> + REGNO (XEXP (note, 0)))
> + : find_reg_note (i2, REG_NOTE_KIND (note),
> + XEXP (note, 0))))
> + place = i2;
> + }
> + }
> +
> break;
>
> case REG_EQUAL:
> diff --git a/gcc/testsuite/gcc.dg/pr80693.c
> b/gcc/testsuite/gcc.dg/pr80693.c
> new file mode 100644
> index 000000000000..507177167e58
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr80693.c
> @@ -0,0 +1,26 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -fno-tree-coalesce-vars" } */
> +typedef unsigned char u8;
> +typedef unsigned short u16;
> +typedef unsigned u32;
> +typedef unsigned long long u64;
> +
> +static u64 __attribute__((noinline, noclone))
> +foo(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u16 u16_1)
> +{
> + u16_1 += 0x1051;
> + u16_1 &= 1;
> + u8_0 <<= u32_0 & 7;
> + u16_0 -= !u16_1;
> + u16_1 >>= ((u16)-u8_0 != 0xff);
> + return u8_0 + u16_0 + u64_0 + u16_1;
> +}
> +
> +int
> +main (void)
> +{
> + u64 x = foo(1, 1, 0xffff, 0, 1);
> + if (x != 0x80)
> + __builtin_abort();
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr81019.c
> b/gcc/testsuite/gcc.dg/pr81019.c
> new file mode 100644
> index 000000000000..cf13bfa92758
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr81019.c
> @@ -0,0 +1,27 @@
> +/* PR rtl-optimization/81019 */
> +/* { dg-do run } */
> +/* { dg-options "-O -fno-tree-ccp" } */
> +
> +unsigned long long __attribute__((noinline, noclone))
> +foo (unsigned char a, unsigned short b, unsigned c, unsigned long long d,
> + unsigned char e, unsigned short f, unsigned g, unsigned long long h)
> +{
> + g = e;
> + c &= 0 < d;
> + b *= d;
> + g ^= -1;
> + g &= 1;
> + c |= 1;
> + a -= 0 < g;
> + g >>= 1;
> + f = b | (f >> b);
> + return a + c + d + f + g + h;
> +}
> +
> +int
> +main (void)
> +{
> + if (foo (0, 0, 0, 0, 0, 0, 0, 0) != 0x100)
> + __builtin_abort ();
> + return 0;
> +}
> --->cut<---
>
This patch fixes the gcc.dg/pr81020.c execution failure I've been seeing
on aarch64.
Thanks!
Kyrill
>
> But then it occurred to me that the REG_UNUSED register from i1 might
> have *also* been set in i2, and we might then have used it as a scratch
> register for split, while the set that the REG_UNUSED pertained to might
> have been completely discarded from a parallel in i1 or i0. And then I
> wasn't unsure whether or not to decrement REG_N_SETS.
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 863abd7a282b..277492af9108 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -14254,6 +14254,59 @@ distribute_notes (rtx notes, rtx_insn
> *from_insn, rtx_insn *i3, rtx_insn *i2,
> PUT_REG_NOTE_KIND (note, REG_DEAD);
> place = i3;
> }
> +
> + /* A SET or CLOBBER of the REG_UNUSED reg has been removed,
> + so we we must reset any expectations we had about the
> + value that was previously stored in the reg, and adjust
> + REG_N_SETS. Ideally, we'd restore the register to its
> + previous value, but we don't have enough information for
> + that at this point. */
> + else
> + {
> + record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
> +
> + bool set_p = false;
> +
> + /* Otherwise, if this register is now referenced in i2
> + then the register used to be modified in one of the
> + original insns. If it was i3 (say, in an unused
> + parallel), it's now completely gone, so the note can
> + be discarded. But if it was modified in i2, i1 or i0
> + and we still reference it in i2, then we're
> + referencing the previous value, and since the
> + register was modified and REG_UNUSED, we know that
> + the previous value is now dead. So, if we only
> + reference the register in i2, we change the note to
> + REG_DEAD, to reflect the previous value. However, if
> + we're also setting or clobbering the register as
> + scratch, we know (because the register was not
> + referenced in i3) that it's unused, just as it was
> + unused before, and we place the note in i2. */
> + if (from_insn != i3 && i2 && INSN_P (i2))
> + {
> + set_p = reg_set_p (XEXP (note, 0), PATTERN (i2));
> +
> + if (set_p
> + || reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
> + {
> + if (!set_p)
> + PUT_REG_NOTE_KIND (note, REG_DEAD);
> +
> + if (! (REG_P (XEXP (note, 0))
> + ? find_regno_note (i2, REG_NOTE_KIND (note),
> + REGNO (XEXP (note, 0)))
> + : find_reg_note (i2, REG_NOTE_KIND (note),
> + XEXP (note, 0))))
> + place = i2;
> + }
> + }
> +
> + /* If we no longer have a set, drop the set count. */
> + if (!set_p && REG_P (XEXP (note, 0))
> + && REGNO (XEXP (note, 0)) < reg_n_sets_max)
> + INC_REG_N_SETS (REGNO (XEXP (note, 0)), -1);
> + }
> +
> break;
>
> case REG_EQUAL:
>
>
> Then I realized we might have to also reset the cached value when it's
> referenced in i3, and possibly also adjust the counts. And also some of
> that when it's set in i3. Then I lost it. Help? :-)
>
> FWIW, I'd be glad just installing the patch between --->cut<---s, or
> even a simpler version thereof that just resets the recorded value and
> doesn't even attempt to place notes in i2, to get the bugs fixed while
> we sort out the really tricky issues of adjusting REG_N_SETS. The
> (incomplete, but functional) fix has been known for a while, and we
> shouldn't subject users to wrong code when we can help it, even if we
> might miss optimization opportunities for that, right? Thoughts?
>
> --
> Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
> <http://FSFLA.org/%7Elxoliva/>
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/ FSF Latin America board member
> Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PR80693] drop value of parallel SETs dropped by combine
2017-12-07 21:01 ` Alexandre Oliva
2017-12-08 10:56 ` Kyrill Tkachov
@ 2017-12-08 13:55 ` Segher Boessenkool
1 sibling, 0 replies; 13+ messages in thread
From: Segher Boessenkool @ 2017-12-08 13:55 UTC (permalink / raw)
To: Alexandre Oliva; +Cc: gcc-patches
Hi!
On Thu, Dec 07, 2017 at 07:01:03PM -0200, Alexandre Oliva wrote:
[ Big snip; thanks for all the detailed information! ]
> Still, I'm concerned I haven't caught all of the cases in which
> adjusting REG_N_SETS would be needed: we might have dropped multiple
> SETs of the same pseudo, each with its own REG_UNUSED note (say, from
> all of i3, i2, i1, and i0), and the current logic will only decrement
> REG_N_SETS once, and only if i2 no longer sets the register.
The only thing that matters for combine is that if REG_N_SETS == 1, then
it must be correct. So "forgetting" to decrement is okay, forgetting
to increment is not.
> > It wasn't obvious to me (this code is horribly complicated). Whether
> > all existing code is correct... it's probably best not to look too
> > closely :-/
>
> You're right about its being horribly complicated.
I would like to remove all regstat things from combine, use "real" DF
instead. This will simplify many things, and allow better optimisation
as well. But is it fast enough? We'll see.
> > If you have a patch you feel confident in, could you post it again
> > please?
>
> So let me tell you how I feel about this. It has waited long enough,
> and there are at least 3 bugs known to be fixed by the first very simple
> patch below. The catch is that it doesn't adjust REG_N_SETS at all (we
> didn't before the patch, and that didn't seem to hurt too much). I've
> regstrapped this successfully on x86_64-linux-gnu and i686-linux-gnu.
> PR rtl-optimization/80693
> PR rtl-optimization/81019
> PR rtl-optimization/81020
> * combine.c (distribute_notes): Reset any REG_UNUSED REGs that
> are not mentioned in i3. Place the REG_UNUSED note on i2,
> possibly modified to REG_DEAD, if it did not originate in i3.
>
> for gcc/testsuite/ChangeLog
>
> PR rtl-optimization/80693
> PR rtl-optimization/81019
> PR rtl-optimization/81020
> * gcc.dg/pr80693.c: New.
> * gcc.dg/pr81019.c: New.
> But then it occurred to me that the REG_UNUSED register from i1 might
> have *also* been set in i2, and we might then have used it as a scratch
> register for split, while the set that the REG_UNUSED pertained to might
> have been completely discarded from a parallel in i1 or i0. And then I
> wasn't unsure whether or not to decrement REG_N_SETS.
[ more snip ]
> Then I realized we might have to also reset the cached value when it's
> referenced in i3, and possibly also adjust the counts. And also some of
> that when it's set in i3. Then I lost it. Help? :-)
>
> FWIW, I'd be glad just installing the patch between --->cut<---s, or
> even a simpler version thereof that just resets the recorded value and
> doesn't even attempt to place notes in i2, to get the bugs fixed while
> we sort out the really tricky issues of adjusting REG_N_SETS. The
> (incomplete, but functional) fix has been known for a while, and we
> shouldn't subject users to wrong code when we can help it, even if we
> might miss optimization opportunities for that, right? Thoughts?
Yes, that first patch is okay for trunk. Thanks for all the work on this!
I don't think this patch makes anything worse, and it does make some things
better.
Segher
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2017-12-08 13:55 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-18 10:34 [PR80693] drop value of parallel SETs dropped by combine Alexandre Oliva
2017-06-07 2:14 ` Alexandre Oliva
2017-06-08 19:50 ` Segher Boessenkool
2017-06-22 6:21 ` Alexandre Oliva
2017-06-22 16:04 ` Segher Boessenkool
2017-06-24 2:01 ` Alexandre Oliva
2017-07-07 12:06 ` Segher Boessenkool
2017-12-07 21:01 ` Alexandre Oliva
2017-12-08 10:56 ` Kyrill Tkachov
2017-12-08 13:55 ` Segher Boessenkool
2017-06-22 12:26 ` Alexandre Oliva
2017-06-22 13:14 ` Alexandre Oliva
2017-06-22 16:44 ` Segher Boessenkool
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).