* Fix PR rtl-optimization/84071
@ 2018-01-31 10:30 Eric Botcazou
2018-01-31 15:34 ` Eric Botcazou
0 siblings, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2018-01-31 10:30 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2668 bytes --]
This is a regression present on mainline and 7 branch for big-endian ARM in
the form of a wrong elimination of zero-extension after sign-extended load.
The problem is that reg_nonzero_bits_for_combine returns 0xffff for the
register (reg:HI 121) when queried for SImode after:
(insn 10 7 11 2 (set (subreg:SI (reg:HI 121) 0)
(sign_extend:SI (mem/c:HI (plus:SI (reg/f:SI 103 afp)
(const_int 4 [0x4])) [1 u+0 S2 A32]))) 171
{*arm_extendhisi2_v6}
That's a valid query since the combiner keeps track of nonzero_bits in the
largest possible mode:
/* Mode used to compute significance in reg_stat[].nonzero_bits. It is the
largest integer mode that can fit in HOST_BITS_PER_WIDE_INT. */
static machine_mode nonzero_bits_mode;
The root cause of the issue is the SUBREG case of record_dead_and_set_regs_1:
if (REG_P (dest))
{
/* If we are setting the whole register, we know its value. Otherwise
show that we don't know the value. We can handle SUBREG in
some cases. */
if (GET_CODE (setter) == SET && dest == SET_DEST (setter))
record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
else if (GET_CODE (setter) == SET
&& GET_CODE (SET_DEST (setter)) == SUBREG
&& SUBREG_REG (SET_DEST (setter)) == dest
&& known_le (GET_MODE_PRECISION (GET_MODE (dest)), BITS_PER_WORD)
&& subreg_lowpart_p (SET_DEST (setter)))
record_value_for_reg (dest, record_dead_insn,
gen_lowpart (GET_MODE (dest),
SET_SRC (setter)));
else
record_value_for_reg (dest, record_dead_insn, NULL_RTX);
}
It's very old (r357) and effectively synthesizes a wrong SET operation here
because it strips the sign-extension around the MEM; now rtlanal.c considers
that loads are extended according to LOAD_EXTEND_OP on RISC platforms and ARM
is a ZERO_EXTEND target in this configuration (ARMv4 and above), so the sign-
extension is effectively turned into a zero-extension.
This SUBREG case looks quite questionable, not only for paradoxical SUBREGs on
RISC platforms, but also for regular SUBREGs, for which it's trying to infer
information on the whole register from a SET of the lowpart. But the fix only
changes the first, problematic case.
Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.
2018-01-31 Eric Botcazou <ebotcazou@adacore.com>
PR rtl-optimization/84071
* combine.c (record_dead_and_set_regs_1): Record the source unmodified
for a paradoxical SUBREG on a WORD_REGISTER_OPERATIONS target.
2018-01-31 Eric Botcazou <ebotcazou@adacore.com>
* gcc.c-torture/execute/20180131-1.c: New test.
--
Eric Botcazou
[-- Attachment #2: pr84071.diff --]
[-- Type: text/x-patch, Size: 1624 bytes --]
Index: combine.c
===================================================================
--- combine.c (revision 257139)
+++ combine.c (working copy)
@@ -13245,18 +13245,25 @@ record_dead_and_set_regs_1 (rtx dest, co
if (REG_P (dest))
{
/* If we are setting the whole register, we know its value. Otherwise
- show that we don't know the value. We can handle SUBREG in
- some cases. */
+ show that we don't know the value. We can handle a SUBREG if it's
+ the low part, but we must be careful with paradoxical SUBREGs on
+ RISC architectures because we cannot strip e.g. an extension around
+ a load and record the naked load since the RTL middle-end considers
+ that the upper bits are defined according to LOAD_EXTEND_OP. */
if (GET_CODE (setter) == SET && dest == SET_DEST (setter))
record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
else if (GET_CODE (setter) == SET
&& GET_CODE (SET_DEST (setter)) == SUBREG
&& SUBREG_REG (SET_DEST (setter)) == dest
- && known_le (GET_MODE_PRECISION (GET_MODE (dest)), BITS_PER_WORD)
+ && known_le (GET_MODE_PRECISION (GET_MODE (dest)),
+ BITS_PER_WORD)
&& subreg_lowpart_p (SET_DEST (setter)))
record_value_for_reg (dest, record_dead_insn,
- gen_lowpart (GET_MODE (dest),
- SET_SRC (setter)));
+ WORD_REGISTER_OPERATIONS
+ && paradoxical_subreg_p (SET_DEST (setter))
+ ? SET_SRC (setter)
+ : gen_lowpart (GET_MODE (dest),
+ SET_SRC (setter)));
else
record_value_for_reg (dest, record_dead_insn, NULL_RTX);
}
[-- Attachment #3: 20180131-1.c --]
[-- Type: text/x-csrc, Size: 449 bytes --]
/* PR rtl-optimization/84071 */
/* Reported by Wilco <wilco@gcc.gnu.org> */
extern void abort (void);
typedef union
{
signed short ss;
unsigned short us;
int x;
} U;
int f(int x, int y, int z, int a, U u) __attribute__((noclone, noinline));
int f(int x, int y, int z, int a, U u)
{
return (u.ss <= 0) + u.us;
}
int main (void)
{
U u = { .ss = -1 };
if (f (0, 0, 0, 0, u) != (1 << sizeof (short) * 8))
abort ();
return 0;
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fix PR rtl-optimization/84071
2018-01-31 10:30 Fix PR rtl-optimization/84071 Eric Botcazou
@ 2018-01-31 15:34 ` Eric Botcazou
2018-01-31 19:17 ` Richard Sandiford
0 siblings, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2018-01-31 15:34 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 433 bytes --]
> Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.
As discussed in the audit trail, this beefs up the internal documentation
about WORD_REGISTER_OPERATIONS.
Tested with 'make doc', applied on mainline and 7 branch.
2018-01-31 Eric Botcazou <ebotcazou@adacore.com>
PR rtl-optimization/84071
* doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.
* doc/tm.texi: Regenerate.
--
Eric Botcazou
[-- Attachment #2: p.diff --]
[-- Type: text/x-patch, Size: 946 bytes --]
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in (revision 257227)
+++ doc/tm.texi.in (working copy)
@@ -7376,8 +7376,12 @@ is in effect.
@defmac WORD_REGISTER_OPERATIONS
Define this macro to 1 if operations between registers with integral mode
-smaller than a word are always performed on the entire register.
-Most RISC machines have this property and most CISC machines do not.
+smaller than a word are always performed on the entire register. To be
+more explicit, if you start with a pair of @code{word_mode} registers with
+known values and you do a subword, for example @code{QImode}, addition on
+the low part of the registers, then the compiler may consider that the
+result has a known value in @code{word_mode} too if the macro is defined
+to 1. Most RISC machines have this property and most CISC machines do not.
@end defmac
@hook TARGET_MIN_ARITHMETIC_PRECISION
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fix PR rtl-optimization/84071
2018-01-31 15:34 ` Eric Botcazou
@ 2018-01-31 19:17 ` Richard Sandiford
2018-02-01 10:14 ` Richard Earnshaw (lists)
2018-02-02 12:06 ` Eric Botcazou
0 siblings, 2 replies; 7+ messages in thread
From: Richard Sandiford @ 2018-01-31 19:17 UTC (permalink / raw)
To: Eric Botcazou; +Cc: gcc-patches
Eric Botcazou <ebotcazou@adacore.com> writes:
>> Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.
>
> As discussed in the audit trail, this beefs up the internal documentation
> about WORD_REGISTER_OPERATIONS.
>
> Tested with 'make doc', applied on mainline and 7 branch.
>
>
> 2018-01-31 Eric Botcazou <ebotcazou@adacore.com>
>
> PR rtl-optimization/84071
> * doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.
> * doc/tm.texi: Regenerate.
>
> Index: doc/tm.texi.in
> ===================================================================
> --- doc/tm.texi.in (revision 257227)
> +++ doc/tm.texi.in (working copy)
> @@ -7376,8 +7376,12 @@ is in effect.
>
> @defmac WORD_REGISTER_OPERATIONS
> Define this macro to 1 if operations between registers with integral mode
> -smaller than a word are always performed on the entire register.
> -Most RISC machines have this property and most CISC machines do not.
> +smaller than a word are always performed on the entire register. To be
> +more explicit, if you start with a pair of @code{word_mode} registers with
> +known values and you do a subword, for example @code{QImode}, addition on
> +the low part of the registers, then the compiler may consider that the
> +result has a known value in @code{word_mode} too if the macro is defined
> +to 1. Most RISC machines have this property and most CISC machines do not.
By QImode addition, do you mean:
(set (subreg:QI (reg:SI X1) N)
(plus:QI (subreg:QI (reg:SI X2) N)
(subreg:QI (reg:SI X3) N)))
? I thought the point was instead that the target expected such ops
to be done on word_mode, even if the values involved are naturally QImode:
(set (subreg:SI (reg:QI Y1) 0)
(plus:SI (subreg:SI (reg:QI Y2) 0)
(subreg:SI (reg:QI Y3) 0)))
Most RISC/WORD_REGISTER_OPERATIONS targets wouldn't provide QImode
addition patterns, so the first insn seems unlikely.
Thanks,
Richard
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fix PR rtl-optimization/84071
2018-01-31 19:17 ` Richard Sandiford
@ 2018-02-01 10:14 ` Richard Earnshaw (lists)
2018-02-02 12:21 ` Eric Botcazou
2018-02-02 12:06 ` Eric Botcazou
1 sibling, 1 reply; 7+ messages in thread
From: Richard Earnshaw (lists) @ 2018-02-01 10:14 UTC (permalink / raw)
To: Eric Botcazou, gcc-patches, richard.sandiford
On 31/01/18 19:04, Richard Sandiford wrote:
> Eric Botcazou <ebotcazou@adacore.com> writes:
>>> Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.
>>
>> As discussed in the audit trail, this beefs up the internal documentation
>> about WORD_REGISTER_OPERATIONS.
>>
>> Tested with 'make doc', applied on mainline and 7 branch.
>>
>>
>> 2018-01-31 Eric Botcazou <ebotcazou@adacore.com>
>>
>> PR rtl-optimization/84071
>> * doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.
>> * doc/tm.texi: Regenerate.
>>
>> Index: doc/tm.texi.in
>> ===================================================================
>> --- doc/tm.texi.in (revision 257227)
>> +++ doc/tm.texi.in (working copy)
>> @@ -7376,8 +7376,12 @@ is in effect.
>>
>> @defmac WORD_REGISTER_OPERATIONS
>> Define this macro to 1 if operations between registers with integral mode
>> -smaller than a word are always performed on the entire register.
>> -Most RISC machines have this property and most CISC machines do not.
>> +smaller than a word are always performed on the entire register. To be
>> +more explicit, if you start with a pair of @code{word_mode} registers with
>> +known values and you do a subword, for example @code{QImode}, addition on
>> +the low part of the registers, then the compiler may consider that the
>> +result has a known value in @code{word_mode} too if the macro is defined
>> +to 1. Most RISC machines have this property and most CISC machines do not.
>
> By QImode addition, do you mean:
>
> (set (subreg:QI (reg:SI X1) N)
> (plus:QI (subreg:QI (reg:SI X2) N)
> (subreg:QI (reg:SI X3) N)))
>
> ? I thought the point was instead that the target expected such ops
> to be done on word_mode, even if the values involved are naturally QImode:
>
> (set (subreg:SI (reg:QI Y1) 0)
> (plus:SI (subreg:SI (reg:QI Y2) 0)
> (subreg:SI (reg:QI Y3) 0)))
>
> Most RISC/WORD_REGISTER_OPERATIONS targets wouldn't provide QImode
> addition patterns, so the first insn seems unlikely.
>
> Thanks,
> Richard
>
That's always been my interpretation too. Seems like we may be changing
the meaning of this macro...
R.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fix PR rtl-optimization/84071
2018-01-31 19:17 ` Richard Sandiford
2018-02-01 10:14 ` Richard Earnshaw (lists)
@ 2018-02-02 12:06 ` Eric Botcazou
1 sibling, 0 replies; 7+ messages in thread
From: Eric Botcazou @ 2018-02-02 12:06 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches
> By QImode addition, do you mean:
>
> (set (subreg:QI (reg:SI X1) N)
> (plus:QI (subreg:QI (reg:SI X2) N)
> (subreg:QI (reg:SI X3) N)))
>
> ?
Yes.
> I thought the point was instead that the target expected such ops
> to be done on word_mode, even if the values involved are naturally QImode:
>
> (set (subreg:SI (reg:QI Y1) 0)
> (plus:SI (subreg:SI (reg:QI Y2) 0)
> (subreg:SI (reg:QI Y3) 0)))
But that's not what the implementation of WORD_REGISTER_OPERATIONS does; IOW
you'll get the above from expand_binop without the macro if there is no addqi.
> Most RISC/WORD_REGISTER_OPERATIONS targets wouldn't provide QImode
> addition patterns, so the first insn seems unlikely.
Indeed, SImode on 64-bit RISC targets should have been more realistic, but I
didn't want to give a 64-bit example.
--
Eric Botcazou
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fix PR rtl-optimization/84071
2018-02-01 10:14 ` Richard Earnshaw (lists)
@ 2018-02-02 12:21 ` Eric Botcazou
2018-02-02 13:46 ` Richard Earnshaw (lists)
0 siblings, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2018-02-02 12:21 UTC (permalink / raw)
To: Richard Earnshaw (lists); +Cc: gcc-patches, richard.sandiford
> That's always been my interpretation too. Seems like we may be changing
> the meaning of this macro...
The main (and essentially only) effect of WORD_REGISTER_OPERATIONS in the
compiler happens during combine and is explained by this comment taken from
eliminate_regs_1 and written by Jim in 1998:
#ifdef WORD_REGISTER_OPERATIONS
/* On these machines, combine can create rtl of the form
(set (subreg:m1 (reg:m2 R) 0) ...)
where m1 < m2, and expects something interesting to
happen to the entire word. Moreover, it will use the
(reg:m2 R) later, expecting all bits to be preserved.
So if the number of words is the same, preserve the
subreg so that push_reloads can see it. */
&& ! ((x_size-1)/UNITS_PER_WORD == (new_size-1) UNITS_PER_WORD)
#endif
--
Eric Botcazou
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fix PR rtl-optimization/84071
2018-02-02 12:21 ` Eric Botcazou
@ 2018-02-02 13:46 ` Richard Earnshaw (lists)
0 siblings, 0 replies; 7+ messages in thread
From: Richard Earnshaw (lists) @ 2018-02-02 13:46 UTC (permalink / raw)
To: Eric Botcazou; +Cc: gcc-patches, richard.sandiford
On 02/02/18 12:21, Eric Botcazou wrote:
>> That's always been my interpretation too. Seems like we may be changing
>> the meaning of this macro...
>
> The main (and essentially only) effect of WORD_REGISTER_OPERATIONS in the
> compiler happens during combine and is explained by this comment taken from
> eliminate_regs_1 and written by Jim in 1998:
>
> #ifdef WORD_REGISTER_OPERATIONS
> /* On these machines, combine can create rtl of the form
> (set (subreg:m1 (reg:m2 R) 0) ...)
> where m1 < m2, and expects something interesting to
> happen to the entire word. Moreover, it will use the
> (reg:m2 R) later, expecting all bits to be preserved.
> So if the number of words is the same, preserve the
> subreg so that push_reloads can see it. */
> && ! ((x_size-1)/UNITS_PER_WORD == (new_size-1) UNITS_PER_WORD)
> #endif
>
interesting. So I guess it all comes down to what 'something
interesting' means and whether that has to be consistent for all modes.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-02-02 13:46 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-31 10:30 Fix PR rtl-optimization/84071 Eric Botcazou
2018-01-31 15:34 ` Eric Botcazou
2018-01-31 19:17 ` Richard Sandiford
2018-02-01 10:14 ` Richard Earnshaw (lists)
2018-02-02 12:21 ` Eric Botcazou
2018-02-02 13:46 ` Richard Earnshaw (lists)
2018-02-02 12:06 ` Eric Botcazou
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).