public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Fix PR rtl-optimization/84071
@ 2018-01-31 10:30 Eric Botcazou
  2018-01-31 15:34 ` Eric Botcazou
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2018-01-31 10:30 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2668 bytes --]

This is a regression present on mainline and 7 branch for big-endian ARM in 
the form of a wrong elimination of zero-extension after sign-extended load.

The problem is that reg_nonzero_bits_for_combine returns 0xffff for the 
register (reg:HI 121) when queried for SImode after:

(insn 10 7 11 2 (set (subreg:SI (reg:HI 121) 0)
        (sign_extend:SI (mem/c:HI (plus:SI (reg/f:SI 103 afp)
                    (const_int 4 [0x4])) [1 u+0 S2 A32]))) 171 
{*arm_extendhisi2_v6}

That's a valid query since the combiner keeps track of nonzero_bits in the 
largest possible mode:

/* Mode used to compute significance in reg_stat[].nonzero_bits.  It is the
   largest integer mode that can fit in HOST_BITS_PER_WIDE_INT.  */

static machine_mode nonzero_bits_mode;

The root cause of the issue is the SUBREG case of record_dead_and_set_regs_1:

  if (REG_P (dest))
    {
      /* If we are setting the whole register, we know its value.  Otherwise
	 show that we don't know the value.  We can handle SUBREG in
	 some cases.  */
      if (GET_CODE (setter) == SET && dest == SET_DEST (setter))
	record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
      else if (GET_CODE (setter) == SET
	       && GET_CODE (SET_DEST (setter)) == SUBREG
	       && SUBREG_REG (SET_DEST (setter)) == dest
	       && known_le (GET_MODE_PRECISION (GET_MODE (dest)), BITS_PER_WORD)
	       && subreg_lowpart_p (SET_DEST (setter)))
	record_value_for_reg (dest, record_dead_insn,
			      gen_lowpart (GET_MODE (dest),
						       SET_SRC (setter)));
      else
	record_value_for_reg (dest, record_dead_insn, NULL_RTX);
    }

It's very old (r357) and effectively synthesizes a wrong SET operation here 
because it strips the sign-extension around the MEM; now rtlanal.c considers 
that loads are extended according to LOAD_EXTEND_OP on RISC platforms and ARM 
is a ZERO_EXTEND target in this configuration (ARMv4 and above), so the sign-
extension is effectively turned into a zero-extension.

This SUBREG case looks quite questionable, not only for paradoxical SUBREGs on 
RISC platforms, but also for regular SUBREGs, for which it's trying to infer 
information on the whole register from a SET of the lowpart.  But the fix only 
changes the first, problematic case.

Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.


2018-01-31  Eric Botcazou  <ebotcazou@adacore.com>

	PR rtl-optimization/84071
	* combine.c (record_dead_and_set_regs_1): Record the source unmodified
	for a paradoxical SUBREG on a WORD_REGISTER_OPERATIONS target.


2018-01-31  Eric Botcazou  <ebotcazou@adacore.com>

	* gcc.c-torture/execute/20180131-1.c: New test.

-- 
Eric Botcazou

[-- Attachment #2: pr84071.diff --]
[-- Type: text/x-patch, Size: 1624 bytes --]

Index: combine.c
===================================================================
--- combine.c	(revision 257139)
+++ combine.c	(working copy)
@@ -13245,18 +13245,25 @@ record_dead_and_set_regs_1 (rtx dest, co
   if (REG_P (dest))
     {
       /* If we are setting the whole register, we know its value.  Otherwise
-	 show that we don't know the value.  We can handle SUBREG in
-	 some cases.  */
+	 show that we don't know the value.  We can handle a SUBREG if it's
+	 the low part, but we must be careful with paradoxical SUBREGs on
+	 RISC architectures because we cannot strip e.g. an extension around
+	 a load and record the naked load since the RTL middle-end considers
+	 that the upper bits are defined according to LOAD_EXTEND_OP.  */
       if (GET_CODE (setter) == SET && dest == SET_DEST (setter))
 	record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
       else if (GET_CODE (setter) == SET
 	       && GET_CODE (SET_DEST (setter)) == SUBREG
 	       && SUBREG_REG (SET_DEST (setter)) == dest
-	       && known_le (GET_MODE_PRECISION (GET_MODE (dest)), BITS_PER_WORD)
+	       && known_le (GET_MODE_PRECISION (GET_MODE (dest)),
+			    BITS_PER_WORD)
 	       && subreg_lowpart_p (SET_DEST (setter)))
 	record_value_for_reg (dest, record_dead_insn,
-			      gen_lowpart (GET_MODE (dest),
-						       SET_SRC (setter)));
+			      WORD_REGISTER_OPERATIONS
+			      && paradoxical_subreg_p (SET_DEST (setter))
+			      ? SET_SRC (setter)
+			      : gen_lowpart (GET_MODE (dest),
+					     SET_SRC (setter)));
       else
 	record_value_for_reg (dest, record_dead_insn, NULL_RTX);
     }

[-- Attachment #3: 20180131-1.c --]
[-- Type: text/x-csrc, Size: 449 bytes --]

/* PR rtl-optimization/84071 */
/* Reported by Wilco <wilco@gcc.gnu.org> */

extern void abort (void);

typedef union 
{
  signed short ss;
  unsigned short us;
  int x;
} U;

int f(int x, int y, int z, int a, U u) __attribute__((noclone, noinline));

int f(int x, int y, int z, int a, U u)
{
  return (u.ss <= 0) + u.us;
}

int main (void)
{
  U u = { .ss = -1 };

  if (f (0, 0, 0, 0, u) != (1 << sizeof (short) * 8))
    abort ();

  return 0;
}

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix PR rtl-optimization/84071
  2018-01-31 10:30 Fix PR rtl-optimization/84071 Eric Botcazou
@ 2018-01-31 15:34 ` Eric Botcazou
  2018-01-31 19:17   ` Richard Sandiford
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2018-01-31 15:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 433 bytes --]

> Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.

As discussed in the audit trail, this beefs up the internal documentation 
about WORD_REGISTER_OPERATIONS.

Tested with 'make doc', applied on mainline and 7 branch.


2018-01-31  Eric Botcazou  <ebotcazou@adacore.com>

	PR rtl-optimization/84071
	* doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.
	* doc/tm.texi: Regenerate.

-- 
Eric Botcazou

[-- Attachment #2: p.diff --]
[-- Type: text/x-patch, Size: 946 bytes --]

Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 257227)
+++ doc/tm.texi.in	(working copy)
@@ -7376,8 +7376,12 @@ is in effect.
 
 @defmac WORD_REGISTER_OPERATIONS
 Define this macro to 1 if operations between registers with integral mode
-smaller than a word are always performed on the entire register.
-Most RISC machines have this property and most CISC machines do not.
+smaller than a word are always performed on the entire register.  To be
+more explicit, if you start with a pair of @code{word_mode} registers with
+known values and you do a subword, for example @code{QImode}, addition on
+the low part of the registers, then the compiler may consider that the
+result has a known value in @code{word_mode} too if the macro is defined
+to 1.  Most RISC machines have this property and most CISC machines do not.
 @end defmac
 
 @hook TARGET_MIN_ARITHMETIC_PRECISION

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix PR rtl-optimization/84071
  2018-01-31 15:34 ` Eric Botcazou
@ 2018-01-31 19:17   ` Richard Sandiford
  2018-02-01 10:14     ` Richard Earnshaw (lists)
  2018-02-02 12:06     ` Eric Botcazou
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Sandiford @ 2018-01-31 19:17 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

Eric Botcazou <ebotcazou@adacore.com> writes:
>> Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.
>
> As discussed in the audit trail, this beefs up the internal documentation 
> about WORD_REGISTER_OPERATIONS.
>
> Tested with 'make doc', applied on mainline and 7 branch.
>
>
> 2018-01-31  Eric Botcazou  <ebotcazou@adacore.com>
>
> 	PR rtl-optimization/84071
> 	* doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.
> 	* doc/tm.texi: Regenerate.
>
> Index: doc/tm.texi.in
> ===================================================================
> --- doc/tm.texi.in	(revision 257227)
> +++ doc/tm.texi.in	(working copy)
> @@ -7376,8 +7376,12 @@ is in effect.
>
>  @defmac WORD_REGISTER_OPERATIONS
>  Define this macro to 1 if operations between registers with integral mode
> -smaller than a word are always performed on the entire register.
> -Most RISC machines have this property and most CISC machines do not.
> +smaller than a word are always performed on the entire register.  To be
> +more explicit, if you start with a pair of @code{word_mode} registers with
> +known values and you do a subword, for example @code{QImode}, addition on
> +the low part of the registers, then the compiler may consider that the
> +result has a known value in @code{word_mode} too if the macro is defined
> +to 1.  Most RISC machines have this property and most CISC machines do not.

By QImode addition, do you mean:

   (set (subreg:QI (reg:SI X1) N)
   	(plus:QI (subreg:QI (reg:SI X2) N)
        	 (subreg:QI (reg:SI X3) N)))

?  I thought the point was instead that the target expected such ops
to be done on word_mode, even if the values involved are naturally QImode:

   (set (subreg:SI (reg:QI Y1) 0)
   	(plus:SI (subreg:SI (reg:QI Y2) 0)
        	 (subreg:SI (reg:QI Y3) 0)))

Most RISC/WORD_REGISTER_OPERATIONS targets wouldn't provide QImode
addition patterns, so the first insn seems unlikely.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix PR rtl-optimization/84071
  2018-01-31 19:17   ` Richard Sandiford
@ 2018-02-01 10:14     ` Richard Earnshaw (lists)
  2018-02-02 12:21       ` Eric Botcazou
  2018-02-02 12:06     ` Eric Botcazou
  1 sibling, 1 reply; 7+ messages in thread
From: Richard Earnshaw (lists) @ 2018-02-01 10:14 UTC (permalink / raw)
  To: Eric Botcazou, gcc-patches, richard.sandiford

On 31/01/18 19:04, Richard Sandiford wrote:
> Eric Botcazou <ebotcazou@adacore.com> writes:
>>> Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.
>>
>> As discussed in the audit trail, this beefs up the internal documentation 
>> about WORD_REGISTER_OPERATIONS.
>>
>> Tested with 'make doc', applied on mainline and 7 branch.
>>
>>
>> 2018-01-31  Eric Botcazou  <ebotcazou@adacore.com>
>>
>> 	PR rtl-optimization/84071
>> 	* doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.
>> 	* doc/tm.texi: Regenerate.
>>
>> Index: doc/tm.texi.in
>> ===================================================================
>> --- doc/tm.texi.in	(revision 257227)
>> +++ doc/tm.texi.in	(working copy)
>> @@ -7376,8 +7376,12 @@ is in effect.
>>
>>  @defmac WORD_REGISTER_OPERATIONS
>>  Define this macro to 1 if operations between registers with integral mode
>> -smaller than a word are always performed on the entire register.
>> -Most RISC machines have this property and most CISC machines do not.
>> +smaller than a word are always performed on the entire register.  To be
>> +more explicit, if you start with a pair of @code{word_mode} registers with
>> +known values and you do a subword, for example @code{QImode}, addition on
>> +the low part of the registers, then the compiler may consider that the
>> +result has a known value in @code{word_mode} too if the macro is defined
>> +to 1.  Most RISC machines have this property and most CISC machines do not.
> 
> By QImode addition, do you mean:
> 
>    (set (subreg:QI (reg:SI X1) N)
>    	(plus:QI (subreg:QI (reg:SI X2) N)
>         	 (subreg:QI (reg:SI X3) N)))
> 
> ?  I thought the point was instead that the target expected such ops
> to be done on word_mode, even if the values involved are naturally QImode:
> 
>    (set (subreg:SI (reg:QI Y1) 0)
>    	(plus:SI (subreg:SI (reg:QI Y2) 0)
>         	 (subreg:SI (reg:QI Y3) 0)))
> 
> Most RISC/WORD_REGISTER_OPERATIONS targets wouldn't provide QImode
> addition patterns, so the first insn seems unlikely.
> 
> Thanks,
> Richard
> 

That's always been my interpretation too.  Seems like we may be changing
the meaning of this macro...

R.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix PR rtl-optimization/84071
  2018-01-31 19:17   ` Richard Sandiford
  2018-02-01 10:14     ` Richard Earnshaw (lists)
@ 2018-02-02 12:06     ` Eric Botcazou
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Botcazou @ 2018-02-02 12:06 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

> By QImode addition, do you mean:
> 
>    (set (subreg:QI (reg:SI X1) N)
>    	(plus:QI (subreg:QI (reg:SI X2) N)
>         	 (subreg:QI (reg:SI X3) N)))
> 
> ?

Yes.

> I thought the point was instead that the target expected such ops
> to be done on word_mode, even if the values involved are naturally QImode:
> 
>    (set (subreg:SI (reg:QI Y1) 0)
>    	(plus:SI (subreg:SI (reg:QI Y2) 0)
>         	 (subreg:SI (reg:QI Y3) 0)))

But that's not what the implementation of WORD_REGISTER_OPERATIONS does; IOW 
you'll get the above from expand_binop without the macro if there is no addqi.

> Most RISC/WORD_REGISTER_OPERATIONS targets wouldn't provide QImode
> addition patterns, so the first insn seems unlikely.

Indeed, SImode on 64-bit RISC targets should have been more realistic, but I 
didn't want to give a 64-bit example.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix PR rtl-optimization/84071
  2018-02-01 10:14     ` Richard Earnshaw (lists)
@ 2018-02-02 12:21       ` Eric Botcazou
  2018-02-02 13:46         ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2018-02-02 12:21 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc-patches, richard.sandiford

> That's always been my interpretation too.  Seems like we may be changing
> the meaning of this macro...

The main (and essentially only) effect of WORD_REGISTER_OPERATIONS in the 
compiler happens during combine and is explained by this comment taken from 
eliminate_regs_1 and written by Jim in 1998:

#ifdef WORD_REGISTER_OPERATIONS
	   /* On these machines, combine can create rtl of the form
		      (set (subreg:m1 (reg:m2 R) 0) ...)
	      where m1 < m2, and expects something interesting to 
	      happen to the entire word.  Moreover, it will use the
	      (reg:m2 R) later, expecting all bits to be preserved.
	      So if the number of words is the same, preserve the 
	      subreg so that push_reloads can see it.  */
	   && ! ((x_size-1)/UNITS_PER_WORD == (new_size-1) UNITS_PER_WORD)
#endif

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix PR rtl-optimization/84071
  2018-02-02 12:21       ` Eric Botcazou
@ 2018-02-02 13:46         ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Earnshaw (lists) @ 2018-02-02 13:46 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, richard.sandiford

On 02/02/18 12:21, Eric Botcazou wrote:
>> That's always been my interpretation too.  Seems like we may be changing
>> the meaning of this macro...
> 
> The main (and essentially only) effect of WORD_REGISTER_OPERATIONS in the 
> compiler happens during combine and is explained by this comment taken from 
> eliminate_regs_1 and written by Jim in 1998:
> 
> #ifdef WORD_REGISTER_OPERATIONS
> 	   /* On these machines, combine can create rtl of the form
> 		      (set (subreg:m1 (reg:m2 R) 0) ...)
> 	      where m1 < m2, and expects something interesting to 
> 	      happen to the entire word.  Moreover, it will use the
> 	      (reg:m2 R) later, expecting all bits to be preserved.
> 	      So if the number of words is the same, preserve the 
> 	      subreg so that push_reloads can see it.  */
> 	   && ! ((x_size-1)/UNITS_PER_WORD == (new_size-1) UNITS_PER_WORD)
> #endif
> 


interesting.  So I guess it all comes down to what 'something
interesting' means and whether that has to be consistent for all modes.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-02-02 13:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-31 10:30 Fix PR rtl-optimization/84071 Eric Botcazou
2018-01-31 15:34 ` Eric Botcazou
2018-01-31 19:17   ` Richard Sandiford
2018-02-01 10:14     ` Richard Earnshaw (lists)
2018-02-02 12:21       ` Eric Botcazou
2018-02-02 13:46         ` Richard Earnshaw (lists)
2018-02-02 12:06     ` Eric Botcazou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).