public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
@ 2016-07-26 17:32 Bernd Edlinger
  2016-07-27 21:31 ` Jeff Law
  0 siblings, 1 reply; 25+ messages in thread
From: Bernd Edlinger @ 2016-07-26 17:32 UTC (permalink / raw)
  To: GCC Patches
  Cc: Eric Botcazou, Bernd Schmidt, Segher Boessenkool, Andreas Schwab,
	Tamar Christina

[-- Attachment #1: Type: text/plain, Size: 1282 bytes --]

Hi!

As described in PR 71779 and PR 70903 we have a wrong code issue on aarch64
which is triggered by a paradoxical subreg that can be created in
store_bit_field_using_insv when the target has an EP_insv bitfield instruction.

The attached patch changes this insn...

(insn 1047 1046 1048 (set (reg:DI 481)
        (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
     (nil))

... into a more simple insn:

(insn 1047 1046 1048 (set (reg:DI 479)
        (zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1
     (nil))

which manages to fix two bugs at the same time.

The patch does not include a test case, as I was looking at the PR 71779
just by curiosity, and I have only a theoretical interest in aarch64. However,
it should be easy to extract one from PR 70903 at least, but I can't do that by
myself.

Therefore I would like to leave the test case(s) to Cristina Tamar from ARM,
and/or Andreas Schwab, who have also helped out with bootstrapping and
reg-testing on aarch64 and aarch64-ilp32.

Bootstrap and reg-test on x86_64-linux-gnu with no regressions.
Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes.
Also successfully bootstrapped on an aarch64 juno board and no regressions.


Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-pr71779.diff --]
[-- Type: text/x-patch; name="patch-pr71779.diff", Size: 917 bytes --]

2016-07-25  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR rtl-optimization/71779
	PR rtl-optimization/70903
	* expmed.c (store_bit_field_using_insv): Don't generate a paradoxical
	subreg.

Index: gcc/calls.c
Index: expmed.c
===================================================================
--- expmed.c	(revision 238694)
+++ expmed.c	(working copy)
@@ -664,14 +664,7 @@ store_bit_field_using_insv (const extraction_insn
 	     if we must narrow it, be sure we do it correctly.  */
 
 	  if (GET_MODE_SIZE (GET_MODE (value)) < GET_MODE_SIZE (op_mode))
-	    {
-	      tmp = simplify_subreg (op_mode, value1, GET_MODE (value), 0);
-	      if (! tmp)
-		tmp = simplify_gen_subreg (op_mode,
-					   force_reg (GET_MODE (value),
-						      value1),
-					   GET_MODE (value), 0);
-	    }
+	    tmp = convert_to_mode (op_mode, value1, 1);
 	  else
 	    {
 	      tmp = gen_lowpart_if_possible (op_mode, value1);



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-26 17:32 [PATCH] Fix wrong code on aarch64 due to paradoxical subreg Bernd Edlinger
@ 2016-07-27 21:31 ` Jeff Law
  2016-07-28  3:50   ` Bernd Edlinger
  2016-07-28 20:59   ` Bernd Edlinger
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff Law @ 2016-07-27 21:31 UTC (permalink / raw)
  To: Bernd Edlinger, GCC Patches
  Cc: Eric Botcazou, Bernd Schmidt, Segher Boessenkool, Andreas Schwab,
	Tamar Christina

On 07/26/2016 11:32 AM, Bernd Edlinger wrote:
> Hi!
>
> As described in PR 71779 and PR 70903 we have a wrong code issue on aarch64
> which is triggered by a paradoxical subreg that can be created in
> store_bit_field_using_insv when the target has an EP_insv bitfield instruction.
>
> The attached patch changes this insn...
>
> (insn 1047 1046 1048 (set (reg:DI 481)
>         (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
>      (nil))
>
> ... into a more simple insn:
>
> (insn 1047 1046 1048 (set (reg:DI 479)
>         (zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1
>      (nil))
>
> which manages to fix two bugs at the same time.
>
> The patch does not include a test case, as I was looking at the PR 71779
> just by curiosity, and I have only a theoretical interest in aarch64. However,
> it should be easy to extract one from PR 70903 at least, but I can't do that by
> myself.
>
> Therefore I would like to leave the test case(s) to Cristina Tamar from ARM,
> and/or Andreas Schwab, who have also helped out with bootstrapping and
> reg-testing on aarch64 and aarch64-ilp32.
>
> Bootstrap and reg-test on x86_64-linux-gnu with no regressions.
> Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes.
> Also successfully bootstrapped on an aarch64 juno board and no regressions.
>
>
> Is it OK for trunk?
>
>
> Thanks
> Bernd.
>
>
> patch-pr71779.diff
>
>
> 2016-07-25  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>
> 	PR rtl-optimization/71779
> 	PR rtl-optimization/70903
> 	* expmed.c (store_bit_field_using_insv): Don't generate a paradoxical
> 	subreg.
So you're stumbling into another really interesting area.

I can hazard a guess that the reason we create the paradoxical SUBREG 
rather than a zero or sign extension is because various optimizers know 
that the upper bits in a paradoxical are don't care bits and may make 
transformations with that knowledge.

Once you turn that into a zero/sign extend, then the rest of the 
optimizers must preserve the zero/sign extension property -- they have 
no way of knowing the bits were don't cares.

So it's not necessarily clear that your change results in generally 
better code or not.

More importantly, it really feels like you're papering over a real bug 
elsewhere.  AFAICT the use of a paradoxical subreg here is safe.  So the 
real bug ought to be downstream of this point.  Several folks have 
pointed at reload, which would certainly be a point ripe for a 
paradoxical subreg problem.

Sooo. I don't think we can go with this patch as-is today.  We need to 
find the root problem which I believe is later in the pipeline.

This patch might make sense from an optimization standpoint -- I'm 
generally of the opinion that while paradoxical subregs give the 
optimziers more freedom, the optimizers rarely are able to use it and 
they generally know much more about the semantics of and how to optimize 
around zero/sign extensions.  But we really should fix the bug first, 
then come back to improving the code generation.

Now if someone wanted to look for an interesting optimization project -- 
ree.c knows how to do redundant extension eliminations.  A paradoxical 
is a form of extension that isn't currently understood by REE. 
Similarly REE doesn't know how to handle zero-extension as an "and" or 
sign extension as a pair of shifts.  Both of which can occur in practice 
due to costs or ISA limitations.  I wouldn't be surprise if adding those 
capabilities to REE would significantly improve its ability to eliminate 
unnecessary extensions.

Jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-27 21:31 ` Jeff Law
@ 2016-07-28  3:50   ` Bernd Edlinger
  2016-07-28 20:59   ` Bernd Edlinger
  1 sibling, 0 replies; 25+ messages in thread
From: Bernd Edlinger @ 2016-07-28  3:50 UTC (permalink / raw)
  To: Jeff Law, GCC Patches
  Cc: Eric Botcazou, Bernd Schmidt, Segher Boessenkool, Andreas Schwab,
	Tamar Christina

Am 27.07.2016 um 23:31 schrieb Jeff Law:
> On 07/26/2016 11:32 AM, Bernd Edlinger wrote:
>> Hi!
>>
>> As described in PR 71779 and PR 70903 we have a wrong code issue on
>> aarch64
>> which is triggered by a paradoxical subreg that can be created in
>> store_bit_field_using_insv when the target has an EP_insv bitfield
>> instruction.
>>
>> The attached patch changes this insn...
>>
>> (insn 1047 1046 1048 (set (reg:DI 481)
>>         (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
>>      (nil))
>>
>> ... into a more simple insn:
>>
>> (insn 1047 1046 1048 (set (reg:DI 479)
>>         (zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1
>>      (nil))
>>
>> which manages to fix two bugs at the same time.
>>
>> The patch does not include a test case, as I was looking at the PR 71779
>> just by curiosity, and I have only a theoretical interest in aarch64.
>> However,
>> it should be easy to extract one from PR 70903 at least, but I can't
>> do that by
>> myself.
>>
>> Therefore I would like to leave the test case(s) to Cristina Tamar
>> from ARM,
>> and/or Andreas Schwab, who have also helped out with bootstrapping and
>> reg-testing on aarch64 and aarch64-ilp32.
>>
>> Bootstrap and reg-test on x86_64-linux-gnu with no regressions.
>> Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes.
>> Also successfully bootstrapped on an aarch64 juno board and no
>> regressions.
>>
>>
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>>
>> patch-pr71779.diff
>>
>>
>> 2016-07-25  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>
>>     PR rtl-optimization/71779
>>     PR rtl-optimization/70903
>>     * expmed.c (store_bit_field_using_insv): Don't generate a paradoxical
>>     subreg.
> So you're stumbling into another really interesting area.
>
> I can hazard a guess that the reason we create the paradoxical SUBREG
> rather than a zero or sign extension is because various optimizers know
> that the upper bits in a paradoxical are don't care bits and may make
> transformations with that knowledge.
>
> Once you turn that into a zero/sign extend, then the rest of the
> optimizers must preserve the zero/sign extension property -- they have
> no way of knowing the bits were don't cares.
>
> So it's not necessarily clear that your change results in generally
> better code or not.
>

Yes, I understand, on x86 the stage 2/3 did not change but x86 has
probably not a very sophisticated insv code pattern.

> More importantly, it really feels like you're papering over a real bug
> elsewhere.  AFAICT the use of a paradoxical subreg here is safe.  So the
> real bug ought to be downstream of this point.  Several folks have
> pointed at reload, which would certainly be a point ripe for a
> paradoxical subreg problem.
>

I debugged lra already, because I was initially sure it did something
wrong with the subreg, but it turned out that it follows exactly what
you say here, it spills the subreg twice, once as a 32 bit value,
into a 64 bit slot with stack image in the upper word and then again
as a zero extended 64 bit value.  Later the garbage extended 64 bit
value is used when it should not.

So lra seems not to be the reason, but it is apparently spilling the
paradoxical subreg twice.

So at least lra does not generate better code from paradoxical subreg.

> Sooo. I don't think we can go with this patch as-is today.  We need to
> find the root problem which I believe is later in the pipeline.
>
> This patch might make sense from an optimization standpoint -- I'm
> generally of the opinion that while paradoxical subregs give the
> optimziers more freedom, the optimizers rarely are able to use it and
> they generally know much more about the semantics of and how to optimize
> around zero/sign extensions.  But we really should fix the bug first,
> then come back to improving the code generation.
>

Yes.  And paradoxical subreg have non-intuitive semantics.


Thanks
Bernd.

> Now if someone wanted to look for an interesting optimization project --
> ree.c knows how to do redundant extension eliminations.  A paradoxical
> is a form of extension that isn't currently understood by REE. Similarly
> REE doesn't know how to handle zero-extension as an "and" or sign
> extension as a pair of shifts.  Both of which can occur in practice due
> to costs or ISA limitations.  I wouldn't be surprise if adding those
> capabilities to REE would significantly improve its ability to eliminate
> unnecessary extensions.
>
> Jeff
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-27 21:31 ` Jeff Law
  2016-07-28  3:50   ` Bernd Edlinger
@ 2016-07-28 20:59   ` Bernd Edlinger
  2016-07-29  7:02     ` Segher Boessenkool
  1 sibling, 1 reply; 25+ messages in thread
From: Bernd Edlinger @ 2016-07-28 20:59 UTC (permalink / raw)
  To: Jeff Law, GCC Patches
  Cc: Eric Botcazou, Bernd Schmidt, Segher Boessenkool, Andreas Schwab,
	Tamar Christina

Hi,

On 07/27/16 23:31, Jeff Law wrote:
> So you're stumbling into another really interesting area.
>

Absolutely, I am just too curious what's going on here ;-)


> I can hazard a guess that the reason we create the paradoxical SUBREG
> rather than a zero or sign extension is because various optimizers know
> that the upper bits in a paradoxical are don't care bits and may make
> transformations with that knowledge.
>
> Once you turn that into a zero/sign extend, then the rest of the
> optimizers must preserve the zero/sign extension property -- they have
> no way of knowing the bits were don't cares.
>
> So it's not necessarily clear that your change results in generally
> better code or not.
>
> More importantly, it really feels like you're papering over a real bug
> elsewhere.  AFAICT the use of a paradoxical subreg here is safe.  So the
> real bug ought to be downstream of this point.  Several folks have
> pointed at reload, which would certainly be a point ripe for a
> paradoxical subreg problem.


I have looked again at the test case of PR 71779 and discovered
something that I wanted to discuss here.

I have tried to figure out what made combine change this:


(insn 1048 1047 1049 101 (set (zero_extract:DI (reg/v:DI 191 [ 
obj1D.17943 ])
             (const_int 32 [0x20])
             (const_int 0 [0]))
         (reg/f:DI 481)) isl_input.c:2496 691 {*insv_regdi}
      (nil))
(insn 1049 1048 1050 101 (set (zero_extract:DI (reg/v:DI 191 [ 
obj1D.17943 ])
             (const_int 32 [0x20])
             (const_int 32 [0x20]))
         (reg/v/f:DI 145 [ SR.327D.17957 ])) isl_input.c:2496 691 
{*insv_regdi}
      (expr_list:REG_DEAD (reg/v/f:DI 145 [ SR.327D.17957 ])
         (nil)))
(insn 1050 1049 1051 101 (set (reg:DI 1 x1)
         (reg/v:DI 191 [ obj1D.17943 ])) isl_input.c:2496 50 
{*movdi_aarch64}
      (expr_list:REG_DEAD (reg/v:DI 191 [ obj1D.17943 ])
         (nil)))


into that (which is wrong because reg 481 has undefined bits
in insn 1050):


(insn 1048 1047 1049 101 (set (zero_extract:DI (reg/v:DI 191 [ 
obj1D.17943 ])
             (const_int 32 [0x20])
             (const_int 0 [0]))
         (reg/f:DI 481)) isl_input.c:2496 691 {*insv_regdi}
      (nil))
(insn 1049 1048 1050 101 (set (zero_extract:DI (reg/v:DI 191 [ 
obj1D.17943 ])
             (const_int 32 [0x20])
             (const_int 32 [0x20]))
         (reg/v/f:DI 145 [ SR.327D.17957 ])) isl_input.c:2496 691 
{*insv_regdi}
      (expr_list:REG_DEAD (reg/v/f:DI 145 [ SR.327D.17957 ])
         (nil)))
(insn 1050 1049 1051 101 (set (reg:DI 1 x1)
         (ior:DI (ashift:DI (reg/v/f:DI 145 [ SR.327D.17957 ])
                 (const_int 32 [0x20]))
             (reg/f:DI 481))) isl_input.c:2496 50 {*movdi_aarch64}
      (nil))


The interesting fact is that combine did that while it
was only considering 1048, 1049 -> 1050,
and not the instruction with the paradoxical subreg:

(insn 1047 1044 1048 101 (set (reg/f:DI 481)
         (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
      (nil))

I found by debugging that expand_field_assignment
was trying to build:

(and:DI (reg/f:DI 481)
         (const_int -4294967296 [0xffffffff00000000]))

with this statement:

       masked = simplify_gen_binary (ASHIFT, compute_mode,
                                     simplify_gen_binary (
                                       AND, compute_mode,
                                       gen_lowpart (compute_mode, 
SET_SRC (x)),
                                       mask),
                                     pos);

but the result is just (reg/f:DI 481) !

I debugged and found the first wrong result in
rtlanal.c (nonzero_bits1) where the following if-statement
tells us that the upper 32 bits of reg 481 must be zero:

   switch (code)
     {
     case REG:
#if defined(POINTERS_EXTEND_UNSIGNED)
       /* If pointers extend unsigned and this is a pointer in Pmode, 
say that
          all the bits above ptr_mode are known to be zero.  */
       /* As we do not know which address space the pointer is referring to,
          we can do this only if the target does not support different 
pointer
          or address modes depending on the address space.  */
       if (target_default_pointer_address_modes_p ()
           && POINTERS_EXTEND_UNSIGNED && GET_MODE (x) == Pmode
           && REG_POINTER (x)
           && !targetm.have_ptr_extend ())
         nonzero &= GET_MODE_MASK (ptr_mode);
#endif


Pmode = DImode, ptr_mode=SImode, and REG_POINTER(x) is true.

Yes, the value is actually a pointer!

Which means, that it is not safe to extend a pointer from SI to DI
with anything but a zero-extend.

And if I remove this statement the wrong code is still not fixed.
So there must be more places where the similar assumptions are
made.

However, PR 70903 is not about pointers, and uses a mode where
Pmode=ptr_mode=DImode, so there must be a different as hard to track
down reason why this did not work out right.


So the question is, if the paradoxical subreg is really such a good
idea, when it triggers so many different bugs all over the place?


Bernd.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-28 20:59   ` Bernd Edlinger
@ 2016-07-29  7:02     ` Segher Boessenkool
  2016-07-29 13:04       ` Bernd Edlinger
  0 siblings, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-07-29  7:02 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Jeff Law, GCC Patches, Eric Botcazou, Bernd Schmidt,
	Andreas Schwab, Tamar Christina

On Thu, Jul 28, 2016 at 08:59:44PM +0000, Bernd Edlinger wrote:
> (insn 1047 1044 1048 101 (set (reg/f:DI 481)
>          (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
>       (nil))

In your first mail you showed reg 481 as _not_ being REG_POINTER:

(insn 1047 1046 1048 (set (reg:DI 481)
        (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
     (nil))

(note the lack of /f).  So which is it?  REG_POINTER here is not correct
as far as I can see.


Segher

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-29  7:02     ` Segher Boessenkool
@ 2016-07-29 13:04       ` Bernd Edlinger
  2016-07-30  8:18         ` Bernd Edlinger
  0 siblings, 1 reply; 25+ messages in thread
From: Bernd Edlinger @ 2016-07-29 13:04 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jeff Law, GCC Patches, Eric Botcazou, Bernd Schmidt,
	Andreas Schwab, Tamar Christina

On 07/29/16 09:02, Segher Boessenkool wrote:
> On Thu, Jul 28, 2016 at 08:59:44PM +0000, Bernd Edlinger wrote:
>> (insn 1047 1044 1048 101 (set (reg/f:DI 481)
>>           (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
>>        (nil))
>
> In your first mail you showed reg 481 as _not_ being REG_POINTER:
>
> (insn 1047 1046 1048 (set (reg:DI 481)
>          (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
>       (nil))
>
> (note the lack of /f).  So which is it?  REG_POINTER here is not correct
> as far as I can see.
>

Oh yes, that's an interesting point, in expand I still see this:


(insn 1047 1046 1048 (set (reg:DI 481)
         (subreg:DI (reg/f:SI 479) 0)) isl_input.c:2496 -1
      (nil))

But in the last dump before combine I have this:

(insn 1047 1044 1048 101 (set (reg/f:DI 481)
         (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
      (nil))


However I was not really surpised by that, because the reg 545 does
in deed hold a pointer value: &isl_obj_map_vtable

(insn 22 17 23 51 (set (reg/f:SI 544)
         (high:SI (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] 
<var_decl 0x7f83d3273f30 isl_obj_map_vtable>))) isl_input.c:2415 49 
{*movsi_aarch64}
      (nil))
(insn 23 22 24 51 (set (reg/f:SI 545)
         (lo_sum:SI (reg/f:SI 544)
             (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0] 
<var_decl 0x7f83d3273f30 isl_obj_map_vtable>))) isl_input.c:2415 917 
{add_losym_si}
      (expr_list:REG_DEAD (reg/f:SI 544)
         (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable") 
[flags 0xc0] <var_decl 0x7f83d3273f30 isl_obj_map_vtable>)
             (nil))))

The "reg/f:DI 481" first appeared in cse1.


I'll try to see what's happening there next....


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-29 13:04       ` Bernd Edlinger
@ 2016-07-30  8:18         ` Bernd Edlinger
  2016-07-30 11:39           ` Segher Boessenkool
  2016-08-01 17:16           ` Jeff Law
  0 siblings, 2 replies; 25+ messages in thread
From: Bernd Edlinger @ 2016-07-30  8:18 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jeff Law, GCC Patches, Eric Botcazou, Bernd Schmidt,
	Andreas Schwab, Tamar Christina

On 07/29/16 15:03, Bernd Edlinger wrote:
> On 07/29/16 09:02, Segher Boessenkool wrote:
>> On Thu, Jul 28, 2016 at 08:59:44PM +0000, Bernd Edlinger wrote:
>>> (insn 1047 1044 1048 101 (set (reg/f:DI 481)
>>>           (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50
>>> {*movdi_aarch64}
>>>        (nil))
>>
>> In your first mail you showed reg 481 as _not_ being REG_POINTER:
>>
>> (insn 1047 1046 1048 (set (reg:DI 481)
>>          (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
>>       (nil))
>>
>> (note the lack of /f).  So which is it?  REG_POINTER here is not correct
>> as far as I can see.
>>
>
> Oh yes, that's an interesting point, in expand I still see this:
>
>
> (insn 1047 1046 1048 (set (reg:DI 481)
>          (subreg:DI (reg/f:SI 479) 0)) isl_input.c:2496 -1
>       (nil))
>
> But in the last dump before combine I have this:
>
> (insn 1047 1044 1048 101 (set (reg/f:DI 481)
>          (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
>       (nil))
>
>
> However I was not really surpised by that, because the reg 545 does
> in deed hold a pointer value: &isl_obj_map_vtable
>
> (insn 22 17 23 51 (set (reg/f:SI 544)
>          (high:SI (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0]
> <var_decl 0x7f83d3273f30 isl_obj_map_vtable>))) isl_input.c:2415 49
> {*movsi_aarch64}
>       (nil))
> (insn 23 22 24 51 (set (reg/f:SI 545)
>          (lo_sum:SI (reg/f:SI 544)
>              (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0]
> <var_decl 0x7f83d3273f30 isl_obj_map_vtable>))) isl_input.c:2415 917
> {add_losym_si}
>       (expr_list:REG_DEAD (reg/f:SI 544)
>          (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable")
> [flags 0xc0] <var_decl 0x7f83d3273f30 isl_obj_map_vtable>)
>              (nil))))
>
> The "reg/f:DI 481" first appeared in cse1.
>
>
> I'll try to see what's happening there next....


Ok, I the incorrect REG_POINTER is done here:

cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value

and here I see a bug, because if POINTERS_EXTEND_UNSIGNED
can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not
if x is a SUBREG as in this case.

So I think that should be fixed this way:

Index: emit-rtl.c
===================================================================
--- emit-rtl.c	(revision 238891)
+++ emit-rtl.c	(working copy)
@@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x)
  	 || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x)))
      {
  #if defined(POINTERS_EXTEND_UNSIGNED)
-      if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
+      if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED)
  	   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
  	  && !targetm.have_ptr_extend ())
  	can_be_reg_pointer = false;


What do you think does this look like the right fix?

With this patch the code the reg/f:DI 481 does no longer appear,
and also the invalid combine does no longer happen.

However the test case from pr70903 does not get fixed by this.

But when I look at the dumps, I see again the first incorrect
transformation in cse2 (again cse!):

(insn 20 19 21 2 (set (reg:DI 94)
         (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
      (expr_list:REG_EQUAL (const_int 255 [0xff])
         (expr_list:REG_DEAD (reg:QI 93)
             (nil))))

but that is simply wrong, because later optimization passes
expect reg 94 to be 0x000000ff but the upper bits are unspecified,
so that REG_EQUAL should better not exist.

When I looked at cse.c I saw a comment in #if 0, which exactly
describes the problem that we have with paradoxical subreg here:

Index: cse.c
===================================================================
--- cse.c	(revision 238891)
+++ cse.c	(working copy)
@@ -4716,10 +4716,6 @@ cse_insn (rtx_insn *insn)
  	    }
  	}

-#if 0
-      /* It is no longer clear why we used to do this, but it doesn't
-	 appear to still be needed.  So let's try without it since this
-	 code hurts cse'ing widened ops.  */
        /* If source is a paradoxical subreg (such as QI treated as an SI),
  	 treat it as volatile.  It may do the work of an SI in one context
  	 where the extra bits are not being used, but cannot replace an SI
@@ -4726,7 +4722,6 @@ cse_insn (rtx_insn *insn)
  	 in general.  */
        if (paradoxical_subreg_p (src))
  	sets[i].src_volatile = 1;
-#endif

        /* Locate all possible equivalent forms for SRC.  Try to replace
           SRC in the insn with each cheaper equivalent.


removing the #if 0 seems to fix the sample from pr70903, but generates
3 more instructions than with my initial patch:

@@ -6,12 +6,15 @@
  foo:
  	adrp	x0, .LC0
  	add	x0, x0, :lo12:.LC0
+	mov	x5, -1
  	mov	x1, 0
  	ldp	x4, x3, [x0, 8]
  	ldr	x2, [x0, 24]
-	mov	x0, 255
-	bfi	x1, x0, 0, 32
+	mov	x0, 0
+	bfi	x0, x5, 0, 8
  	stp	x3, x2, [x8, 16]
+	bfi	x1, x0, 0, 32
+	mov	x0, 255
  	bfi	x1, x0, 32, 32
  	stp	x1, x4, [x8]
  	ret

- = code without paradoxical subreg
+ = code with paradoxical subreg and above patch.


The generated code looks correct, but I can not run the test cases,
so I am still waiting for confirmation that this actually works io
the target.

When I looked at svn blame cse.c, I saw that this #if 0 was introduced
here:

svn log -r1614 -v
------------------------------------------------------------------------
r1614 | kenner | 1992-07-17 11:57:24 +0200 (Fri, 17 Jul 1992) | 2 lines
Changed paths:
    M /trunk/gcc/cse.c
    M /trunk/gcc/libgcc2.c
    M /trunk/gcc/reload.c
    M /trunk/gcc/reload1.c

*** empty log message ***

------------------------------------------------------------------------


Again, a check-in without comment, that fixes at least 3 different
bugs, without test case, and of course the gcc-patches archives do only
go back to 1998...

I am pretty sure, that this has to be reverted, and that it can
generate less efficient code.

What do you think?


Bernd.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-30  8:18         ` Bernd Edlinger
@ 2016-07-30 11:39           ` Segher Boessenkool
  2016-07-31 10:44             ` Bernd Edlinger
  2016-08-01 17:16           ` Jeff Law
  1 sibling, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-07-30 11:39 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Jeff Law, GCC Patches, Eric Botcazou, Bernd Schmidt,
	Andreas Schwab, Tamar Christina

On Sat, Jul 30, 2016 at 08:17:59AM +0000, Bernd Edlinger wrote:
> Ok, I the incorrect REG_POINTER is done here:
> 
> cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value
> 
> and here I see a bug, because if POINTERS_EXTEND_UNSIGNED
> can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not
> if x is a SUBREG as in this case.
> 
> So I think that should be fixed this way:
> 
> Index: emit-rtl.c
> ===================================================================
> --- emit-rtl.c	(revision 238891)
> +++ emit-rtl.c	(working copy)
> @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x)
>   	 || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x)))
>       {
>   #if defined(POINTERS_EXTEND_UNSIGNED)
> -      if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
> +      if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED)
>   	   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
>   	  && !targetm.have_ptr_extend ())
>   	can_be_reg_pointer = false;
> 
> 
> What do you think does this look like the right fix?

There also is (from rtl.texi), for paradoxical subregs:

"""
@item @code{subreg} of @code{reg}s
The upper bits are defined when @code{SUBREG_PROMOTED_VAR_P} is true.
@code{SUBREG_PROMOTED_UNSIGNED_P} describes what the upper bits hold.
Such subregs usually represent local variables, register variables
and parameter pseudo variables that have been promoted to a wider mode.
"""

so you might want to test for these as well.

> With this patch the code the reg/f:DI 481 does no longer appear,
> and also the invalid combine does no longer happen.

Good :-)

> However the test case from pr70903 does not get fixed by this.
> 
> But when I look at the dumps, I see again the first incorrect
> transformation in cse2 (again cse!):
> 
> (insn 20 19 21 2 (set (reg:DI 94)
>          (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
>       (expr_list:REG_EQUAL (const_int 255 [0xff])
>          (expr_list:REG_DEAD (reg:QI 93)
>              (nil))))
> 
> but that is simply wrong, because later optimization passes
> expect reg 94 to be 0x000000ff but the upper bits are unspecified,
> so that REG_EQUAL should better not exist.

Agreed.  So where does that come from?

> When I looked at cse.c I saw a comment in #if 0, which exactly
> describes the problem that we have with paradoxical subreg here:

<snip>

> I am pretty sure, that this has to be reverted, and that it can
> generate less efficient code.
> 
> What do you think?

I think this pessimises the generated code too much; there must be a
better solution.


Segher

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-30 11:39           ` Segher Boessenkool
@ 2016-07-31 10:44             ` Bernd Edlinger
  2016-07-31 10:56               ` Bernd Edlinger
  2016-08-01 17:54               ` Jeff Law
  0 siblings, 2 replies; 25+ messages in thread
From: Bernd Edlinger @ 2016-07-31 10:44 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jeff Law, GCC Patches, Eric Botcazou, Bernd Schmidt,
	Andreas Schwab, Tamar Christina

On 07/30/16 13:39, Segher Boessenkool wrote:
> On Sat, Jul 30, 2016 at 08:17:59AM +0000, Bernd Edlinger wrote:
>> Ok, I the incorrect REG_POINTER is done here:
>>
>> cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value
>>
>> and here I see a bug, because if POINTERS_EXTEND_UNSIGNED
>> can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not
>> if x is a SUBREG as in this case.
>>
>> So I think that should be fixed this way:
>>
>> Index: emit-rtl.c
>> ===================================================================
>> --- emit-rtl.c	(revision 238891)
>> +++ emit-rtl.c	(working copy)
>> @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x)
>>    	 || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x)))
>>        {
>>    #if defined(POINTERS_EXTEND_UNSIGNED)
>> -      if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
>> +      if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED)
>>    	   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
>>    	  && !targetm.have_ptr_extend ())
>>    	can_be_reg_pointer = false;
>>
>>
>> What do you think does this look like the right fix?
>
> There also is (from rtl.texi), for paradoxical subregs:
>
> """
> @item @code{subreg} of @code{reg}s
> The upper bits are defined when @code{SUBREG_PROMOTED_VAR_P} is true.
> @code{SUBREG_PROMOTED_UNSIGNED_P} describes what the upper bits hold.
> Such subregs usually represent local variables, register variables
> and parameter pseudo variables that have been promoted to a wider mode.
> """
>
> so you might want to test for these as well.
>

like this?

Index: emit-rtl.c
===================================================================
--- emit-rtl.c	(revision 238891)
+++ emit-rtl.c	(working copy)
@@ -1156,7 +1156,11 @@
      {
  #if defined(POINTERS_EXTEND_UNSIGNED)
        if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
-	   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
+	   || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED)
+	   || (paradoxical_subreg_p (x)
+	       && ! (SUBREG_PROMOTED_VAR_P (x)
+		     && SUBREG_CHECK_PROMOTED_SIGN (x,
+						    POINTERS_EXTEND_UNSIGNED))))
  	  && !targetm.have_ptr_extend ())
  	can_be_reg_pointer = false;
  #endif

In the test case of pr71779 the subreg is no promoted var, so this
has no influence at this time.  Also I have not POINTERS_EXTEND_SIGNED
target, but for the symmetry it ought to check explicitly for
ZERO_EXTEND as well, and allow the pointer value to pass thru a
TRUNCATE.

>> With this patch the code the reg/f:DI 481 does no longer appear,
>> and also the invalid combine does no longer happen.
>
> Good :-)
>
>> However the test case from pr70903 does not get fixed by this.
>>
>> But when I look at the dumps, I see again the first incorrect
>> transformation in cse2 (again cse!):
>>
>> (insn 20 19 21 2 (set (reg:DI 94)
>>           (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
>>        (expr_list:REG_EQUAL (const_int 255 [0xff])
>>           (expr_list:REG_DEAD (reg:QI 93)
>>               (nil))))
>>
>> but that is simply wrong, because later optimization passes
>> expect reg 94 to be 0x000000ff but the upper bits are unspecified,
>> so that REG_EQUAL should better not exist.
>
> Agreed.  So where does that come from?
>
>> When I looked at cse.c I saw a comment in #if 0, which exactly
>> describes the problem that we have with paradoxical subreg here:
>
> <snip>
>
>> I am pretty sure, that this has to be reverted, and that it can
>> generate less efficient code.
>>
>> What do you think?
>
> I think this pessimises the generated code too much; there must be a
> better solution.
>

I debugged the cse again, to see how it works and why it mis-compiles
this example.

I found out that the trouble starts one instruction earlier:

(insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0)
         ) pr.c:8 50 {*movdi_aarch64}
      (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ])
         (nil)))

cse_main sees the constant value and maps:
(reg:QI 93)  =>  (const_int 255 [0xff])

plus (I mean that is wrong):
(subreg:DI (reg:QI 93) 0)  =>  (const_int 255 [0xff])

When the next insn is scanned

(insn 20 19 21 2 (set (reg:DI 94)
         (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
      (expr_list:REG_DEAD (reg:QI 93)
         (nil)))

I see fold_rtx (subreg:DI (reg:QI 93) 0))
return (const_int 255 [0xff]) which is wrong.

now cse maps:
(reg:DI 94)  =>  (const_int 255 [0xff])

which is also not guaranteed to be correct, but the REG_EQUAL at
insn 20 is probably only a symptom, suppressing only that does not
fix the test case, because later we have:

(insn 25 24 26 2 (set (reg:DI 97)
         (const_int 255 [0xff])) pr.c:9 50 {*movdi_aarch64}
      (nil))
(insn 26 25 48 2 (set (zero_extract:DI (reg:DI 102 [ D.3038 ])
             (const_int 32 [0x20])
             (const_int 32 [0x20]))
         (reg:DI 97)) pr.c:9 691 {*insv_regdi}
      (expr_list:REG_DEAD (reg:DI 97)
         (nil)))

get replaced to

(insn 26 24 48 2 (set (zero_extract:DI (reg:DI 102 [ D.3038 ])
             (const_int 32 [0x20])
             (const_int 32 [0x20]))
         (reg:DI 94)) pr.c:9 691 {*insv_regdi}
      (expr_list:REG_DEAD (reg:DI 97)
         (nil)))

because when insn 25 is scanned it the constant 255 is already
mapped to reg 94 at insn 20.

now cse maps:
(reg:DI 97)  =>  (reg:DI 94)

therefore canon_reg (reg:DI 97) returns (reg:DI 94)
which makes canonicalize_insn do the wrong transformation at insn 26.


Now I think I found a better place for a patch, where the first bogus
mapping is recorded:

Index: cse.c
===================================================================
--- cse.c	(revision 238891)
+++ cse.c	(working copy)
@@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn)
  	    || GET_MODE (dest) == BLKmode
  	    /* If we didn't put a REG_EQUAL value or a source into the hash
  	       table, there is no point is recording DEST.  */
-	    || sets[i].src_elt == 0
-	    /* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND
-	       or SIGN_EXTEND, don't record DEST since it can cause
-	       some tracking to be wrong.
-
-	       ??? Think about this more later.  */
-	    || (paradoxical_subreg_p (dest)
-		&& (GET_CODE (sets[i].src) == SIGN_EXTEND
-		    || GET_CODE (sets[i].src) == ZERO_EXTEND)))
+	    || sets[i].src_elt == 0)
  	  continue;

  	/* STRICT_LOW_PART isn't part of the value BEING set,
@@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn)
  	      sets[i].dest_hash = HASH (dest, GET_MODE (dest));
  	    }

+	/* If DEST is a paradoxical SUBREG, don't record DEST since it can
+	   cause some tracking to be wrong.  */
+	if (paradoxical_subreg_p (dest))
+	  continue;
+
  	elt = insert (dest, sets[i].src_elt,
  		      sets[i].dest_hash, GET_MODE (dest));



So apparently there was already an attempt of a fix for a similar bug,
and svn blame points to:

svn log -v -r8354
------------------------------------------------------------------------
r8354 | kenner | 1994-10-28 23:55:05 +0100 (Fri, 28 Oct 1994) | 3 lines
Changed paths:
    M /trunk/gcc/cse.c

(cse_insn): Don't record a DEST a paradoxical SUBREG and SRC is a
SIGN_EXTEND or ZERO_EXTEND.

------------------------------------------------------------------------

This way we can still map the underlying QI register to 255 but
not the SUBREG if it is a paradoxical subreg.

In the test case this patch still works (output code does not change).

What do you think?


Bernd.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-31 10:44             ` Bernd Edlinger
@ 2016-07-31 10:56               ` Bernd Edlinger
  2016-08-01 17:54               ` Jeff Law
  1 sibling, 0 replies; 25+ messages in thread
From: Bernd Edlinger @ 2016-07-31 10:56 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jeff Law, GCC Patches, Eric Botcazou, Bernd Schmidt,
	Andreas Schwab, Tamar Christina

On 07/31/16 12:43, Bernd Edlinger wrote:
> I found out that the trouble starts one instruction earlier:
>
> (insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0)
>          ) pr.c:8 50 {*movdi_aarch64}
>       (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ])
>          (nil)))
>

oops, my e-mail missed one line

that's what the insn actually looks like:

(insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0)
         (const_int 255 [0xff])) pr.c:8 50 {*movdi_aarch64}
      (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ])
         (nil)))

So we can say reg:QI 93 is 255, but have no idea
what an expression like (subreg:DI (reg:QI 93) 0) will
evaluate to.


Bernd.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-30  8:18         ` Bernd Edlinger
  2016-07-30 11:39           ` Segher Boessenkool
@ 2016-08-01 17:16           ` Jeff Law
  1 sibling, 0 replies; 25+ messages in thread
From: Jeff Law @ 2016-08-01 17:16 UTC (permalink / raw)
  To: Bernd Edlinger, Segher Boessenkool
  Cc: GCC Patches, Eric Botcazou, Bernd Schmidt, Andreas Schwab,
	Tamar Christina

On 07/30/2016 02:17 AM, Bernd Edlinger wrote:
>>>
>>> In your first mail you showed reg 481 as _not_ being REG_POINTER:
>>>
>>> (insn 1047 1046 1048 (set (reg:DI 481)
>>>          (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
>>>       (nil))
>>>
>>> (note the lack of /f).  So which is it?  REG_POINTER here is not correct
>>> as far as I can see.
>>>
>>
>> Oh yes, that's an interesting point, in expand I still see this:
>>
>>
>> (insn 1047 1046 1048 (set (reg:DI 481)
>>          (subreg:DI (reg/f:SI 479) 0)) isl_input.c:2496 -1
>>       (nil))
>>
>> But in the last dump before combine I have this:
>>
>> (insn 1047 1044 1048 101 (set (reg/f:DI 481)
>>          (subreg:DI (reg/f:SI 545) 0)) isl_input.c:2496 50 {*movdi_aarch64}
>>       (nil))
>>
>>
>> However I was not really surpised by that, because the reg 545 does
>> in deed hold a pointer value: &isl_obj_map_vtable
So just an FYI.  It should always be safe to fail to mark something with 
REG_POINTER -- though it is possible that something has violated that 
design decision.

So one interesting test would be to hack up things so that REG_POINTER 
never gets set on anything and see what that does to your testcase.


>>
>> (insn 22 17 23 51 (set (reg/f:SI 544)
>>          (high:SI (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0]
>> <var_decl 0x7f83d3273f30 isl_obj_map_vtable>))) isl_input.c:2415 49
>> {*movsi_aarch64}
>>       (nil))
>> (insn 23 22 24 51 (set (reg/f:SI 545)
>>          (lo_sum:SI (reg/f:SI 544)
>>              (symbol_ref:SI ("isl_obj_map_vtable") [flags 0xc0]
>> <var_decl 0x7f83d3273f30 isl_obj_map_vtable>))) isl_input.c:2415 917
>> {add_losym_si}
>>       (expr_list:REG_DEAD (reg/f:SI 544)
>>          (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable")
>> [flags 0xc0] <var_decl 0x7f83d3273f30 isl_obj_map_vtable>)
>>              (nil))))
>>
>> The "reg/f:DI 481" first appeared in cse1.
>>
>>
>> I'll try to see what's happening there next....
>
>
> Ok, I the incorrect REG_POINTER is done here:
>
> cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value
(reg 544) is technically not a pointer, though I think we have allowed 
REG_POINTER to be set on (HIGH (SYMBOL_REF)).  I would expect (reg 545) 
to be marked as a pointer.

The hesitation I have is because Pmode != SImode on this target, so 
technically the value has to be zero extended out to Pmode to ensure its 
validity.  One could argue that only a properly extended object should 
have REG_POINTER set.





>
> and here I see a bug, because if POINTERS_EXTEND_UNSIGNED
> can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not
> if x is a SUBREG as in this case.
Seems like a bug to me.



>
> So I think that should be fixed this way:
>
> Index: emit-rtl.c
> ===================================================================
> --- emit-rtl.c	(revision 238891)
> +++ emit-rtl.c	(working copy)
> @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x)
>   	 || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x)))
>       {
>   #if defined(POINTERS_EXTEND_UNSIGNED)
> -      if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
> +      if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED)
>   	   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
>   	  && !targetm.have_ptr_extend ())
>   	can_be_reg_pointer = false;
>
>
> What do you think does this look like the right fix?
As Segher pointed out, I think you also want to look at 
SUBREG_PROMOTED_VAR_P and SUBREG_PROMOTED_UNSIGNED_P as well.  I don't 
think they're applicable for this target, but it still seems like the 
right thing to do.

>
> With this patch the code the reg/f:DI 481 does no longer appear,
> and also the invalid combine does no longer happen.
>
> However the test case from pr70903 does not get fixed by this.
>
> But when I look at the dumps, I see again the first incorrect
> transformation in cse2 (again cse!):
>
> (insn 20 19 21 2 (set (reg:DI 94)
>          (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
>       (expr_list:REG_EQUAL (const_int 255 [0xff])
>          (expr_list:REG_DEAD (reg:QI 93)
>              (nil))))
>
> but that is simply wrong, because later optimization passes
> expect reg 94 to be 0x000000ff but the upper bits are unspecified,
> so that REG_EQUAL should better not exist.
Now this one could be related to PROMOTE_MODE and friends.  You might 
want to review Jim W's comments in pr65932 which describe some problems 
with the way the port uses PROMOTE_MODE.


>
> When I looked at cse.c I saw a comment in #if 0, which exactly
> describes the problem that we have with paradoxical subreg here:
>
> Index: cse.c
> ===================================================================
> --- cse.c	(revision 238891)
> +++ cse.c	(working copy)
> @@ -4716,10 +4716,6 @@ cse_insn (rtx_insn *insn)
>   	    }
>   	}
>
> -#if 0
> -      /* It is no longer clear why we used to do this, but it doesn't
> -	 appear to still be needed.  So let's try without it since this
> -	 code hurts cse'ing widened ops.  */
>         /* If source is a paradoxical subreg (such as QI treated as an SI),
>   	 treat it as volatile.  It may do the work of an SI in one context
>   	 where the extra bits are not being used, but cannot replace an SI
> @@ -4726,7 +4722,6 @@ cse_insn (rtx_insn *insn)
>   	 in general.  */
>         if (paradoxical_subreg_p (src))
>   	sets[i].src_volatile = 1;
> -#endif
>
>         /* Locate all possible equivalent forms for SRC.  Try to replace
>            SRC in the insn with each cheaper equivalent.
>
>
> removing the #if 0 seems to fix the sample from pr70903, but generates
> 3 more instructions than with my initial patch:
Right.  I'd have to look at cse for a while, but ISTM that treating it 
like it was volatile is more of a pessimization than we want/need.



> When I looked at svn blame cse.c, I saw that this #if 0 was introduced
> here:
>
> svn log -r1614 -v
> ------------------------------------------------------------------------
> r1614 | kenner | 1992-07-17 11:57:24 +0200 (Fri, 17 Jul 1992) | 2 lines
> Changed paths:
>     M /trunk/gcc/cse.c
>     M /trunk/gcc/libgcc2.c
>     M /trunk/gcc/reload.c
>     M /trunk/gcc/reload1.c
>
> *** empty log message ***
>
> ------------------------------------------------------------------------
>
>
> Again, a check-in without comment, that fixes at least 3 different
> bugs, without test case, and of course the gcc-patches archives do only
> go back to 1998...
Different era.  While I could go back and look at the archives for the 
private development list that existed at that time, it's highly likely 
there was no patch posted or discussed -- and there wasn't any kind of a 
regression testsuite at the time either (c-torture as a separate package 
from Tege showed up in 1993 IIRC).

Jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-07-31 10:44             ` Bernd Edlinger
  2016-07-31 10:56               ` Bernd Edlinger
@ 2016-08-01 17:54               ` Jeff Law
  2016-08-01 18:53                 ` Bernd Edlinger
  1 sibling, 1 reply; 25+ messages in thread
From: Jeff Law @ 2016-08-01 17:54 UTC (permalink / raw)
  To: Bernd Edlinger, Segher Boessenkool
  Cc: GCC Patches, Eric Botcazou, Bernd Schmidt, Andreas Schwab,
	Tamar Christina

On 07/31/2016 04:44 AM, Bernd Edlinger wrote:
>
> like this?
>
> Index: emit-rtl.c
> ===================================================================
> --- emit-rtl.c	(revision 238891)
> +++ emit-rtl.c	(working copy)
> @@ -1156,7 +1156,11 @@
>       {
>   #if defined(POINTERS_EXTEND_UNSIGNED)
>         if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
> -	   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
> +	   || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED)
> +	   || (paradoxical_subreg_p (x)
> +	       && ! (SUBREG_PROMOTED_VAR_P (x)
> +		     && SUBREG_CHECK_PROMOTED_SIGN (x,
> +						    POINTERS_EXTEND_UNSIGNED))))
>   	  && !targetm.have_ptr_extend ())
>   	can_be_reg_pointer = false;
>   #endif
>
> In the test case of pr71779 the subreg is no promoted var, so this
> has no influence at this time.  Also I have not POINTERS_EXTEND_SIGNED
> target, but for the symmetry it ought to check explicitly for
> ZERO_EXTEND as well, and allow the pointer value to pass thru a
> TRUNCATE.
I believe MIPS is likely the only target that extends signed, a few need 
special extension code (ia64/s390).  But this looks reasonable to me.  I 
don't think it's worth the complication of dealing with truncation.


>>
>
> I debugged the cse again, to see how it works and why it mis-compiles
> this example.
>
> I found out that the trouble starts one instruction earlier:
>
[ Fixing the missing bits from the insn using your later message... ]

(insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0)
          (const_int 255 [0xff])) pr.c:8 50 {*movdi_aarch64}
       (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ])
          (nil)))


>
> cse_main sees the constant value and maps:
> (reg:QI 93)  =>  (const_int 255 [0xff])
OK.  This looks OK to me.  We know unambiguously that (reg 93) has the 
value 0xff -- that's because (reg 93) is a QImode register.   There are 
no outside QImode.



>
> plus (I mean that is wrong):
> (subreg:DI (reg:QI 93) 0)  =>  (const_int 255 [0xff])
>
> When the next insn is scanned
>
> (insn 20 19 21 2 (set (reg:DI 94)
>          (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64}
>       (expr_list:REG_DEAD (reg:QI 93)
>          (nil)))
>
> I see fold_rtx (subreg:DI (reg:QI 93) 0))
> return (const_int 255 [0xff]) which is wrong.
I think this is the key point where things have gone wrong.  While we 
know the QImode bits are 0xff the bits outside QImode are undefined.  So 
we can't legitimately return 0xff when folding that rtx.



>
> now cse maps:
> (reg:DI 94)  =>  (const_int 255 [0xff])
And now we propagate the mistake from the previous step....




>
> Now I think I found a better place for a patch, where the first bogus
> mapping is recorded:
>
> Index: cse.c
> ===================================================================
> --- cse.c	(revision 238891)
> +++ cse.c	(working copy)
> @@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn)
>   	    || GET_MODE (dest) == BLKmode
>   	    /* If we didn't put a REG_EQUAL value or a source into the hash
>   	       table, there is no point is recording DEST.  */
> -	    || sets[i].src_elt == 0
> -	    /* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND
> -	       or SIGN_EXTEND, don't record DEST since it can cause
> -	       some tracking to be wrong.
> -
> -	       ??? Think about this more later.  */
> -	    || (paradoxical_subreg_p (dest)
> -		&& (GET_CODE (sets[i].src) == SIGN_EXTEND
> -		    || GET_CODE (sets[i].src) == ZERO_EXTEND)))
> +	    || sets[i].src_elt == 0)
>   	  continue;
>
>   	/* STRICT_LOW_PART isn't part of the value BEING set,
> @@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn)
>   	      sets[i].dest_hash = HASH (dest, GET_MODE (dest));
>   	    }
>
> +	/* If DEST is a paradoxical SUBREG, don't record DEST since it can
> +	   cause some tracking to be wrong.  */
> +	if (paradoxical_subreg_p (dest))
> +	  continue;
> +
>   	elt = insert (dest, sets[i].src_elt,
>   		      sets[i].dest_hash, GET_MODE (dest));
Instead of saying "cause some tracking to be wrong", it might be better 
to say "the bits outside the mode of GET_MODE (SUBREG_REG (dest)) are 
undefined".


>
>
>
> So apparently there was already an attempt of a fix for a similar bug,
> and svn blame points to:
>
> svn log -v -r8354
> ------------------------------------------------------------------------
> r8354 | kenner | 1994-10-28 23:55:05 +0100 (Fri, 28 Oct 1994) | 3 lines
> Changed paths:
>     M /trunk/gcc/cse.c
>
> (cse_insn): Don't record a DEST a paradoxical SUBREG and SRC is a
> SIGN_EXTEND or ZERO_EXTEND.
>
> ------------------------------------------------------------------------
>
> This way we can still map the underlying QI register to 255 but
> not the SUBREG if it is a paradoxical subreg.
>
> In the test case this patch still works (output code does not change).
>
> What do you think?
Looks like you've probably nailed it.  It'll be interesting see if 
there's any fallout (though our RTL optimizer testing is pretty weak, so 
even if there were, I doubt we'd catch it).

ISTM that this would be a great test for the unit testing framework that 
David M. has been building.   You should look for David's simplify-rtx 
tests (posted, but not committed pending some thinking about a few 
issues) as a guide.

jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-01 17:54               ` Jeff Law
@ 2016-08-01 18:53                 ` Bernd Edlinger
  2016-08-01 23:31                   ` Segher Boessenkool
                                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Bernd Edlinger @ 2016-08-01 18:53 UTC (permalink / raw)
  To: Jeff Law, Segher Boessenkool
  Cc: GCC Patches, Eric Botcazou, Bernd Schmidt, Andreas Schwab,
	Tamar Christina

[-- Attachment #1: Type: text/plain, Size: 1221 bytes --]

Hi Jeff,

On 08/01/16 19:54, Jeff Law wrote:
> Looks like you've probably nailed it.  It'll be interesting see if
> there's any fallout (though our RTL optimizer testing is pretty weak, so
> even if there were, I doubt we'd catch it).
>

If there is, it will probably a performance regression...

Anyway I'd say these two patches do just disable actually wrong
transformations.  So here are both patches as separate diffs
with your suggestion for the comment in cse_insn.

I believe that on x86_64 both patches do not change a single bit.

However I think there are more paradoxical subregs generated all over,
but the aarch64 insv code pattern did trigger more hidden bugs than
any other port.  It is certainly unfortunate that the major source
of paradoxical subreg is in a target-dependent code path :(

Please apologize that I am not able to reduce/finalize the aarch64 test
case at this time, as I usually only work with arm and intel targets, 
but I made an exception here, because a bug like that may affect all
targets sooner or later.


Boot-strap and reg-testing on x86_64-linux-gnu.
Plus aarch64 bootstrap and isl-testing by Andreas.


Is it OK for trunk?



Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-pr70903.diff --]
[-- Type: text/x-patch; name="patch-pr70903.diff", Size: 1432 bytes --]

2016-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR rtl-optimization/70903
	* cse.c (cse_insn): If DEST is a paradoxical SUBREG, don't record DEST.

Index: gcc/cse.c
===================================================================
--- gcc/cse.c	(revision 238915)
+++ gcc/cse.c	(working copy)
@@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn)
 	    || GET_MODE (dest) == BLKmode
 	    /* If we didn't put a REG_EQUAL value or a source into the hash
 	       table, there is no point is recording DEST.  */
-	    || sets[i].src_elt == 0
-	    /* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND
-	       or SIGN_EXTEND, don't record DEST since it can cause
-	       some tracking to be wrong.
-
-	       ??? Think about this more later.  */
-	    || (paradoxical_subreg_p (dest)
-		&& (GET_CODE (sets[i].src) == SIGN_EXTEND
-		    || GET_CODE (sets[i].src) == ZERO_EXTEND)))
+	    || sets[i].src_elt == 0)
 	  continue;
 
 	/* STRICT_LOW_PART isn't part of the value BEING set,
@@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn)
 	      sets[i].dest_hash = HASH (dest, GET_MODE (dest));
 	    }
 
+	/* If DEST is a paradoxical SUBREG, don't record DEST since the bits
+	   outside the mode of GET_MODE (SUBREG_REG (dest)) are undefined.  */
+	if (paradoxical_subreg_p (dest))
+	  continue;
+
 	elt = insert (dest, sets[i].src_elt,
 		      sets[i].dest_hash, GET_MODE (dest));
 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-pr71779.diff --]
[-- Type: text/x-patch; name="patch-pr71779.diff", Size: 984 bytes --]

2016-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR rtl-optimization/71779
	* emit-rtl.c (set_reg_attrs_from_value): Only propagate REG_POINTER,
	if the value was sign-extended according to POINTERS_EXTEND_UNSIGNED
	or if it was truncated.

Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 238915)
+++ gcc/emit-rtl.c	(working copy)
@@ -1156,7 +1156,11 @@ set_reg_attrs_from_value (rtx reg, rtx x)
     {
 #if defined(POINTERS_EXTEND_UNSIGNED)
       if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
-	   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
+	   || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED)
+	   || (paradoxical_subreg_p (x)
+	       && ! (SUBREG_PROMOTED_VAR_P (x)
+		     && SUBREG_CHECK_PROMOTED_SIGN (x,
+						    POINTERS_EXTEND_UNSIGNED))))
 	  && !targetm.have_ptr_extend ())
 	can_be_reg_pointer = false;
 #endif

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-01 18:53                 ` Bernd Edlinger
@ 2016-08-01 23:31                   ` Segher Boessenkool
  2016-08-02 15:21                     ` Jeff Law
  2016-08-03 14:25                   ` Bernd Edlinger
  2016-08-03 15:38                   ` Jeff Law
  2 siblings, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-08-01 23:31 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Jeff Law, GCC Patches, Eric Botcazou, Bernd Schmidt,
	Andreas Schwab, Tamar Christina

Hi,

On Mon, Aug 01, 2016 at 06:52:54PM +0000, Bernd Edlinger wrote:
> On 08/01/16 19:54, Jeff Law wrote:
> > Looks like you've probably nailed it.  It'll be interesting see if
> > there's any fallout (though our RTL optimizer testing is pretty weak, so
> > even if there were, I doubt we'd catch it).
> 
> If there is, it will probably a performance regression...

I tested building Linux with and without the patch, on many archs.
The few that show differences are:

       alpha   6148872   6148776
        ia64  16946958  16946670
        s390  12345770  12345850
        tile  12016086  12016070

(left before, right after; arm and aarch64 did not build, kernel problems).

So all except s390 generate smaller code even.

> However I think there are more paradoxical subregs generated all over,
> but the aarch64 insv code pattern did trigger more hidden bugs than
> any other port.  It is certainly unfortunate that the major source
> of paradoxical subreg is in a target-dependent code path :(

It is certainly unfortunate that paradoxical subregs exist at all!  :-)


Segher

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-01 23:31                   ` Segher Boessenkool
@ 2016-08-02 15:21                     ` Jeff Law
  2016-08-02 16:46                       ` Segher Boessenkool
  2016-08-02 17:46                       ` Richard Biener
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff Law @ 2016-08-02 15:21 UTC (permalink / raw)
  To: Segher Boessenkool, Bernd Edlinger
  Cc: GCC Patches, Eric Botcazou, Bernd Schmidt, Andreas Schwab,
	Tamar Christina

On 08/01/2016 05:31 PM, Segher Boessenkool wrote:
> Hi,
>
> On Mon, Aug 01, 2016 at 06:52:54PM +0000, Bernd Edlinger wrote:
>> On 08/01/16 19:54, Jeff Law wrote:
>>> Looks like you've probably nailed it.  It'll be interesting see if
>>> there's any fallout (though our RTL optimizer testing is pretty weak, so
>>> even if there were, I doubt we'd catch it).
>>
>> If there is, it will probably a performance regression...
>
> I tested building Linux with and without the patch, on many archs.
> The few that show differences are:
>
>        alpha   6148872   6148776
>         ia64  16946958  16946670
>         s390  12345770  12345850
>         tile  12016086  12016070
>
> (left before, right after; arm and aarch64 did not build, kernel problems).
>
> So all except s390 generate smaller code even.
They're all deep enough in the noise that I wouldn't care either way :-)

>
>> However I think there are more paradoxical subregs generated all over,
>> but the aarch64 insv code pattern did trigger more hidden bugs than
>> any other port.  It is certainly unfortunate that the major source
>> of paradoxical subreg is in a target-dependent code path :(
>
> It is certainly unfortunate that paradoxical subregs exist at all!  :-)
Yea.  It probably seemed like a good idea 25-30 years ago, but I always 
cringe when I see them being used.  Yea it gives the compiler some more 
freedom, but more often than not I think we'd be better off with real 
extensions.

jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-02 15:21                     ` Jeff Law
@ 2016-08-02 16:46                       ` Segher Boessenkool
  2016-08-02 20:16                         ` Bernd Schmidt
  2016-08-02 17:46                       ` Richard Biener
  1 sibling, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-08-02 16:46 UTC (permalink / raw)
  To: Jeff Law
  Cc: Bernd Edlinger, GCC Patches, Eric Botcazou, Bernd Schmidt,
	Andreas Schwab, Tamar Christina

On Tue, Aug 02, 2016 at 09:21:34AM -0600, Jeff Law wrote:
> >>However I think there are more paradoxical subregs generated all over,
> >>but the aarch64 insv code pattern did trigger more hidden bugs than
> >>any other port.  It is certainly unfortunate that the major source
> >>of paradoxical subreg is in a target-dependent code path :(
> >
> >It is certainly unfortunate that paradoxical subregs exist at all!  :-)
> Yea.  It probably seemed like a good idea 25-30 years ago, but I always 
> cringe when I see them being used.  Yea it gives the compiler some more 
> freedom, but more often than not I think we'd be better off with real 
> extensions.

And then perhaps have some bits marked as "do not care", perhaps using
a register note...  This would help other cases as well.


Segher

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-02 15:21                     ` Jeff Law
  2016-08-02 16:46                       ` Segher Boessenkool
@ 2016-08-02 17:46                       ` Richard Biener
  2016-08-02 18:05                         ` Jeff Law
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Biener @ 2016-08-02 17:46 UTC (permalink / raw)
  To: Jeff Law, Segher Boessenkool, Bernd Edlinger
  Cc: GCC Patches, Eric Botcazou, Bernd Schmidt, Andreas Schwab,
	Tamar Christina

On August 2, 2016 5:21:34 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 08/01/2016 05:31 PM, Segher Boessenkool wrote:
>> Hi,
>>
>> On Mon, Aug 01, 2016 at 06:52:54PM +0000, Bernd Edlinger wrote:
>>> On 08/01/16 19:54, Jeff Law wrote:
>>>> Looks like you've probably nailed it.  It'll be interesting see if
>>>> there's any fallout (though our RTL optimizer testing is pretty
>weak, so
>>>> even if there were, I doubt we'd catch it).
>>>
>>> If there is, it will probably a performance regression...
>>
>> I tested building Linux with and without the patch, on many archs.
>> The few that show differences are:
>>
>>        alpha   6148872   6148776
>>         ia64  16946958  16946670
>>         s390  12345770  12345850
>>         tile  12016086  12016070
>>
>> (left before, right after; arm and aarch64 did not build, kernel
>problems).
>>
>> So all except s390 generate smaller code even.
>They're all deep enough in the noise that I wouldn't care either way
>:-)
>
>>
>>> However I think there are more paradoxical subregs generated all
>over,
>>> but the aarch64 insv code pattern did trigger more hidden bugs than
>>> any other port.  It is certainly unfortunate that the major source
>>> of paradoxical subreg is in a target-dependent code path :(
>>
>> It is certainly unfortunate that paradoxical subregs exist at all! 
>:-)
>Yea.  It probably seemed like a good idea 25-30 years ago, but I always
>
>cringe when I see them being used.  Yea it gives the compiler some more
>
>freedom, but more often than not I think we'd be better off with real 
>extensions.

But we love to exploit undefined behavior elsewhere, too.  Now the init-regs pass comes to my mind again (papering over issues elsewhere)..

Richard.


>jeff


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-02 17:46                       ` Richard Biener
@ 2016-08-02 18:05                         ` Jeff Law
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff Law @ 2016-08-02 18:05 UTC (permalink / raw)
  To: Richard Biener, Segher Boessenkool, Bernd Edlinger
  Cc: GCC Patches, Eric Botcazou, Bernd Schmidt, Andreas Schwab,
	Tamar Christina

On 08/02/2016 11:46 AM, Richard Biener wrote:
> But we love to exploit undefined behavior elsewhere, too.  Now the
> init-regs pass comes to my mind again (papering over issues
> elsewhere)..
True.  I just haven't seen that the don't care bits created by 
paradoxical subregs has actually been all that useful in practice.

In fact, I've seen subregs get in the way often, but if the code had a 
normal extension it would have been optimized better.

Jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-02 16:46                       ` Segher Boessenkool
@ 2016-08-02 20:16                         ` Bernd Schmidt
  2016-08-03 15:13                           ` Jeff Law
  0 siblings, 1 reply; 25+ messages in thread
From: Bernd Schmidt @ 2016-08-02 20:16 UTC (permalink / raw)
  To: Segher Boessenkool, Jeff Law
  Cc: Bernd Edlinger, GCC Patches, Eric Botcazou, Andreas Schwab,
	Tamar Christina

On 08/02/2016 06:46 PM, Segher Boessenkool wrote:
> On Tue, Aug 02, 2016 at 09:21:34AM -0600, Jeff Law wrote:
>>>> However I think there are more paradoxical subregs generated all over,
>>>> but the aarch64 insv code pattern did trigger more hidden bugs than
>>>> any other port.  It is certainly unfortunate that the major source
>>>> of paradoxical subreg is in a target-dependent code path :(
>>>
>>> It is certainly unfortunate that paradoxical subregs exist at all!  :-)
>> Yea.  It probably seemed like a good idea 25-30 years ago, but I always
>> cringe when I see them being used.  Yea it gives the compiler some more
>> freedom, but more often than not I think we'd be better off with real
>> extensions.
>
> And then perhaps have some bits marked as "do not care", perhaps using
> a register note...  This would help other cases as well.

I'm thinking maybe an any_extend code to go along with sign_extend and 
zero_extend. If input and output registers are the same it would be 
treated like a no-op move. That might be close enough to get us the 
benefits of a paradoxical subreg.


Bernd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-01 18:53                 ` Bernd Edlinger
  2016-08-01 23:31                   ` Segher Boessenkool
@ 2016-08-03 14:25                   ` Bernd Edlinger
  2016-08-03 15:38                   ` Jeff Law
  2 siblings, 0 replies; 25+ messages in thread
From: Bernd Edlinger @ 2016-08-03 14:25 UTC (permalink / raw)
  To: Jeff Law, Segher Boessenkool
  Cc: GCC Patches, Eric Botcazou, Bernd Schmidt, Andreas Schwab,
	Tamar Christina

Hi,

Is it OK for the trunk?

I guess so, but need an explicit OK.


Thanks
Bernd.

On 08/01/16 20:52, Bernd Edlinger wrote:
> Hi Jeff,
>
> On 08/01/16 19:54, Jeff Law wrote:
>> Looks like you've probably nailed it.  It'll be interesting see if
>> there's any fallout (though our RTL optimizer testing is pretty weak, so
>> even if there were, I doubt we'd catch it).
>>
>
> If there is, it will probably a performance regression...
>
> Anyway I'd say these two patches do just disable actually wrong
> transformations.  So here are both patches as separate diffs
> with your suggestion for the comment in cse_insn.
>
> I believe that on x86_64 both patches do not change a single bit.
>
> However I think there are more paradoxical subregs generated all over,
> but the aarch64 insv code pattern did trigger more hidden bugs than
> any other port.  It is certainly unfortunate that the major source
> of paradoxical subreg is in a target-dependent code path :(
>
> Please apologize that I am not able to reduce/finalize the aarch64 test
> case at this time, as I usually only work with arm and intel targets,
> but I made an exception here, because a bug like that may affect all
> targets sooner or later.
>
>
> Boot-strap and reg-testing on x86_64-linux-gnu.
> Plus aarch64 bootstrap and isl-testing by Andreas.
>
>
> Is it OK for trunk?
>
>
>
> Thanks
> Bernd.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-02 20:16                         ` Bernd Schmidt
@ 2016-08-03 15:13                           ` Jeff Law
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff Law @ 2016-08-03 15:13 UTC (permalink / raw)
  To: Bernd Schmidt, Segher Boessenkool
  Cc: Bernd Edlinger, GCC Patches, Eric Botcazou, Andreas Schwab,
	Tamar Christina

On 08/02/2016 02:16 PM, Bernd Schmidt wrote:
> On 08/02/2016 06:46 PM, Segher Boessenkool wrote:
>> On Tue, Aug 02, 2016 at 09:21:34AM -0600, Jeff Law wrote:
>>>>> However I think there are more paradoxical subregs generated all over,
>>>>> but the aarch64 insv code pattern did trigger more hidden bugs than
>>>>> any other port.  It is certainly unfortunate that the major source
>>>>> of paradoxical subreg is in a target-dependent code path :(
>>>>
>>>> It is certainly unfortunate that paradoxical subregs exist at all!  :-)
>>> Yea.  It probably seemed like a good idea 25-30 years ago, but I always
>>> cringe when I see them being used.  Yea it gives the compiler some more
>>> freedom, but more often than not I think we'd be better off with real
>>> extensions.
>>
>> And then perhaps have some bits marked as "do not care", perhaps using
>> a register note...  This would help other cases as well.
>
> I'm thinking maybe an any_extend code to go along with sign_extend and
> zero_extend. If input and output registers are the same it would be
> treated like a no-op move. That might be close enough to get us the
> benefits of a paradoxical subreg.
I was kind of thinking along the same lines.  Not high priority though.

Jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-01 18:53                 ` Bernd Edlinger
  2016-08-01 23:31                   ` Segher Boessenkool
  2016-08-03 14:25                   ` Bernd Edlinger
@ 2016-08-03 15:38                   ` Jeff Law
  2016-08-03 17:41                     ` Bernd Edlinger
  2 siblings, 1 reply; 25+ messages in thread
From: Jeff Law @ 2016-08-03 15:38 UTC (permalink / raw)
  To: Bernd Edlinger, Segher Boessenkool
  Cc: GCC Patches, Eric Botcazou, Bernd Schmidt, Andreas Schwab,
	Tamar Christina

On 08/01/2016 12:52 PM, Bernd Edlinger wrote:
> Hi Jeff,
>
> On 08/01/16 19:54, Jeff Law wrote:
>> > Looks like you've probably nailed it.  It'll be interesting see if
>> > there's any fallout (though our RTL optimizer testing is pretty weak, so
>> > even if there were, I doubt we'd catch it).
>> >
> If there is, it will probably a performance regression...
>
> Anyway I'd say these two patches do just disable actually wrong
> transformations.  So here are both patches as separate diffs
> with your suggestion for the comment in cse_insn.
>
> I believe that on x86_64 both patches do not change a single bit.
>
> However I think there are more paradoxical subregs generated all over,
> but the aarch64 insv code pattern did trigger more hidden bugs than
> any other port.  It is certainly unfortunate that the major source
> of paradoxical subreg is in a target-dependent code path :(
>
> Please apologize that I am not able to reduce/finalize the aarch64 test
> case at this time, as I usually only work with arm and intel targets,
> but I made an exception here, because a bug like that may affect all
> targets sooner or later.
>
>
> Boot-strap and reg-testing on x86_64-linux-gnu.
> Plus aarch64 bootstrap and isl-testing by Andreas.
>
>
> Is it OK for trunk?
cse.c changes look good, but I'd really like to see a testcase for each 
issue in the dejagnu framework.  Extra points if you tried to build a 
unit test using David M's framework, but that isn't required.

The testcase from 70903 ought to be trivial to add to the dejagnu suite. 
  71779 might be more difficult, but if you could take a stab, it'd be 
appreciated.

jeff



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-03 15:38                   ` Jeff Law
@ 2016-08-03 17:41                     ` Bernd Edlinger
  2016-08-03 22:08                       ` Jeff Law
  0 siblings, 1 reply; 25+ messages in thread
From: Bernd Edlinger @ 2016-08-03 17:41 UTC (permalink / raw)
  To: Jeff Law, Segher Boessenkool
  Cc: GCC Patches, Eric Botcazou, Bernd Schmidt, Andreas Schwab,
	Tamar Christina

[-- Attachment #1: Type: text/plain, Size: 1653 bytes --]

On 08/03/16 17:38, Jeff Law wrote:
> cse.c changes look good, but I'd really like to see a testcase for each
> issue in the dejagnu framework.  Extra points if you tried to build a
> unit test using David M's framework, but that isn't required.
>
> The testcase from 70903 ought to be trivial to add to the dejagnu suite.
>   71779 might be more difficult, but if you could take a stab, it'd be
> appreciated.
>


Yes, sure.  I had assumed that the pr70903 test case is using some
target-specific vector types, but now I see that it even works as-is in
the gcc.c-torture/execute directory.

So I've added the test case to the cse patch.  And quickly verified that
it works on x86_64-linux-gnu.


The pr71779 test case will be pretty difficult to reduce, because it
depends on combine to do the incorrect transformation and lra to spill
the subreg, and on the stack content at runtime to be non-zero.

But technically it *is* already in the isl-test suite, so if isl is
in-tree, it is always executed by make check or make check-isl.

It is just that gmp/mpfr/mpc and isl test results are not included by
contrib/test_summary, but that should be fixable.  What do you think?

Actually that should not be too difficult, as there are test-suite.log
files that we could just added to the test_summary output as-is, for
instance:

cat isl/test-suite.log

==================================
    isl 0.16.1: ./test-suite.log
==================================

# TOTAL: 5
# PASS:  5
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2


Are the patches OK now?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-pr70903.diff --]
[-- Type: text/x-patch; name="patch-pr70903.diff", Size: 2431 bytes --]

2016-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR rtl-optimization/70903
	* cse.c (cse_insn): If DEST is a paradoxical SUBREG, don't record DEST.

testsuite:
2016-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR rtl-optimization/70903
	* gcc.c-torture/execute/pr70903.c: New test.

Index: gcc/cse.c
===================================================================
--- gcc/cse.c	(revision 238915)
+++ gcc/cse.c	(working copy)
@@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn)
 	    || GET_MODE (dest) == BLKmode
 	    /* If we didn't put a REG_EQUAL value or a source into the hash
 	       table, there is no point is recording DEST.  */
-	    || sets[i].src_elt == 0
-	    /* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND
-	       or SIGN_EXTEND, don't record DEST since it can cause
-	       some tracking to be wrong.
-
-	       ??? Think about this more later.  */
-	    || (paradoxical_subreg_p (dest)
-		&& (GET_CODE (sets[i].src) == SIGN_EXTEND
-		    || GET_CODE (sets[i].src) == ZERO_EXTEND)))
+	    || sets[i].src_elt == 0)
 	  continue;
 
 	/* STRICT_LOW_PART isn't part of the value BEING set,
@@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn)
 	      sets[i].dest_hash = HASH (dest, GET_MODE (dest));
 	    }
 
+	/* If DEST is a paradoxical SUBREG, don't record DEST since the bits
+	   outside the mode of GET_MODE (SUBREG_REG (dest)) are undefined.  */
+	if (paradoxical_subreg_p (dest))
+	  continue;
+
 	elt = insert (dest, sets[i].src_elt,
 		      sets[i].dest_hash, GET_MODE (dest));
 
Index: gcc/testsuite/gcc.c-torture/execute/pr70903.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/pr70903.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr70903.c	(working copy)
@@ -0,0 +1,19 @@
+typedef unsigned char V8 __attribute__ ((vector_size (32)));
+typedef unsigned int V32 __attribute__ ((vector_size (32)));
+typedef unsigned long long V64 __attribute__ ((vector_size (32)));
+
+static V32 __attribute__ ((noinline, noclone))
+foo (V64 x)
+{
+  V64 y = (V64)(V8){((V8)(V64){65535, x[0]})[1]};
+  return (V32){y[0], 255};
+}
+
+int main ()
+{
+  V32 x = foo ((V64){});
+//  __builtin_printf ("%08x %08x %08x %08x %08x %08x %08x %08x\n", x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]);
+  if (x[1] != 255)
+    __builtin_abort();
+  return 0;
+}

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-pr71779.diff --]
[-- Type: text/x-patch; name="patch-pr71779.diff", Size: 984 bytes --]

2016-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR rtl-optimization/71779
	* emit-rtl.c (set_reg_attrs_from_value): Only propagate REG_POINTER,
	if the value was sign-extended according to POINTERS_EXTEND_UNSIGNED
	or if it was truncated.

Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 238915)
+++ gcc/emit-rtl.c	(working copy)
@@ -1156,7 +1156,11 @@ set_reg_attrs_from_value (rtx reg, rtx x)
     {
 #if defined(POINTERS_EXTEND_UNSIGNED)
       if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
-	   || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
+	   || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED)
+	   || (paradoxical_subreg_p (x)
+	       && ! (SUBREG_PROMOTED_VAR_P (x)
+		     && SUBREG_CHECK_PROMOTED_SIGN (x,
+						    POINTERS_EXTEND_UNSIGNED))))
 	  && !targetm.have_ptr_extend ())
 	can_be_reg_pointer = false;
 #endif

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-03 17:41                     ` Bernd Edlinger
@ 2016-08-03 22:08                       ` Jeff Law
  2016-08-04 16:51                         ` James Greenhalgh
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Law @ 2016-08-03 22:08 UTC (permalink / raw)
  To: Bernd Edlinger, Segher Boessenkool
  Cc: GCC Patches, Eric Botcazou, Bernd Schmidt, Andreas Schwab,
	Tamar Christina

On 08/03/2016 11:41 AM, Bernd Edlinger wrote:
> On 08/03/16 17:38, Jeff Law wrote:
>> cse.c changes look good, but I'd really like to see a testcase for each
>> issue in the dejagnu framework.  Extra points if you tried to build a
>> unit test using David M's framework, but that isn't required.
>>
>> The testcase from 70903 ought to be trivial to add to the dejagnu suite.
>>   71779 might be more difficult, but if you could take a stab, it'd be
>> appreciated.
>>
>
>
> Yes, sure.  I had assumed that the pr70903 test case is using some
> target-specific vector types, but now I see that it even works as-is in
> the gcc.c-torture/execute directory.
>
> So I've added the test case to the cse patch.  And quickly verified that
> it works on x86_64-linux-gnu.
>
>
> The pr71779 test case will be pretty difficult to reduce, because it
> depends on combine to do the incorrect transformation and lra to spill
> the subreg, and on the stack content at runtime to be non-zero.
>
> But technically it *is* already in the isl-test suite, so if isl is
> in-tree, it is always executed by make check or make check-isl.
>
> It is just that gmp/mpfr/mpc and isl test results are not included by
> contrib/test_summary, but that should be fixable.  What do you think?
>
> Actually that should not be too difficult, as there are test-suite.log
> files that we could just added to the test_summary output as-is, for
> instance:
>
> cat isl/test-suite.log
>
> ==================================
>     isl 0.16.1: ./test-suite.log
> ==================================
>
> # TOTAL: 5
> # PASS:  5
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  0
> # XPASS: 0
> # ERROR: 0
>
> .. contents:: :depth: 2
>
>
> Are the patches OK now?
Yes.  Thanks for taking care of this...

Jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg
  2016-08-03 22:08                       ` Jeff Law
@ 2016-08-04 16:51                         ` James Greenhalgh
  0 siblings, 0 replies; 25+ messages in thread
From: James Greenhalgh @ 2016-08-04 16:51 UTC (permalink / raw)
  To: Jeff Law
  Cc: Bernd Edlinger, Segher Boessenkool, GCC Patches, Eric Botcazou,
	Bernd Schmidt, Andreas Schwab, Tamar Christina, nd

On Wed, Aug 03, 2016 at 04:08:30PM -0600, Jeff Law wrote:
> On 08/03/2016 11:41 AM, Bernd Edlinger wrote:
> >On 08/03/16 17:38, Jeff Law wrote:
> >>cse.c changes look good, but I'd really like to see a testcase for each
> >>issue in the dejagnu framework.  Extra points if you tried to build a
> >>unit test using David M's framework, but that isn't required.
> >>
> >>The testcase from 70903 ought to be trivial to add to the dejagnu suite.
> >>  71779 might be more difficult, but if you could take a stab, it'd be
> >>appreciated.
> >>
> >
> >
> >Yes, sure.  I had assumed that the pr70903 test case is using some
> >target-specific vector types, but now I see that it even works as-is in
> >the gcc.c-torture/execute directory.
> >
> >So I've added the test case to the cse patch.  And quickly verified that
> >it works on x86_64-linux-gnu.
> >
> >
> >The pr71779 test case will be pretty difficult to reduce, because it
> >depends on combine to do the incorrect transformation and lra to spill
> >the subreg, and on the stack content at runtime to be non-zero.
> >
> >But technically it *is* already in the isl-test suite, so if isl is
> >in-tree, it is always executed by make check or make check-isl.
> >
> >It is just that gmp/mpfr/mpc and isl test results are not included by
> >contrib/test_summary, but that should be fixable.  What do you think?
> >
> >Actually that should not be too difficult, as there are test-suite.log
> >files that we could just added to the test_summary output as-is, for
> >instance:
> >
> >cat isl/test-suite.log
> >
> >==================================
> >    isl 0.16.1: ./test-suite.log
> >==================================
> >
> ># TOTAL: 5
> ># PASS:  5
> ># SKIP:  0
> ># XFAIL: 0
> ># FAIL:  0
> ># XPASS: 0
> ># ERROR: 0
> >
> >.. contents:: :depth: 2
> >
> >
> >Are the patches OK now?
> Yes.  Thanks for taking care of this...
> 

Hi Bernd,

Thanks for fixing this, but it looks like you accidentally double-added
the pr70903.c testcase.

  Failures:
	gcc.c-torture/execute/pr70903.c

  Bisected to:

  Author: edlinger <edlinger@138bc75d-0d04-0410-961f-82ee72b054a4>
  Date:   Thu Aug 4 13:20:57 2016 +0000

    2016-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

            PR rtl-optimization/70903
            * cse.c (cse_insn): If DEST is a paradoxical SUBREG, don't record DEST.

    testsuite:
    2016-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

            PR rtl-optimization/70903
            * gcc.c-torture/execute/pr70903.c: New test.

  .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:25:1: error: redefinition of 'foo'
  .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:6:1: note: previous definition of 'foo' was here
  .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:31:5: error: redefinition of 'main'
  .../gcc/testsuite/gcc.c-torture/execute/pr70903.c:12:5: note: previous definition of 'main' was here

I've fixed that up as so in revision 239142, I hope you agree the change
is obvious.

Thanks,
James

---
2016-08-04  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.c-torture/execute/pr70903.c: Fix duplicate body.


diff --git a/gcc/testsuite/gcc.c-torture/execute/pr70903.c b/gcc/testsuite/gcc.c-torture/execute/pr70903.c
index 6ffd0aa..175ad1a 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr70903.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr70903.c
@@ -17,22 +17,4 @@ int main ()
     __builtin_abort();
   return 0;
 }
-typedef unsigned char V8 __attribute__ ((vector_size (32)));
-typedef unsigned int V32 __attribute__ ((vector_size (32)));
-typedef unsigned long long V64 __attribute__ ((vector_size (32)));
-
-static V32 __attribute__ ((noinline, noclone))
-foo (V64 x)
-{
-  V64 y = (V64)(V8){((V8)(V64){65535, x[0]})[1]};
-  return (V32){y[0], 255};
-}
 
-int main ()
-{
-  V32 x = foo ((V64){});
-//  __builtin_printf ("%08x %08x %08x %08x %08x %08x %08x %08x\n", x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]);
-  if (x[1] != 255)
-    __builtin_abort();
-  return 0;
-}


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-08-04 16:51 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-26 17:32 [PATCH] Fix wrong code on aarch64 due to paradoxical subreg Bernd Edlinger
2016-07-27 21:31 ` Jeff Law
2016-07-28  3:50   ` Bernd Edlinger
2016-07-28 20:59   ` Bernd Edlinger
2016-07-29  7:02     ` Segher Boessenkool
2016-07-29 13:04       ` Bernd Edlinger
2016-07-30  8:18         ` Bernd Edlinger
2016-07-30 11:39           ` Segher Boessenkool
2016-07-31 10:44             ` Bernd Edlinger
2016-07-31 10:56               ` Bernd Edlinger
2016-08-01 17:54               ` Jeff Law
2016-08-01 18:53                 ` Bernd Edlinger
2016-08-01 23:31                   ` Segher Boessenkool
2016-08-02 15:21                     ` Jeff Law
2016-08-02 16:46                       ` Segher Boessenkool
2016-08-02 20:16                         ` Bernd Schmidt
2016-08-03 15:13                           ` Jeff Law
2016-08-02 17:46                       ` Richard Biener
2016-08-02 18:05                         ` Jeff Law
2016-08-03 14:25                   ` Bernd Edlinger
2016-08-03 15:38                   ` Jeff Law
2016-08-03 17:41                     ` Bernd Edlinger
2016-08-03 22:08                       ` Jeff Law
2016-08-04 16:51                         ` James Greenhalgh
2016-08-01 17:16           ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).