public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC, LRA] Incorrect subreg resolution?
@ 2014-01-10 19:21 Tejas Belagod
  2014-01-11 14:36 ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread
From: Tejas Belagod @ 2014-01-10 19:21 UTC (permalink / raw)
  To: vmakarov; +Cc: gcc

Hi,

When I relaxed CANNOT_CHANGE_MODE_CLASS to undefined for AArch64, 
gcc.c-torture/execute/copysign1.c generates incorrect code because LRA cannot 
seem to handle subregs like

  (subreg:DI (reg:TF hard_reg) 8)

on hard registers where the subreg byte offset is unaligned to a hard register 
boundary(16 for AArch64). It seems to quietly ignore the 8 and resolves this to 
incorrect an hard register during reload.

When I compile this test with -O3,

long double
cl (long double x, long double y)
{
   return __builtin_copysignl (x, y);
}

cs.c.213r.ira:

(insn 26 10 33 2 (set (reg:DI 87 [ y+8 ])
         (subreg:DI (reg:TF 33 v1 [ y ]) 8)) cs.c:4 34 {*movdi_aarch64}
      (expr_list:REG_DEAD (reg:TF 33 v1 [ y ])
         (nil)))
(insn 33 26 35 2 (set (reg:TF 93)
         (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
      (expr_list:REG_DEAD (reg:TF 32 v0 [ x ])
         (nil)))
(insn 35 33 34 2 (set (reg:DI 92 [ x+8 ])
         (subreg:DI (reg:TF 93) 8)) cs.c:4 34 {*movdi_aarch64}
      (nil))
(insn 34 35 23 2 (set (reg:DI 91 [ x ])
         (subreg:DI (reg:TF 93) 0)) cs.c:4 34 {*movdi_aarch64}
      (expr_list:REG_DEAD (reg:TF 93)
         (nil)))
....

cs.c.214r.reload

(insn 26 10 33 2 (set (reg:DI 2 x2 [orig:87 y+8 ] [87])
         (reg:DI 33 v1 [ y+8 ])) cs.c:4 34 {*movdi_aarch64}
      (nil))
(insn 33 26 35 2 (set (reg:TF 0 x0 [93])
         (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
      (nil))
(insn 35 33 34 2 (set (reg:DI 1 x1 [orig:92 x+8 ] [92])
         (reg:DI 1 x1 [+8 ])) cs.c:4 34 {*movdi_aarch64}
      (nil))
(insn 34 35 8 2 (set (reg:DI 0 x0 [orig:91 x ] [91])
         (reg:DI 0 x0 [93])) cs.c:4 34 {*movdi_aarch64}
      (nil))
.....

You can see the changes to insn 26 before and after reload - the SUBREG_BYTE 
offset of 8 seems to have been translated to v0 instead of v0.d[1] by 
get_hard_regno ().

What's interesting here is that the SUBREG_BYTE that is generated for

(subreg:DI (reg:TF 33 v1 [ y ]) 8)

isn't aligned to a hard register boundary on SIMD regs where UNITS_PER_VREG for 
AArch64 is 16. Therefore when this subreg is resolved, it resolves to v1 instead 
of v1.d[1]. Is this something going wrong in LRA or is this a more fundamental 
problem with generating subregs of hard regs with unaligned subreg byte offsets? 
The same subreg on a pseudo works OK because in insn 33, the TF mode is 
allocated integer registers and all is well.

Thanks,
Tejas Belagod
ARM.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC, LRA] Incorrect subreg resolution?
  2014-01-10 19:21 [RFC, LRA] Incorrect subreg resolution? Tejas Belagod
@ 2014-01-11 14:36 ` Richard Sandiford
  2014-01-11 15:18   ` H.J. Lu
  2014-03-11  7:57   ` Richard Sandiford
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Sandiford @ 2014-01-11 14:36 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: vmakarov, gcc

Tejas Belagod <tbelagod@arm.com> writes:
> When I relaxed CANNOT_CHANGE_MODE_CLASS to undefined for AArch64, 
> gcc.c-torture/execute/copysign1.c generates incorrect code because LRA cannot 
> seem to handle subregs like
>
>   (subreg:DI (reg:TF hard_reg) 8)
>
> on hard registers where the subreg byte offset is unaligned to a hard register 
> boundary(16 for AArch64). It seems to quietly ignore the 8 and resolves this to 
> incorrect an hard register during reload.
>
> When I compile this test with -O3,
>
> long double
> cl (long double x, long double y)
> {
>    return __builtin_copysignl (x, y);
> }
>
> cs.c.213r.ira:
>
> (insn 26 10 33 2 (set (reg:DI 87 [ y+8 ])
>          (subreg:DI (reg:TF 33 v1 [ y ]) 8)) cs.c:4 34 {*movdi_aarch64}
>       (expr_list:REG_DEAD (reg:TF 33 v1 [ y ])
>          (nil)))
> (insn 33 26 35 2 (set (reg:TF 93)
>          (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>       (expr_list:REG_DEAD (reg:TF 32 v0 [ x ])
>          (nil)))
> (insn 35 33 34 2 (set (reg:DI 92 [ x+8 ])
>          (subreg:DI (reg:TF 93) 8)) cs.c:4 34 {*movdi_aarch64}
>       (nil))
> (insn 34 35 23 2 (set (reg:DI 91 [ x ])
>          (subreg:DI (reg:TF 93) 0)) cs.c:4 34 {*movdi_aarch64}
>       (expr_list:REG_DEAD (reg:TF 93)
>          (nil)))
> ....
>
> cs.c.214r.reload
>
> (insn 26 10 33 2 (set (reg:DI 2 x2 [orig:87 y+8 ] [87])
>          (reg:DI 33 v1 [ y+8 ])) cs.c:4 34 {*movdi_aarch64}
>       (nil))
> (insn 33 26 35 2 (set (reg:TF 0 x0 [93])
>          (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>       (nil))
> (insn 35 33 34 2 (set (reg:DI 1 x1 [orig:92 x+8 ] [92])
>          (reg:DI 1 x1 [+8 ])) cs.c:4 34 {*movdi_aarch64}
>       (nil))
> (insn 34 35 8 2 (set (reg:DI 0 x0 [orig:91 x ] [91])
>          (reg:DI 0 x0 [93])) cs.c:4 34 {*movdi_aarch64}
>       (nil))
> .....
>
> You can see the changes to insn 26 before and after reload - the SUBREG_BYTE 
> offset of 8 seems to have been translated to v0 instead of v0.d[1] by 
> get_hard_regno ().
>
> What's interesting here is that the SUBREG_BYTE that is generated for
>
> (subreg:DI (reg:TF 33 v1 [ y ]) 8)
>
> isn't aligned to a hard register boundary on SIMD regs where UNITS_PER_VREG for 
> AArch64 is 16. Therefore when this subreg is resolved, it resolves to v1 instead 
> of v1.d[1]. Is this something going wrong in LRA or is this a more fundamental 
> problem with generating subregs of hard regs with unaligned subreg byte offsets? 
> The same subreg on a pseudo works OK because in insn 33, the TF mode is 
> allocated integer registers and all is well.

I think this is the same problem that was being discussed for x86
after your no-op vec-select patch:

   http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00801.html

and long following thread.

I'd still like to solve this in a target-independent way rather than add
an offset to CANNOT_CHANGE_MODE_CLASS, but I haven't had time to look at
it...

Thanks,
Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC, LRA] Incorrect subreg resolution?
  2014-01-11 14:36 ` Richard Sandiford
@ 2014-01-11 15:18   ` H.J. Lu
  2014-01-12  9:05     ` Richard Sandiford
  2014-03-11  7:57   ` Richard Sandiford
  1 sibling, 1 reply; 6+ messages in thread
From: H.J. Lu @ 2014-01-11 15:18 UTC (permalink / raw)
  To: Tejas Belagod, vmakarov, gcc, Richard Sandiford

On Sat, Jan 11, 2014 at 2:12 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
>> When I relaxed CANNOT_CHANGE_MODE_CLASS to undefined for AArch64,
>> gcc.c-torture/execute/copysign1.c generates incorrect code because LRA cannot
>> seem to handle subregs like
>>
>>   (subreg:DI (reg:TF hard_reg) 8)
>>
>> on hard registers where the subreg byte offset is unaligned to a hard register
>> boundary(16 for AArch64). It seems to quietly ignore the 8 and resolves this to
>> incorrect an hard register during reload.
>>
>> When I compile this test with -O3,
>>
>> long double
>> cl (long double x, long double y)
>> {
>>    return __builtin_copysignl (x, y);
>> }
>>
>> cs.c.213r.ira:
>>
>> (insn 26 10 33 2 (set (reg:DI 87 [ y+8 ])
>>          (subreg:DI (reg:TF 33 v1 [ y ]) 8)) cs.c:4 34 {*movdi_aarch64}
>>       (expr_list:REG_DEAD (reg:TF 33 v1 [ y ])
>>          (nil)))
>> (insn 33 26 35 2 (set (reg:TF 93)
>>          (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>       (expr_list:REG_DEAD (reg:TF 32 v0 [ x ])
>>          (nil)))
>> (insn 35 33 34 2 (set (reg:DI 92 [ x+8 ])
>>          (subreg:DI (reg:TF 93) 8)) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> (insn 34 35 23 2 (set (reg:DI 91 [ x ])
>>          (subreg:DI (reg:TF 93) 0)) cs.c:4 34 {*movdi_aarch64}
>>       (expr_list:REG_DEAD (reg:TF 93)
>>          (nil)))
>> ....
>>
>> cs.c.214r.reload
>>
>> (insn 26 10 33 2 (set (reg:DI 2 x2 [orig:87 y+8 ] [87])
>>          (reg:DI 33 v1 [ y+8 ])) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> (insn 33 26 35 2 (set (reg:TF 0 x0 [93])
>>          (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>       (nil))
>> (insn 35 33 34 2 (set (reg:DI 1 x1 [orig:92 x+8 ] [92])
>>          (reg:DI 1 x1 [+8 ])) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> (insn 34 35 8 2 (set (reg:DI 0 x0 [orig:91 x ] [91])
>>          (reg:DI 0 x0 [93])) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> .....
>>
>> You can see the changes to insn 26 before and after reload - the SUBREG_BYTE
>> offset of 8 seems to have been translated to v0 instead of v0.d[1] by
>> get_hard_regno ().
>>
>> What's interesting here is that the SUBREG_BYTE that is generated for
>>
>> (subreg:DI (reg:TF 33 v1 [ y ]) 8)
>>
>> isn't aligned to a hard register boundary on SIMD regs where UNITS_PER_VREG for
>> AArch64 is 16. Therefore when this subreg is resolved, it resolves to v1 instead
>> of v1.d[1]. Is this something going wrong in LRA or is this a more fundamental
>> problem with generating subregs of hard regs with unaligned subreg byte offsets?
>> The same subreg on a pseudo works OK because in insn 33, the TF mode is
>> allocated integer registers and all is well.
>
> I think this is the same problem that was being discussed for x86
> after your no-op vec-select patch:
>
>    http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00801.html
>
> and long following thread.
>
> I'd still like to solve this in a target-independent way rather than add
> an offset to CANNOT_CHANGE_MODE_CLASS, but I haven't had time to look at
> it...

How about this patch

http://gcc.gnu.org/git/?p=gcc.git;a=patch;h=23023006b946e06b6fd93786585f2f8cd4837956

I tested it on Linux/x86-64 without any regressions.

-- 
H.J.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC, LRA] Incorrect subreg resolution?
  2014-01-11 15:18   ` H.J. Lu
@ 2014-01-12  9:05     ` Richard Sandiford
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Sandiford @ 2014-01-12  9:05 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Tejas Belagod, vmakarov, gcc

"H.J. Lu" <hjl.tools@gmail.com> writes:
> How about this patch
>
> http://gcc.gnu.org/git/?p=gcc.git;a=patch;h=23023006b946e06b6fd93786585f2f8cd4837956
>
> I tested it on Linux/x86-64 without any regressions.

I don't think we should key this off vectorness.  It's a question of
whether the class contains a register for which the given subreg parameters
are valid.  E.g. on some targets the FP registers are wider than the general
registers.  And on some targets (e.g. NEON) you can use pairs of vector
registers, in which case there are two valid offsets for each "to" mode.

Target-independent code could figure this out from the information
it already has.  I don't think new hooks should be needed.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC, LRA] Incorrect subreg resolution?
  2014-01-11 14:36 ` Richard Sandiford
  2014-01-11 15:18   ` H.J. Lu
@ 2014-03-11  7:57   ` Richard Sandiford
  2014-04-22 13:54     ` Tejas Belagod
  1 sibling, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2014-03-11  7:57 UTC (permalink / raw)
  To: Tejas Belagod; +Cc: vmakarov, gcc, hongjiu.lu, ramana.radhakrishnan

Returning to this old thread...

Richard Sandiford <rdsandiford@googlemail.com> writes:
> Tejas Belagod <tbelagod@arm.com> writes:
>> When I relaxed CANNOT_CHANGE_MODE_CLASS to undefined for AArch64, 
>> gcc.c-torture/execute/copysign1.c generates incorrect code because LRA cannot 
>> seem to handle subregs like
>>
>>   (subreg:DI (reg:TF hard_reg) 8)
>>
>> on hard registers where the subreg byte offset is unaligned to a hard register 
>> boundary(16 for AArch64). It seems to quietly ignore the 8 and resolves this to 
>> incorrect an hard register during reload.
>>
>> When I compile this test with -O3,
>>
>> long double
>> cl (long double x, long double y)
>> {
>>    return __builtin_copysignl (x, y);
>> }
>>
>> cs.c.213r.ira:
>>
>> (insn 26 10 33 2 (set (reg:DI 87 [ y+8 ])
>>          (subreg:DI (reg:TF 33 v1 [ y ]) 8)) cs.c:4 34 {*movdi_aarch64}
>>       (expr_list:REG_DEAD (reg:TF 33 v1 [ y ])
>>          (nil)))
>> (insn 33 26 35 2 (set (reg:TF 93)
>>          (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>       (expr_list:REG_DEAD (reg:TF 32 v0 [ x ])
>>          (nil)))
>> (insn 35 33 34 2 (set (reg:DI 92 [ x+8 ])
>>          (subreg:DI (reg:TF 93) 8)) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> (insn 34 35 23 2 (set (reg:DI 91 [ x ])
>>          (subreg:DI (reg:TF 93) 0)) cs.c:4 34 {*movdi_aarch64}
>>       (expr_list:REG_DEAD (reg:TF 93)
>>          (nil)))
>> ....
>>
>> cs.c.214r.reload
>>
>> (insn 26 10 33 2 (set (reg:DI 2 x2 [orig:87 y+8 ] [87])
>>          (reg:DI 33 v1 [ y+8 ])) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> (insn 33 26 35 2 (set (reg:TF 0 x0 [93])
>>          (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>       (nil))
>> (insn 35 33 34 2 (set (reg:DI 1 x1 [orig:92 x+8 ] [92])
>>          (reg:DI 1 x1 [+8 ])) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> (insn 34 35 8 2 (set (reg:DI 0 x0 [orig:91 x ] [91])
>>          (reg:DI 0 x0 [93])) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> .....
>>
>> You can see the changes to insn 26 before and after reload - the SUBREG_BYTE 
>> offset of 8 seems to have been translated to v0 instead of v0.d[1] by 
>> get_hard_regno ().
>>
>> What's interesting here is that the SUBREG_BYTE that is generated for
>>
>> (subreg:DI (reg:TF 33 v1 [ y ]) 8)
>>
>> isn't aligned to a hard register boundary on SIMD regs where UNITS_PER_VREG for 
>> AArch64 is 16. Therefore when this subreg is resolved, it resolves to v1 instead 
>> of v1.d[1]. Is this something going wrong in LRA or is this a more fundamental 
>> problem with generating subregs of hard regs with unaligned subreg byte offsets? 
>> The same subreg on a pseudo works OK because in insn 33, the TF mode is 
>> allocated integer registers and all is well.
>
> I think this is the same problem that was being discussed for x86
> after your no-op vec-select patch:
>
>    http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00801.html
>
> and long following thread.
>
> I'd still like to solve this in a target-independent way rather than add
> an offset to CANNOT_CHANGE_MODE_CLASS, but I haven't had time to look at
> it...

FWIW, here's one possible approach.  The main part is to make the
invalid_mode_change code calculate a set of registers that are either
(a) invalid for the pseudo mode to begin with or (b) do not allow one
of the subregs to be taken (as calculated by simplify_subreg_regno,
which includes the original CANNOT_CHANGE_MODE_CLASS check).

One concern might be about compilation speed when collecting this info.
OTOH, the query is now genuinely constant time, whereas the old bitmap
test was O(num-pseudos) in the worst case.  It might also be possible
to speed things up by walking the subregs using the DF information,
if it's up-to-date at this point (haven't checked).  It would also be
possible to give an ID to each (inner mode, outer mode, byte) combination
and lazily cache the invalid register set for each one.

I went through the other uses of CANNOT_CHANGE_MODE_CLASS.  Most of them
were checking for lowpart mode changes so look safe.  The exception was
combine.c:subst.

This is really four patches squashed into one, but it's not ready to be
submitted yet.  Was just wondering whether this solved your problem.

Thanks,
Richard



*** /tmp/OCSP7f_combine.c	2014-03-11 07:34:37.928138693 +0000
--- gcc/combine.c	2014-03-10 21:39:09.428718086 +0000
*************** subst (rtx x, rtx from, rtx to, int in_d
*** 5082,5096 ****
  		      )
  		    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
  
- #ifdef CANNOT_CHANGE_MODE_CLASS
  		  if (code == SUBREG
  		      && REG_P (to)
  		      && REGNO (to) < FIRST_PSEUDO_REGISTER
! 		      && REG_CANNOT_CHANGE_MODE_P (REGNO (to),
! 						   GET_MODE (to),
! 						   GET_MODE (x)))
  		    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
- #endif
  
  		  new_rtx = (unique_copy && n_occurrences ? copy_rtx (to) : to);
  		  n_occurrences++;
--- 5082,5094 ----
  		      )
  		    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
  
  		  if (code == SUBREG
  		      && REG_P (to)
  		      && REGNO (to) < FIRST_PSEUDO_REGISTER
! 		      && simplify_subreg_regno (REGNO (to), GET_MODE (to),
! 						SUBREG_BYTE (x),
! 						GET_MODE (x)) < 0)
  		    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
  
  		  new_rtx = (unique_copy && n_occurrences ? copy_rtx (to) : to);
  		  n_occurrences++;
*** /tmp/Cs4Cnh_i386.c	2014-03-11 07:34:37.953154637 +0000
--- gcc/config/i386/i386.c	2014-03-10 21:39:11.167734347 +0000
*************** ix86_cannot_change_mode_class (enum mach
*** 37439,37451 ****
  	 the vec_dupv4hi pattern.  */
        if (GET_MODE_SIZE (from) < 4)
  	return true;
- 
-       /* Vector registers do not support subreg with nonzero offsets, which
- 	 are otherwise valid for integer registers.  Since we can't see
- 	 whether we have a nonzero offset from here, prohibit all
-          nonparadoxical subregs changing size.  */
-       if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
- 	return true;
      }
  
    return false;
--- 37439,37444 ----
*** /tmp/uUmuMg_ira-costs.c	2014-03-11 07:34:37.978170581 +0000
--- gcc/ira-costs.c	2014-03-10 21:39:10.292726165 +0000
*************** print_allocno_costs (FILE *f)
*** 1416,1425 ****
  	{
  	  rclass = cost_classes[k];
  	  if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
! #ifdef CANNOT_CHANGE_MODE_CLASS
! 	      && ! invalid_mode_change_p (regno, (enum reg_class) rclass)
! #endif
! 	      )
  	    {
  	      fprintf (f, " %s:%d", reg_class_names[rclass],
  		       COSTS (costs, i)->cost[k]);
--- 1416,1422 ----
  	{
  	  rclass = cost_classes[k];
  	  if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
! 	      && ! invalid_mode_change_p (regno, (enum reg_class) rclass))
  	    {
  	      fprintf (f, " %s:%d", reg_class_names[rclass],
  		       COSTS (costs, i)->cost[k]);
*************** print_pseudo_costs (FILE *f)
*** 1458,1467 ****
  	{
  	  rclass = cost_classes[k];
  	  if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
! #ifdef CANNOT_CHANGE_MODE_CLASS
! 	      && ! invalid_mode_change_p (regno, (enum reg_class) rclass)
! #endif
! 	      )
  	    fprintf (f, " %s:%d", reg_class_names[rclass],
  		     COSTS (costs, regno)->cost[k]);
  	}
--- 1455,1461 ----
  	{
  	  rclass = cost_classes[k];
  	  if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
! 	      && ! invalid_mode_change_p (regno, (enum reg_class) rclass))
  	    fprintf (f, " %s:%d", reg_class_names[rclass],
  		     COSTS (costs, regno)->cost[k]);
  	}
*************** find_costs_and_classes (FILE *dump_file)
*** 1703,1712 ****
  	      /* Ignore classes that are too small or invalid for this
  		 operand.  */
  	      if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
! #ifdef CANNOT_CHANGE_MODE_CLASS
! 		  || invalid_mode_change_p (i, (enum reg_class) rclass)
! #endif
! 		  )
  		continue;
  	      if (i_costs[k] < best_cost)
  		{
--- 1697,1703 ----
  	      /* Ignore classes that are too small or invalid for this
  		 operand.  */
  	      if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
! 		  || invalid_mode_change_p (i, (enum reg_class) rclass))
  		continue;
  	      if (i_costs[k] < best_cost)
  		{
*************** find_costs_and_classes (FILE *dump_file)
*** 1786,1795 ****
  		      /* Ignore classes that are too small or invalid
  			 for this operand.  */
  		      if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
! #ifdef CANNOT_CHANGE_MODE_CLASS
! 			  || invalid_mode_change_p (i, (enum reg_class) rclass)
! #endif
! 			  )
  			;
  		      else if (total_a_costs[k] < best_cost)
  			{
--- 1777,1783 ----
  		      /* Ignore classes that are too small or invalid
  			 for this operand.  */
  		      if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
! 			  || invalid_mode_change_p (i, (enum reg_class) rclass))
  			;
  		      else if (total_a_costs[k] < best_cost)
  			{
*** /tmp/MB8Pfk_reginfo.c	2014-03-11 07:34:37.995181423 +0000
--- gcc/reginfo.c	2014-03-10 21:39:10.292726165 +0000
*************** reg_classes_intersect_p (reg_class_t c1,
*** 1191,1303 ****
  }
  
  \f
  
! /* Passes for keeping and updating info about modes of registers
!    inside subregisters.  */
  
! #ifdef CANNOT_CHANGE_MODE_CLASS
  
! static bitmap invalid_mode_changes;
  
! static void
! record_subregs_of_mode (rtx subreg, bitmap subregs_of_mode)
  {
!   enum machine_mode mode;
!   unsigned int regno;
  
    if (!REG_P (SUBREG_REG (subreg)))
!     return;
! 
!   regno = REGNO (SUBREG_REG (subreg));
!   mode = GET_MODE (subreg);
  
    if (regno < FIRST_PSEUDO_REGISTER)
!     return;
  
!   if (bitmap_set_bit (subregs_of_mode,
! 		      regno * NUM_MACHINE_MODES + (unsigned int) mode))
      {
!       unsigned int rclass;
!       for (rclass = 0; rclass < N_REG_CLASSES; rclass++)
! 	if (!bitmap_bit_p (invalid_mode_changes,
! 			   regno * N_REG_CLASSES + rclass)
! 	    && CANNOT_CHANGE_MODE_CLASS (PSEUDO_REGNO_MODE (regno),
! 					 mode, (enum reg_class) rclass))
! 	  bitmap_set_bit (invalid_mode_changes,
! 			  regno * N_REG_CLASSES + rclass);
      }
- }
  
! /* Call record_subregs_of_mode for all the subregs in X.  */
! static void
! find_subregs_of_mode (rtx x, bitmap subregs_of_mode)
! {
!   enum rtx_code code = GET_CODE (x);
!   const char * const fmt = GET_RTX_FORMAT (code);
!   int i;
  
!   if (code == SUBREG)
!     record_subregs_of_mode (x, subregs_of_mode);
  
!   /* Time for some deep diving.  */
!   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
!     {
!       if (fmt[i] == 'e')
! 	find_subregs_of_mode (XEXP (x, i), subregs_of_mode);
!       else if (fmt[i] == 'E')
! 	{
! 	  int j;
! 	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
! 	    find_subregs_of_mode (XVECEXP (x, i, j), subregs_of_mode);
! 	}
!     }
  }
  
  void
  init_subregs_of_mode (void)
  {
    basic_block bb;
    rtx insn;
-   bitmap_obstack srom_obstack;
-   bitmap subregs_of_mode;
  
!   gcc_assert (invalid_mode_changes == NULL);
!   invalid_mode_changes = BITMAP_ALLOC (NULL);
!   bitmap_obstack_initialize (&srom_obstack);
!   subregs_of_mode = BITMAP_ALLOC (&srom_obstack);
  
    FOR_EACH_BB_FN (bb, cfun)
      FOR_BB_INSNS (bb, insn)
        if (NONDEBUG_INSN_P (insn))
!         find_subregs_of_mode (PATTERN (insn), subregs_of_mode);
! 
!   BITMAP_FREE (subregs_of_mode);
!   bitmap_obstack_release (&srom_obstack);
  }
  
! /* Return 1 if REGNO has had an invalid mode change in CLASS from FROM
!    mode.  */
  bool
  invalid_mode_change_p (unsigned int regno,
  		       enum reg_class rclass)
  {
!   return bitmap_bit_p (invalid_mode_changes,
! 		       regno * N_REG_CLASSES + (unsigned) rclass);
  }
  
  void
  finish_subregs_of_mode (void)
  {
!   BITMAP_FREE (invalid_mode_changes);
! }
! #else
! void
! init_subregs_of_mode (void)
! {
! }
! void
! finish_subregs_of_mode (void)
! {
  }
- 
- #endif /* CANNOT_CHANGE_MODE_CLASS */
--- 1191,1337 ----
  }
  
  \f
+ /* Information about one subreg of a pseudo register.  */
+ struct pseudo_subreg {
+   /* The next subreg seen for this pseudo register.  */
+   struct pseudo_subreg *next;
  
!   /* The mode of the subreg.  */
!   enum machine_mode mode;
  
!   /* The byte offset of the subreg from the start of the pseudo register.  */
!   unsigned int offset;
! };
  
! /* Information about all subregs of a pseudo register.  */
! struct pseudo_subreg_list {
!   /* The pseudo register that this structure describes.  */
!   unsigned int regno;
  
!   /* A list of the individual subregs seen.  */
!   struct pseudo_subreg *subregs;
! 
!   /* The set of all hard registers that are invalid because of subregs
!      listed in SUBREGS.  */
!   HARD_REG_SET invalid_regs;
! };
! 
! /* Maps pseudo register numbers to the associated entry of pseudo_subreg_lists.
!    Entries are initially undefined rather than zeroed; entry X is valid if:
! 
!       pseudo_subreg_index[X] < pseudo_subreg_lists.length ()
!       && pseudo_subreg_lists[pseudo_subreg_index[X]]->regno == X.  */
! static unsigned int *pseudo_subreg_index;
! 
! /* A list of all pseudo registers that have been seen in a subreg.  */
! static vec<struct pseudo_subreg_list *> pseudo_subreg_lists;
! 
! /* An obstack used to allocate the above.  */
! static struct obstack pseudo_subreg_obstack;
! 
! /* A for_each_rtx callback.  Record all subregs of pseudo registers.  */
! 
! static int
! record_subregs_of_mode (rtx *loc, void *)
  {
!   rtx subreg = *loc;
!   if (GET_CODE (subreg) != SUBREG)
!     return 0;
  
    if (!REG_P (SUBREG_REG (subreg)))
!     return -1;
  
+   unsigned int regno = REGNO (SUBREG_REG (subreg));
    if (regno < FIRST_PSEUDO_REGISTER)
!     return -1;
  
!   /* If we haven't seem this pseudo before, create an entry for it.  */
!   unsigned int index = pseudo_subreg_index[regno];
!   if (index >= pseudo_subreg_lists.length ()
!       || pseudo_subreg_lists[index]->regno != regno)
      {
!       struct pseudo_subreg_list *list = XOBNEW (&pseudo_subreg_obstack,
! 						struct pseudo_subreg_list);
!       list->subregs = 0;
!       CLEAR_HARD_REG_SET (list->invalid_regs);
!       list->regno = regno;
! 
!       index = pseudo_subreg_lists.length ();
!       pseudo_subreg_lists.safe_push (list);
!       pseudo_subreg_index[regno] = index;
      }
  
!   /* Quick return if we've already seen a subreg with the same mode
!      and offset.  In general the number of distinct subregs of a particular
!      pseudo register should be small.  */
!   struct pseudo_subreg_list *list = pseudo_subreg_lists[index];
!   enum machine_mode outer_mode = GET_MODE (subreg);
!   unsigned int offset = SUBREG_BYTE (subreg);
!   for (struct pseudo_subreg *ps = list->subregs; ps; ps = ps->next)
!     if (ps->mode == outer_mode && ps->offset == offset)
!       return -1;
  
!   /* Record this mode and offset.  */
!   struct pseudo_subreg *ps = XOBNEW (&pseudo_subreg_obstack,
! 				     struct pseudo_subreg);
!   ps->mode = outer_mode;
!   ps->offset = offset;
!   ps->next = list->subregs;
!   list->subregs = ps;
  
!   /* Record a hard register as invalid if it wouldn't be possible for
!      it to form the subreg.  */
!   enum machine_mode inner_mode = GET_MODE (SUBREG_REG (subreg));
!   for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
!     if (!TEST_HARD_REG_BIT (list->invalid_regs, i)
! 	&& (!HARD_REGNO_MODE_OK (i, inner_mode)
! 	    || simplify_subreg_regno (i, inner_mode, offset, outer_mode) < 0))
!       SET_HARD_REG_BIT (list->invalid_regs, i);
! 
!   return -1;
  }
  
+ /* Scan the function and record any information needed by
+    invalid_mode_change_p.  */
+ 
  void
  init_subregs_of_mode (void)
  {
    basic_block bb;
    rtx insn;
  
!   gcc_assert (!pseudo_subreg_index);
!   unsigned int num_regs = max_reg_num ();
!   pseudo_subreg_index = XNEWVEC (unsigned int, num_regs);
! 
!   gcc_obstack_init (&pseudo_subreg_obstack);
  
    FOR_EACH_BB_FN (bb, cfun)
      FOR_BB_INSNS (bb, insn)
        if (NONDEBUG_INSN_P (insn))
! 	for_each_rtx (&PATTERN (insn), record_subregs_of_mode, 0);
  }
  
! /* Return true if REGNO is involved in a subreg that prevents it from using
!    class RCLASS.  */
! 
  bool
  invalid_mode_change_p (unsigned int regno,
  		       enum reg_class rclass)
  {
!   unsigned int index = pseudo_subreg_index[regno];
!   return (index < pseudo_subreg_lists.length ()
! 	  && pseudo_subreg_lists[index]->regno == regno
! 	  && hard_reg_set_subset_p (reg_class_contents[rclass],
! 				    pseudo_subreg_lists[index]->invalid_regs));
  }
  
+ /* Release the structures created by init_subregs_of_mode.  */
+ 
  void
  finish_subregs_of_mode (void)
  {
!   free (pseudo_subreg_index);
!   pseudo_subreg_index = 0;
!   pseudo_subreg_lists.release ();
  }
*** /tmp/3iptsc_rtl.h	2014-03-11 07:34:38.011189919 +0000
--- gcc/rtl.h	2014-03-09 10:32:38.412143704 +0000
*************** struct subreg_info
*** 2128,2137 ****
  {
    /* Offset of first hard register involved in the subreg.  */
    int offset;
!   /* Number of hard registers involved in the subreg.  */
    int nregs;
    /* Whether this subreg can be represented as a hard reg with the new
!      mode.  */
    bool representable_p;
  };
  
--- 2128,2140 ----
  {
    /* Offset of first hard register involved in the subreg.  */
    int offset;
!   /* Number of hard registers involved in the subreg.  In the case of
!      a paradoxical subreg, this is the number of registers that would
!      be modified by writing to the subreg; some of them may be don't-care
!      when reading from the subreg.  */
    int nregs;
    /* Whether this subreg can be represented as a hard reg with the new
!      mode (by adding OFFSET to the original hard register).  */
    bool representable_p;
  };
  
*** /tmp/jHXMdb_rtlanal.c	2014-03-11 07:34:38.026197136 +0000
--- gcc/rtlanal.c	2014-03-09 10:33:39.645693024 +0000
*************** subreg_lsb (const_rtx x)
*** 3321,3327 ****
     xmode  - The mode of xregno.
     offset - The byte offset.
     ymode  - The mode of a top level SUBREG (or what may become one).
!    info   - Pointer to structure to fill in.  */
  void
  subreg_get_info (unsigned int xregno, enum machine_mode xmode,
  		 unsigned int offset, enum machine_mode ymode,
--- 3321,3340 ----
     xmode  - The mode of xregno.
     offset - The byte offset.
     ymode  - The mode of a top level SUBREG (or what may become one).
!    info   - Pointer to structure to fill in.
! 
!    Rather than considering one particular inner register (and thus one
!    particular "outer" register) in isolation, this function really uses
!    XREGNO as a model for a sequence of isomorphic hard registers.  Thus the
!    function does not check whether adding INFO->offset to XREGNO gives
!    a valid hard register; even if INFO->offset + XREGNO is out of range,
!    there might be another register of the same type that is in range.
!    Likewise it doesn't check whether HARD_REGNO_MODE_OK accepts the new
!    register, since that can depend on things like whether the final
!    register number is even or odd.  Callers that want to check whether
!    this particular subreg can be replaced by a simple (reg ...) should
!    use simplify_subreg_regno.  */
! 
  void
  subreg_get_info (unsigned int xregno, enum machine_mode xmode,
  		 unsigned int offset, enum machine_mode ymode,

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC, LRA] Incorrect subreg resolution?
  2014-03-11  7:57   ` Richard Sandiford
@ 2014-04-22 13:54     ` Tejas Belagod
  0 siblings, 0 replies; 6+ messages in thread
From: Tejas Belagod @ 2014-04-22 13:54 UTC (permalink / raw)
  To: vmakarov, gcc, hongjiu.lu, ramana.radhakrishnan, rdsandiford

Richard Sandiford wrote:
> Returning to this old thread...
> 
> Richard Sandiford <rdsandiford@googlemail.com> writes:
>> Tejas Belagod <tbelagod@arm.com> writes:
>>> When I relaxed CANNOT_CHANGE_MODE_CLASS to undefined for AArch64,
>>> gcc.c-torture/execute/copysign1.c generates incorrect code because LRA cannot
>>> seem to handle subregs like
>>>
>>>   (subreg:DI (reg:TF hard_reg) 8)
>>>
>>> on hard registers where the subreg byte offset is unaligned to a hard register
>>> boundary(16 for AArch64). It seems to quietly ignore the 8 and resolves this to
>>> incorrect an hard register during reload.
>>>
>>> When I compile this test with -O3,
>>>
>>> long double
>>> cl (long double x, long double y)
>>> {
>>>    return __builtin_copysignl (x, y);
>>> }
>>>
>>> cs.c.213r.ira:
>>>
>>> (insn 26 10 33 2 (set (reg:DI 87 [ y+8 ])
>>>          (subreg:DI (reg:TF 33 v1 [ y ]) 8)) cs.c:4 34 {*movdi_aarch64}
>>>       (expr_list:REG_DEAD (reg:TF 33 v1 [ y ])
>>>          (nil)))
>>> (insn 33 26 35 2 (set (reg:TF 93)
>>>          (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>>       (expr_list:REG_DEAD (reg:TF 32 v0 [ x ])
>>>          (nil)))
>>> (insn 35 33 34 2 (set (reg:DI 92 [ x+8 ])
>>>          (subreg:DI (reg:TF 93) 8)) cs.c:4 34 {*movdi_aarch64}
>>>       (nil))
>>> (insn 34 35 23 2 (set (reg:DI 91 [ x ])
>>>          (subreg:DI (reg:TF 93) 0)) cs.c:4 34 {*movdi_aarch64}
>>>       (expr_list:REG_DEAD (reg:TF 93)
>>>          (nil)))
>>> ....
>>>
>>> cs.c.214r.reload
>>>
>>> (insn 26 10 33 2 (set (reg:DI 2 x2 [orig:87 y+8 ] [87])
>>>          (reg:DI 33 v1 [ y+8 ])) cs.c:4 34 {*movdi_aarch64}
>>>       (nil))
>>> (insn 33 26 35 2 (set (reg:TF 0 x0 [93])
>>>          (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>>       (nil))
>>> (insn 35 33 34 2 (set (reg:DI 1 x1 [orig:92 x+8 ] [92])
>>>          (reg:DI 1 x1 [+8 ])) cs.c:4 34 {*movdi_aarch64}
>>>       (nil))
>>> (insn 34 35 8 2 (set (reg:DI 0 x0 [orig:91 x ] [91])
>>>          (reg:DI 0 x0 [93])) cs.c:4 34 {*movdi_aarch64}
>>>       (nil))
>>> .....
>>>
>>> You can see the changes to insn 26 before and after reload - the SUBREG_BYTE
>>> offset of 8 seems to have been translated to v0 instead of v0.d[1] by
>>> get_hard_regno ().
>>>
>>> What's interesting here is that the SUBREG_BYTE that is generated for
>>>
>>> (subreg:DI (reg:TF 33 v1 [ y ]) 8)
>>>
>>> isn't aligned to a hard register boundary on SIMD regs where UNITS_PER_VREG for
>>> AArch64 is 16. Therefore when this subreg is resolved, it resolves to v1 instead
>>> of v1.d[1]. Is this something going wrong in LRA or is this a more fundamental
>>> problem with generating subregs of hard regs with unaligned subreg byte offsets?
>>> The same subreg on a pseudo works OK because in insn 33, the TF mode is
>>> allocated integer registers and all is well.
>> I think this is the same problem that was being discussed for x86
>> after your no-op vec-select patch:
>>
>>    http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00801.html
>>
>> and long following thread.
>>
>> I'd still like to solve this in a target-independent way rather than add
>> an offset to CANNOT_CHANGE_MODE_CLASS, but I haven't had time to look at
>> it...
> 
> FWIW, here's one possible approach.  The main part is to make the
> invalid_mode_change code calculate a set of registers that are either
> (a) invalid for the pseudo mode to begin with or (b) do not allow one
> of the subregs to be taken (as calculated by simplify_subreg_regno,
> which includes the original CANNOT_CHANGE_MODE_CLASS check).
> 
> One concern might be about compilation speed when collecting this info.
> OTOH, the query is now genuinely constant time, whereas the old bitmap
> test was O(num-pseudos) in the worst case.  It might also be possible
> to speed things up by walking the subregs using the DF information,
> if it's up-to-date at this point (haven't checked).  It would also be
> possible to give an ID to each (inner mode, outer mode, byte) combination
> and lazily cache the invalid register set for each one.
> 
> I went through the other uses of CANNOT_CHANGE_MODE_CLASS.  Most of them
> were checking for lowpart mode changes so look safe.  The exception was
> combine.c:subst.
> 
> This is really four patches squashed into one, but it's not ready to be
> submitted yet.  Was just wondering whether this solved your problem.
> 

Hi Richard,

Sorry for the delay in replying to this.

Thanks for this patch - it bootstraps and regresses fine for aarch64. It also 
regresses OK on ARM.
Your patch also fixes issues I was seeing when I undefined C_C_M_C for aarch64 
which is what I was mostly troubled by (copysign1 regression et. al.)

Many Thanks,
Tejas.

> Thanks,
> Richard
> 
> 
> 
> *** /tmp/OCSP7f_combine.c       2014-03-11 07:34:37.928138693 +0000
> --- gcc/combine.c       2014-03-10 21:39:09.428718086 +0000
> *************** subst (rtx x, rtx from, rtx to, int in_d
> *** 5082,5096 ****
>                       )
>                     return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
> 
> - #ifdef CANNOT_CHANGE_MODE_CLASS
>                   if (code == SUBREG
>                       && REG_P (to)
>                       && REGNO (to) < FIRST_PSEUDO_REGISTER
> !                     && REG_CANNOT_CHANGE_MODE_P (REGNO (to),
> !                                                  GET_MODE (to),
> !                                                  GET_MODE (x)))
>                     return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
> - #endif
> 
>                   new_rtx = (unique_copy && n_occurrences ? copy_rtx (to) : to);
>                   n_occurrences++;
> --- 5082,5094 ----
>                       )
>                     return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
> 
>                   if (code == SUBREG
>                       && REG_P (to)
>                       && REGNO (to) < FIRST_PSEUDO_REGISTER
> !                     && simplify_subreg_regno (REGNO (to), GET_MODE (to),
> !                                               SUBREG_BYTE (x),
> !                                               GET_MODE (x)) < 0)
>                     return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
> 
>                   new_rtx = (unique_copy && n_occurrences ? copy_rtx (to) : to);
>                   n_occurrences++;
> *** /tmp/Cs4Cnh_i386.c  2014-03-11 07:34:37.953154637 +0000
> --- gcc/config/i386/i386.c      2014-03-10 21:39:11.167734347 +0000
> *************** ix86_cannot_change_mode_class (enum mach
> *** 37439,37451 ****
>          the vec_dupv4hi pattern.  */
>         if (GET_MODE_SIZE (from) < 4)
>         return true;
> -
> -       /* Vector registers do not support subreg with nonzero offsets, which
> -        are otherwise valid for integer registers.  Since we can't see
> -        whether we have a nonzero offset from here, prohibit all
> -          nonparadoxical subregs changing size.  */
> -       if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
> -       return true;
>       }
> 
>     return false;
> --- 37439,37444 ----
> *** /tmp/uUmuMg_ira-costs.c     2014-03-11 07:34:37.978170581 +0000
> --- gcc/ira-costs.c     2014-03-10 21:39:10.292726165 +0000
> *************** print_allocno_costs (FILE *f)
> *** 1416,1425 ****
>         {
>           rclass = cost_classes[k];
>           if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
> ! #ifdef CANNOT_CHANGE_MODE_CLASS
> !             && ! invalid_mode_change_p (regno, (enum reg_class) rclass)
> ! #endif
> !             )
>             {
>               fprintf (f, " %s:%d", reg_class_names[rclass],
>                        COSTS (costs, i)->cost[k]);
> --- 1416,1422 ----
>         {
>           rclass = cost_classes[k];
>           if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
> !             && ! invalid_mode_change_p (regno, (enum reg_class) rclass))
>             {
>               fprintf (f, " %s:%d", reg_class_names[rclass],
>                        COSTS (costs, i)->cost[k]);
> *************** print_pseudo_costs (FILE *f)
> *** 1458,1467 ****
>         {
>           rclass = cost_classes[k];
>           if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
> ! #ifdef CANNOT_CHANGE_MODE_CLASS
> !             && ! invalid_mode_change_p (regno, (enum reg_class) rclass)
> ! #endif
> !             )
>             fprintf (f, " %s:%d", reg_class_names[rclass],
>                      COSTS (costs, regno)->cost[k]);
>         }
> --- 1455,1461 ----
>         {
>           rclass = cost_classes[k];
>           if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
> !             && ! invalid_mode_change_p (regno, (enum reg_class) rclass))
>             fprintf (f, " %s:%d", reg_class_names[rclass],
>                      COSTS (costs, regno)->cost[k]);
>         }
> *************** find_costs_and_classes (FILE *dump_file)
> *** 1703,1712 ****
>               /* Ignore classes that are too small or invalid for this
>                  operand.  */
>               if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
> ! #ifdef CANNOT_CHANGE_MODE_CLASS
> !                 || invalid_mode_change_p (i, (enum reg_class) rclass)
> ! #endif
> !                 )
>                 continue;
>               if (i_costs[k] < best_cost)
>                 {
> --- 1697,1703 ----
>               /* Ignore classes that are too small or invalid for this
>                  operand.  */
>               if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
> !                 || invalid_mode_change_p (i, (enum reg_class) rclass))
>                 continue;
>               if (i_costs[k] < best_cost)
>                 {
> *************** find_costs_and_classes (FILE *dump_file)
> *** 1786,1795 ****
>                       /* Ignore classes that are too small or invalid
>                          for this operand.  */
>                       if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
> ! #ifdef CANNOT_CHANGE_MODE_CLASS
> !                         || invalid_mode_change_p (i, (enum reg_class) rclass)
> ! #endif
> !                         )
>                         ;
>                       else if (total_a_costs[k] < best_cost)
>                         {
> --- 1777,1783 ----
>                       /* Ignore classes that are too small or invalid
>                          for this operand.  */
>                       if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
> !                         || invalid_mode_change_p (i, (enum reg_class) rclass))
>                         ;
>                       else if (total_a_costs[k] < best_cost)
>                         {
> *** /tmp/MB8Pfk_reginfo.c       2014-03-11 07:34:37.995181423 +0000
> --- gcc/reginfo.c       2014-03-10 21:39:10.292726165 +0000
> *************** reg_classes_intersect_p (reg_class_t c1,
> *** 1191,1303 ****
>   }
> 
> 
> 
> ! /* Passes for keeping and updating info about modes of registers
> !    inside subregisters.  */
> 
> ! #ifdef CANNOT_CHANGE_MODE_CLASS
> 
> ! static bitmap invalid_mode_changes;
> 
> ! static void
> ! record_subregs_of_mode (rtx subreg, bitmap subregs_of_mode)
>   {
> !   enum machine_mode mode;
> !   unsigned int regno;
> 
>     if (!REG_P (SUBREG_REG (subreg)))
> !     return;
> !
> !   regno = REGNO (SUBREG_REG (subreg));
> !   mode = GET_MODE (subreg);
> 
>     if (regno < FIRST_PSEUDO_REGISTER)
> !     return;
> 
> !   if (bitmap_set_bit (subregs_of_mode,
> !                     regno * NUM_MACHINE_MODES + (unsigned int) mode))
>       {
> !       unsigned int rclass;
> !       for (rclass = 0; rclass < N_REG_CLASSES; rclass++)
> !       if (!bitmap_bit_p (invalid_mode_changes,
> !                          regno * N_REG_CLASSES + rclass)
> !           && CANNOT_CHANGE_MODE_CLASS (PSEUDO_REGNO_MODE (regno),
> !                                        mode, (enum reg_class) rclass))
> !         bitmap_set_bit (invalid_mode_changes,
> !                         regno * N_REG_CLASSES + rclass);
>       }
> - }
> 
> ! /* Call record_subregs_of_mode for all the subregs in X.  */
> ! static void
> ! find_subregs_of_mode (rtx x, bitmap subregs_of_mode)
> ! {
> !   enum rtx_code code = GET_CODE (x);
> !   const char * const fmt = GET_RTX_FORMAT (code);
> !   int i;
> 
> !   if (code == SUBREG)
> !     record_subregs_of_mode (x, subregs_of_mode);
> 
> !   /* Time for some deep diving.  */
> !   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
> !     {
> !       if (fmt[i] == 'e')
> !       find_subregs_of_mode (XEXP (x, i), subregs_of_mode);
> !       else if (fmt[i] == 'E')
> !       {
> !         int j;
> !         for (j = XVECLEN (x, i) - 1; j >= 0; j--)
> !           find_subregs_of_mode (XVECEXP (x, i, j), subregs_of_mode);
> !       }
> !     }
>   }
> 
>   void
>   init_subregs_of_mode (void)
>   {
>     basic_block bb;
>     rtx insn;
> -   bitmap_obstack srom_obstack;
> -   bitmap subregs_of_mode;
> 
> !   gcc_assert (invalid_mode_changes == NULL);
> !   invalid_mode_changes = BITMAP_ALLOC (NULL);
> !   bitmap_obstack_initialize (&srom_obstack);
> !   subregs_of_mode = BITMAP_ALLOC (&srom_obstack);
> 
>     FOR_EACH_BB_FN (bb, cfun)
>       FOR_BB_INSNS (bb, insn)
>         if (NONDEBUG_INSN_P (insn))
> !         find_subregs_of_mode (PATTERN (insn), subregs_of_mode);
> !
> !   BITMAP_FREE (subregs_of_mode);
> !   bitmap_obstack_release (&srom_obstack);
>   }
> 
> ! /* Return 1 if REGNO has had an invalid mode change in CLASS from FROM
> !    mode.  */
>   bool
>   invalid_mode_change_p (unsigned int regno,
>                        enum reg_class rclass)
>   {
> !   return bitmap_bit_p (invalid_mode_changes,
> !                      regno * N_REG_CLASSES + (unsigned) rclass);
>   }
> 
>   void
>   finish_subregs_of_mode (void)
>   {
> !   BITMAP_FREE (invalid_mode_changes);
> ! }
> ! #else
> ! void
> ! init_subregs_of_mode (void)
> ! {
> ! }
> ! void
> ! finish_subregs_of_mode (void)
> ! {
>   }
> -
> - #endif /* CANNOT_CHANGE_MODE_CLASS */
> --- 1191,1337 ----
>   }
> 
> 
> + /* Information about one subreg of a pseudo register.  */
> + struct pseudo_subreg {
> +   /* The next subreg seen for this pseudo register.  */
> +   struct pseudo_subreg *next;
> 
> !   /* The mode of the subreg.  */
> !   enum machine_mode mode;
> 
> !   /* The byte offset of the subreg from the start of the pseudo register.  */
> !   unsigned int offset;
> ! };
> 
> ! /* Information about all subregs of a pseudo register.  */
> ! struct pseudo_subreg_list {
> !   /* The pseudo register that this structure describes.  */
> !   unsigned int regno;
> 
> !   /* A list of the individual subregs seen.  */
> !   struct pseudo_subreg *subregs;
> !
> !   /* The set of all hard registers that are invalid because of subregs
> !      listed in SUBREGS.  */
> !   HARD_REG_SET invalid_regs;
> ! };
> !
> ! /* Maps pseudo register numbers to the associated entry of pseudo_subreg_lists.
> !    Entries are initially undefined rather than zeroed; entry X is valid if:
> !
> !       pseudo_subreg_index[X] < pseudo_subreg_lists.length ()
> !       && pseudo_subreg_lists[pseudo_subreg_index[X]]->regno == X.  */
> ! static unsigned int *pseudo_subreg_index;
> !
> ! /* A list of all pseudo registers that have been seen in a subreg.  */
> ! static vec<struct pseudo_subreg_list *> pseudo_subreg_lists;
> !
> ! /* An obstack used to allocate the above.  */
> ! static struct obstack pseudo_subreg_obstack;
> !
> ! /* A for_each_rtx callback.  Record all subregs of pseudo registers.  */
> !
> ! static int
> ! record_subregs_of_mode (rtx *loc, void *)
>   {
> !   rtx subreg = *loc;
> !   if (GET_CODE (subreg) != SUBREG)
> !     return 0;
> 
>     if (!REG_P (SUBREG_REG (subreg)))
> !     return -1;
> 
> +   unsigned int regno = REGNO (SUBREG_REG (subreg));
>     if (regno < FIRST_PSEUDO_REGISTER)
> !     return -1;
> 
> !   /* If we haven't seem this pseudo before, create an entry for it.  */
> !   unsigned int index = pseudo_subreg_index[regno];
> !   if (index >= pseudo_subreg_lists.length ()
> !       || pseudo_subreg_lists[index]->regno != regno)
>       {
> !       struct pseudo_subreg_list *list = XOBNEW (&pseudo_subreg_obstack,
> !                                               struct pseudo_subreg_list);
> !       list->subregs = 0;
> !       CLEAR_HARD_REG_SET (list->invalid_regs);
> !       list->regno = regno;
> !
> !       index = pseudo_subreg_lists.length ();
> !       pseudo_subreg_lists.safe_push (list);
> !       pseudo_subreg_index[regno] = index;
>       }
> 
> !   /* Quick return if we've already seen a subreg with the same mode
> !      and offset.  In general the number of distinct subregs of a particular
> !      pseudo register should be small.  */
> !   struct pseudo_subreg_list *list = pseudo_subreg_lists[index];
> !   enum machine_mode outer_mode = GET_MODE (subreg);
> !   unsigned int offset = SUBREG_BYTE (subreg);
> !   for (struct pseudo_subreg *ps = list->subregs; ps; ps = ps->next)
> !     if (ps->mode == outer_mode && ps->offset == offset)
> !       return -1;
> 
> !   /* Record this mode and offset.  */
> !   struct pseudo_subreg *ps = XOBNEW (&pseudo_subreg_obstack,
> !                                    struct pseudo_subreg);
> !   ps->mode = outer_mode;
> !   ps->offset = offset;
> !   ps->next = list->subregs;
> !   list->subregs = ps;
> 
> !   /* Record a hard register as invalid if it wouldn't be possible for
> !      it to form the subreg.  */
> !   enum machine_mode inner_mode = GET_MODE (SUBREG_REG (subreg));
> !   for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
> !     if (!TEST_HARD_REG_BIT (list->invalid_regs, i)
> !       && (!HARD_REGNO_MODE_OK (i, inner_mode)
> !           || simplify_subreg_regno (i, inner_mode, offset, outer_mode) < 0))
> !       SET_HARD_REG_BIT (list->invalid_regs, i);
> !
> !   return -1;
>   }
> 
> + /* Scan the function and record any information needed by
> +    invalid_mode_change_p.  */
> +
>   void
>   init_subregs_of_mode (void)
>   {
>     basic_block bb;
>     rtx insn;
> 
> !   gcc_assert (!pseudo_subreg_index);
> !   unsigned int num_regs = max_reg_num ();
> !   pseudo_subreg_index = XNEWVEC (unsigned int, num_regs);
> !
> !   gcc_obstack_init (&pseudo_subreg_obstack);
> 
>     FOR_EACH_BB_FN (bb, cfun)
>       FOR_BB_INSNS (bb, insn)
>         if (NONDEBUG_INSN_P (insn))
> !       for_each_rtx (&PATTERN (insn), record_subregs_of_mode, 0);
>   }
> 
> ! /* Return true if REGNO is involved in a subreg that prevents it from using
> !    class RCLASS.  */
> !
>   bool
>   invalid_mode_change_p (unsigned int regno,
>                        enum reg_class rclass)
>   {
> !   unsigned int index = pseudo_subreg_index[regno];
> !   return (index < pseudo_subreg_lists.length ()
> !         && pseudo_subreg_lists[index]->regno == regno
> !         && hard_reg_set_subset_p (reg_class_contents[rclass],
> !                                   pseudo_subreg_lists[index]->invalid_regs));
>   }
> 
> + /* Release the structures created by init_subregs_of_mode.  */
> +
>   void
>   finish_subregs_of_mode (void)
>   {
> !   free (pseudo_subreg_index);
> !   pseudo_subreg_index = 0;
> !   pseudo_subreg_lists.release ();
>   }
> *** /tmp/3iptsc_rtl.h   2014-03-11 07:34:38.011189919 +0000
> --- gcc/rtl.h   2014-03-09 10:32:38.412143704 +0000
> *************** struct subreg_info
> *** 2128,2137 ****
>   {
>     /* Offset of first hard register involved in the subreg.  */
>     int offset;
> !   /* Number of hard registers involved in the subreg.  */
>     int nregs;
>     /* Whether this subreg can be represented as a hard reg with the new
> !      mode.  */
>     bool representable_p;
>   };
> 
> --- 2128,2140 ----
>   {
>     /* Offset of first hard register involved in the subreg.  */
>     int offset;
> !   /* Number of hard registers involved in the subreg.  In the case of
> !      a paradoxical subreg, this is the number of registers that would
> !      be modified by writing to the subreg; some of them may be don't-care
> !      when reading from the subreg.  */
>     int nregs;
>     /* Whether this subreg can be represented as a hard reg with the new
> !      mode (by adding OFFSET to the original hard register).  */
>     bool representable_p;
>   };
> 
> *** /tmp/jHXMdb_rtlanal.c       2014-03-11 07:34:38.026197136 +0000
> --- gcc/rtlanal.c       2014-03-09 10:33:39.645693024 +0000
> *************** subreg_lsb (const_rtx x)
> *** 3321,3327 ****
>      xmode  - The mode of xregno.
>      offset - The byte offset.
>      ymode  - The mode of a top level SUBREG (or what may become one).
> !    info   - Pointer to structure to fill in.  */
>   void
>   subreg_get_info (unsigned int xregno, enum machine_mode xmode,
>                  unsigned int offset, enum machine_mode ymode,
> --- 3321,3340 ----
>      xmode  - The mode of xregno.
>      offset - The byte offset.
>      ymode  - The mode of a top level SUBREG (or what may become one).
> !    info   - Pointer to structure to fill in.
> !
> !    Rather than considering one particular inner register (and thus one
> !    particular "outer" register) in isolation, this function really uses
> !    XREGNO as a model for a sequence of isomorphic hard registers.  Thus the
> !    function does not check whether adding INFO->offset to XREGNO gives
> !    a valid hard register; even if INFO->offset + XREGNO is out of range,
> !    there might be another register of the same type that is in range.
> !    Likewise it doesn't check whether HARD_REGNO_MODE_OK accepts the new
> !    register, since that can depend on things like whether the final
> !    register number is even or odd.  Callers that want to check whether
> !    this particular subreg can be replaced by a simple (reg ...) should
> !    use simplify_subreg_regno.  */
> !
>   void
>   subreg_get_info (unsigned int xregno, enum machine_mode xmode,
>                  unsigned int offset, enum machine_mode ymode,
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-04-22 13:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-10 19:21 [RFC, LRA] Incorrect subreg resolution? Tejas Belagod
2014-01-11 14:36 ` Richard Sandiford
2014-01-11 15:18   ` H.J. Lu
2014-01-12  9:05     ` Richard Sandiford
2014-03-11  7:57   ` Richard Sandiford
2014-04-22 13:54     ` Tejas Belagod

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).