public inbox for gcc-patches@gcc.gnu.org
* Minor regression due to recent IRA changes
@ 2020-02-28 20:24 Jeff Law
  2020-02-29 15:47 ` Oleg Endo
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2020-02-28 20:24 UTC (permalink / raw)
  To: vmakarov; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3296 bytes --]

This change:

> commit 3133bed5d0327e8a9cd0a601b7ecdb9de4fc825d
> Author: Vladimir N. Makarov <vmakarov@redhat.com>
> Date:   Sun Feb 23 16:20:05 2020 -0500
> 
>     Changing cost propagation and ordering colorable bucket heuristics for
> PR93564.
>     
>     2020-02-23  Vladimir Makarov  <vmakarov@redhat.com>
>     
>             PR rtl-optimization/93564
>             * ira-color.c (struct update_cost_queue_elem): New member start.
>             (queue_update_cost, get_next_update_cost): Add new arg start.
>             (allocnos_conflict_p): New function.
>             (update_costs_from_allocno): Add new arg conflict_cost_update_p.
>             Add checking conflicts with allocnos_conflict_p.
>             (update_costs_from_prefs, restore_costs_from_copies): Adjust
>             update_costs_from_allocno calls.
>             (update_conflict_hard_regno_costs): Add checking conflicts with
>             allocnos_conflict_p.  Adjust calls of queue_update_cost and
>             get_next_update_cost.
>             (assign_hard_reg): Adjust calls of queue_update_cost.  Add
>             debugging print.
>             (bucket_allocno_compare_func): Restore previous version.
> 

Is causing c-torture/compile/sync-1 to fail with an ICE on sh4eb (search for
"Tests that now fail, but worked before"):


http://3.14.90.209:8080/job/sh4eb-linux-gnu/lastFailedBuild/console


In the .log we have:

> /home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-1.c:253:1: error:
> unable to find a register to spill in class 'R0_REGS'
> /home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-1.c:253:1: error: this
> is the insn:
> (insn 209 207 212 2 (parallel [
>             (set (subreg:SI (reg:HI 431) 0)
>                 (unspec_volatile:SI [
>                         (mem/v:HI (reg/f:SI 299) [-1  S2 A16])
>                         (subreg:HI (reg:SI 6 r6 [orig:425 uc+-3 ] [425]) 2)
>                         (reg:HI 5 r5 [orig:428 sc+-1 ] [428])
>                     ] UNSPECV_CMPXCHG_1))
>             (set (mem/v:HI (reg/f:SI 299) [-1  S2 A16])
>                 (unspec_volatile:HI [
>                         (const_int 0 [0])
>                     ] UNSPECV_CMPXCHG_2))
>             (set (reg:SI 147 t)
>                 (unspec_volatile:SI [
>                         (const_int 0 [0])
>                     ] UNSPECV_CMPXCHG_3))
>             (clobber (scratch:SI))
>             (clobber (reg:SI 0 r0))
>             (clobber (reg:SI 1 r1))
>         ]) "/home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-1.c":245:8
> 406 {atomic_compare_and_swaphi_soft_gusa}
>      (expr_list:REG_DEAD (reg:HI 5 r5 [orig:428 sc+-1 ] [428])
>         (expr_list:REG_DEAD (reg:SI 6 r6 [orig:425 uc+-3 ] [425])
>             (expr_list:REG_DEAD (reg/f:SI 299)
>                 (expr_list:REG_UNUSED (reg:HI 431)
>                     (expr_list:REG_UNUSED (reg:SI 1 r1)
>                         (expr_list:REG_UNUSED (reg:SI 0 r0)
>                             (nil))))))))
> 

You should be able to trigger it with a cross compiler at -O2 with the attached
testcase.

This could well be a target issue.  I haven't tried to debug it.  If it's a
target issue, I'm fully comfortable punting it to the SH folks for resolving.

[-- Attachment #2: j.c --]
[-- Type: text/x-csrc, Size: 9794 bytes --]

/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "nand_and_fetch" { target *-*-* } 0 } */
/* { dg-options "-ffat-lto-objects" } */

/* Validate that each of the __sync builtins compiles.  This won't 
   necessarily link, since the target might not support the builtin,
   so this may result in external library calls.  */

signed char sc;
unsigned char uc;
signed short ss;
unsigned short us;
signed int si;
unsigned int ui;
signed long sl;
unsigned long ul;
signed long long sll;
unsigned long long ull;
void *vp;
int *ip;
struct S { struct S *next; int x; } *sp;

void test_op_ignore (void)
{
  (void) __sync_fetch_and_add (&sc, 1);
  (void) __sync_fetch_and_add (&uc, 1);
  (void) __sync_fetch_and_add (&ss, 1);
  (void) __sync_fetch_and_add (&us, 1);
  (void) __sync_fetch_and_add (&si, 1);
  (void) __sync_fetch_and_add (&ui, 1);
  (void) __sync_fetch_and_add (&sl, 1);
  (void) __sync_fetch_and_add (&ul, 1);
  (void) __sync_fetch_and_add (&sll, 1);
  (void) __sync_fetch_and_add (&ull, 1);

  (void) __sync_fetch_and_sub (&sc, 1);
  (void) __sync_fetch_and_sub (&uc, 1);
  (void) __sync_fetch_and_sub (&ss, 1);
  (void) __sync_fetch_and_sub (&us, 1);
  (void) __sync_fetch_and_sub (&si, 1);
  (void) __sync_fetch_and_sub (&ui, 1);
  (void) __sync_fetch_and_sub (&sl, 1);
  (void) __sync_fetch_and_sub (&ul, 1);
  (void) __sync_fetch_and_sub (&sll, 1);
  (void) __sync_fetch_and_sub (&ull, 1);

  (void) __sync_fetch_and_or (&sc, 1);
  (void) __sync_fetch_and_or (&uc, 1);
  (void) __sync_fetch_and_or (&ss, 1);
  (void) __sync_fetch_and_or (&us, 1);
  (void) __sync_fetch_and_or (&si, 1);
  (void) __sync_fetch_and_or (&ui, 1);
  (void) __sync_fetch_and_or (&sl, 1);
  (void) __sync_fetch_and_or (&ul, 1);
  (void) __sync_fetch_and_or (&sll, 1);
  (void) __sync_fetch_and_or (&ull, 1);

  (void) __sync_fetch_and_xor (&sc, 1);
  (void) __sync_fetch_and_xor (&uc, 1);
  (void) __sync_fetch_and_xor (&ss, 1);
  (void) __sync_fetch_and_xor (&us, 1);
  (void) __sync_fetch_and_xor (&si, 1);
  (void) __sync_fetch_and_xor (&ui, 1);
  (void) __sync_fetch_and_xor (&sl, 1);
  (void) __sync_fetch_and_xor (&ul, 1);
  (void) __sync_fetch_and_xor (&sll, 1);
  (void) __sync_fetch_and_xor (&ull, 1);

  (void) __sync_fetch_and_and (&sc, 1);
  (void) __sync_fetch_and_and (&uc, 1);
  (void) __sync_fetch_and_and (&ss, 1);
  (void) __sync_fetch_and_and (&us, 1);
  (void) __sync_fetch_and_and (&si, 1);
  (void) __sync_fetch_and_and (&ui, 1);
  (void) __sync_fetch_and_and (&sl, 1);
  (void) __sync_fetch_and_and (&ul, 1);
  (void) __sync_fetch_and_and (&sll, 1);
  (void) __sync_fetch_and_and (&ull, 1);

  (void) __sync_fetch_and_nand (&sc, 1);
  (void) __sync_fetch_and_nand (&uc, 1);
  (void) __sync_fetch_and_nand (&ss, 1);
  (void) __sync_fetch_and_nand (&us, 1);
  (void) __sync_fetch_and_nand (&si, 1);
  (void) __sync_fetch_and_nand (&ui, 1);
  (void) __sync_fetch_and_nand (&sl, 1);
  (void) __sync_fetch_and_nand (&ul, 1);
  (void) __sync_fetch_and_nand (&sll, 1);
  (void) __sync_fetch_and_nand (&ull, 1);
}

void test_fetch_and_op (void)
{
  sc = __sync_fetch_and_add (&sc, 11);
  uc = __sync_fetch_and_add (&uc, 11);
  ss = __sync_fetch_and_add (&ss, 11);
  us = __sync_fetch_and_add (&us, 11);
  si = __sync_fetch_and_add (&si, 11);
  ui = __sync_fetch_and_add (&ui, 11);
  sl = __sync_fetch_and_add (&sl, 11);
  ul = __sync_fetch_and_add (&ul, 11);
  sll = __sync_fetch_and_add (&sll, 11);
  ull = __sync_fetch_and_add (&ull, 11);

  sc = __sync_fetch_and_sub (&sc, 11);
  uc = __sync_fetch_and_sub (&uc, 11);
  ss = __sync_fetch_and_sub (&ss, 11);
  us = __sync_fetch_and_sub (&us, 11);
  si = __sync_fetch_and_sub (&si, 11);
  ui = __sync_fetch_and_sub (&ui, 11);
  sl = __sync_fetch_and_sub (&sl, 11);
  ul = __sync_fetch_and_sub (&ul, 11);
  sll = __sync_fetch_and_sub (&sll, 11);
  ull = __sync_fetch_and_sub (&ull, 11);

  sc = __sync_fetch_and_or (&sc, 11);
  uc = __sync_fetch_and_or (&uc, 11);
  ss = __sync_fetch_and_or (&ss, 11);
  us = __sync_fetch_and_or (&us, 11);
  si = __sync_fetch_and_or (&si, 11);
  ui = __sync_fetch_and_or (&ui, 11);
  sl = __sync_fetch_and_or (&sl, 11);
  ul = __sync_fetch_and_or (&ul, 11);
  sll = __sync_fetch_and_or (&sll, 11);
  ull = __sync_fetch_and_or (&ull, 11);

  sc = __sync_fetch_and_xor (&sc, 11);
  uc = __sync_fetch_and_xor (&uc, 11);
  ss = __sync_fetch_and_xor (&ss, 11);
  us = __sync_fetch_and_xor (&us, 11);
  si = __sync_fetch_and_xor (&si, 11);
  ui = __sync_fetch_and_xor (&ui, 11);
  sl = __sync_fetch_and_xor (&sl, 11);
  ul = __sync_fetch_and_xor (&ul, 11);
  sll = __sync_fetch_and_xor (&sll, 11);
  ull = __sync_fetch_and_xor (&ull, 11);

  sc = __sync_fetch_and_and (&sc, 11);
  uc = __sync_fetch_and_and (&uc, 11);
  ss = __sync_fetch_and_and (&ss, 11);
  us = __sync_fetch_and_and (&us, 11);
  si = __sync_fetch_and_and (&si, 11);
  ui = __sync_fetch_and_and (&ui, 11);
  sl = __sync_fetch_and_and (&sl, 11);
  ul = __sync_fetch_and_and (&ul, 11);
  sll = __sync_fetch_and_and (&sll, 11);
  ull = __sync_fetch_and_and (&ull, 11);

  sc = __sync_fetch_and_nand (&sc, 11);
  uc = __sync_fetch_and_nand (&uc, 11);
  ss = __sync_fetch_and_nand (&ss, 11);
  us = __sync_fetch_and_nand (&us, 11);
  si = __sync_fetch_and_nand (&si, 11);
  ui = __sync_fetch_and_nand (&ui, 11);
  sl = __sync_fetch_and_nand (&sl, 11);
  ul = __sync_fetch_and_nand (&ul, 11);
  sll = __sync_fetch_and_nand (&sll, 11);
  ull = __sync_fetch_and_nand (&ull, 11);
}

void test_op_and_fetch (void)
{
  sc = __sync_add_and_fetch (&sc, uc);
  uc = __sync_add_and_fetch (&uc, uc);
  ss = __sync_add_and_fetch (&ss, uc);
  us = __sync_add_and_fetch (&us, uc);
  si = __sync_add_and_fetch (&si, uc);
  ui = __sync_add_and_fetch (&ui, uc);
  sl = __sync_add_and_fetch (&sl, uc);
  ul = __sync_add_and_fetch (&ul, uc);
  sll = __sync_add_and_fetch (&sll, uc);
  ull = __sync_add_and_fetch (&ull, uc);

  sc = __sync_sub_and_fetch (&sc, uc);
  uc = __sync_sub_and_fetch (&uc, uc);
  ss = __sync_sub_and_fetch (&ss, uc);
  us = __sync_sub_and_fetch (&us, uc);
  si = __sync_sub_and_fetch (&si, uc);
  ui = __sync_sub_and_fetch (&ui, uc);
  sl = __sync_sub_and_fetch (&sl, uc);
  ul = __sync_sub_and_fetch (&ul, uc);
  sll = __sync_sub_and_fetch (&sll, uc);
  ull = __sync_sub_and_fetch (&ull, uc);

  sc = __sync_or_and_fetch (&sc, uc);
  uc = __sync_or_and_fetch (&uc, uc);
  ss = __sync_or_and_fetch (&ss, uc);
  us = __sync_or_and_fetch (&us, uc);
  si = __sync_or_and_fetch (&si, uc);
  ui = __sync_or_and_fetch (&ui, uc);
  sl = __sync_or_and_fetch (&sl, uc);
  ul = __sync_or_and_fetch (&ul, uc);
  sll = __sync_or_and_fetch (&sll, uc);
  ull = __sync_or_and_fetch (&ull, uc);

  sc = __sync_xor_and_fetch (&sc, uc);
  uc = __sync_xor_and_fetch (&uc, uc);
  ss = __sync_xor_and_fetch (&ss, uc);
  us = __sync_xor_and_fetch (&us, uc);
  si = __sync_xor_and_fetch (&si, uc);
  ui = __sync_xor_and_fetch (&ui, uc);
  sl = __sync_xor_and_fetch (&sl, uc);
  ul = __sync_xor_and_fetch (&ul, uc);
  sll = __sync_xor_and_fetch (&sll, uc);
  ull = __sync_xor_and_fetch (&ull, uc);

  sc = __sync_and_and_fetch (&sc, uc);
  uc = __sync_and_and_fetch (&uc, uc);
  ss = __sync_and_and_fetch (&ss, uc);
  us = __sync_and_and_fetch (&us, uc);
  si = __sync_and_and_fetch (&si, uc);
  ui = __sync_and_and_fetch (&ui, uc);
  sl = __sync_and_and_fetch (&sl, uc);
  ul = __sync_and_and_fetch (&ul, uc);
  sll = __sync_and_and_fetch (&sll, uc);
  ull = __sync_and_and_fetch (&ull, uc);

  sc = __sync_nand_and_fetch (&sc, uc);
  uc = __sync_nand_and_fetch (&uc, uc);
  ss = __sync_nand_and_fetch (&ss, uc);
  us = __sync_nand_and_fetch (&us, uc);
  si = __sync_nand_and_fetch (&si, uc);
  ui = __sync_nand_and_fetch (&ui, uc);
  sl = __sync_nand_and_fetch (&sl, uc);
  ul = __sync_nand_and_fetch (&ul, uc);
  sll = __sync_nand_and_fetch (&sll, uc);
  ull = __sync_nand_and_fetch (&ull, uc);
}

void test_compare_and_swap (void)
{
  sc = __sync_val_compare_and_swap (&sc, uc, sc);
  uc = __sync_val_compare_and_swap (&uc, uc, sc);
  ss = __sync_val_compare_and_swap (&ss, uc, sc);
  us = __sync_val_compare_and_swap (&us, uc, sc);
  si = __sync_val_compare_and_swap (&si, uc, sc);
  ui = __sync_val_compare_and_swap (&ui, uc, sc);
  sl = __sync_val_compare_and_swap (&sl, uc, sc);
  ul = __sync_val_compare_and_swap (&ul, uc, sc);
  sll = __sync_val_compare_and_swap (&sll, uc, sc);
  ull = __sync_val_compare_and_swap (&ull, uc, sc);

  ui = __sync_bool_compare_and_swap (&sc, uc, sc);
  ui = __sync_bool_compare_and_swap (&uc, uc, sc);
  ui = __sync_bool_compare_and_swap (&ss, uc, sc);
  ui = __sync_bool_compare_and_swap (&us, uc, sc);
  ui = __sync_bool_compare_and_swap (&si, uc, sc);
  ui = __sync_bool_compare_and_swap (&ui, uc, sc);
  ui = __sync_bool_compare_and_swap (&sl, uc, sc);
  ui = __sync_bool_compare_and_swap (&ul, uc, sc);
  ui = __sync_bool_compare_and_swap (&sll, uc, sc);
  ui = __sync_bool_compare_and_swap (&ull, uc, sc);
}

void test_lock (void)
{
  sc = __sync_lock_test_and_set (&sc, 1);
  uc = __sync_lock_test_and_set (&uc, 1);
  ss = __sync_lock_test_and_set (&ss, 1);
  us = __sync_lock_test_and_set (&us, 1);
  si = __sync_lock_test_and_set (&si, 1);
  ui = __sync_lock_test_and_set (&ui, 1);
  sl = __sync_lock_test_and_set (&sl, 1);
  ul = __sync_lock_test_and_set (&ul, 1);
  sll = __sync_lock_test_and_set (&sll, 1);
  ull = __sync_lock_test_and_set (&ull, 1);

  __sync_synchronize ();

  __sync_lock_release (&sc);
  __sync_lock_release (&uc);
  __sync_lock_release (&ss);
  __sync_lock_release (&us);
  __sync_lock_release (&si);
  __sync_lock_release (&ui);
  __sync_lock_release (&sl);
  __sync_lock_release (&ul);
  __sync_lock_release (&sll);
  __sync_lock_release (&ull);
}


* Re: Minor regression due to recent IRA changes
  2020-02-28 20:24 Minor regression due to recent IRA changes Jeff Law
@ 2020-02-29 15:47 ` Oleg Endo
  2020-02-29 15:55   ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Endo @ 2020-02-29 15:47 UTC (permalink / raw)
  To: law, vmakarov; +Cc: gcc-patches

On Fri, 2020-02-28 at 13:24 -0700, Jeff Law wrote:
> This change:
> 
> > commit 3133bed5d0327e8a9cd0a601b7ecdb9de4fc825d
> > Author: Vladimir N. Makarov <vmakarov@redhat.com>
> > Date:   Sun Feb 23 16:20:05 2020 -0500
> > 
> >     Changing cost propagation and ordering colorable bucket
> > heuristics for
> > PR93564.
> >     
> >     2020-02-23  Vladimir Makarov  <vmakarov@redhat.com>
> >     
> >             PR rtl-optimization/93564
> >             * ira-color.c (struct update_cost_queue_elem): New
> > member start.
> >             (queue_update_cost, get_next_update_cost): Add new arg
> > start.
> >             (allocnos_conflict_p): New function.
> >             (update_costs_from_allocno): Add new arg
> > conflict_cost_update_p.
> >             Add checking conflicts with allocnos_conflict_p.
> >             (update_costs_from_prefs, restore_costs_from_copies):
> > Adjust
> >             update_costs_from_allocno calls.
> >             (update_conflict_hard_regno_costs): Add checking
> > conflicts with
> >             allocnos_conflict_p.  Adjust calls of queue_update_cost
> > and
> >             get_next_update_cost.
> >             (assign_hard_reg): Adjust calls of
> > queue_update_cost.  Add
> >             debugging print.
> >             (bucket_allocno_compare_func): Restore previous
> > version.
> > 
> 
> Is causing c-torture/compile/sync-1 to fail with an ICE on sh4eb
> (search for
> "Tests that now fail, but worked before":
> 
> 
> http://3.14.90.209:8080/job/sh4eb-linux-gnu/lastFailedBuild/console
> 
> 
> In the .log we have:
> 
> > /home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-1.c:253:1:
> > error:
> > unable to find a register to spill in class 'R0_REGS'^M
> > /home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-1.c:253:1:
> > error: this
> > is the insn:^M
> > (insn 209 207 212 2 (parallel [^M
> >             (set (subreg:SI (reg:HI 431) 0)^M
> >                 (unspec_volatile:SI [^M
> >                         (mem/v:HI (reg/f:SI 299) [-1  S2 A16])^M
> >                         (subreg:HI (reg:SI 6 r6 [orig:425 uc+-3 ]
> > [425]) 2)^M
> >                         (reg:HI 5 r5 [orig:428 sc+-1 ] [428])^M
> >                     ] UNSPECV_CMPXCHG_1))^M
> >             (set (mem/v:HI (reg/f:SI 299) [-1  S2 A16])^M
> >                 (unspec_volatile:HI [^M
> >                         (const_int 0 [0])^M
> >                     ] UNSPECV_CMPXCHG_2))^M
> >             (set (reg:SI 147 t)^M
> >                 (unspec_volatile:SI [^M
> >                         (const_int 0 [0])^M
> >                     ] UNSPECV_CMPXCHG_3))^M
> >             (clobber (scratch:SI))^M
> >             (clobber (reg:SI 0 r0))^M
> >             (clobber (reg:SI 1 r1))^M
> >         ]) "/home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-
> > 1.c":245:8 
> > 406 {atomic_compare_and_swaphi_soft_gusa}^M
> >      (expr_list:REG_DEAD (reg:HI 5 r5 [orig:428 sc+-1 ] [428])^M
> >         (expr_list:REG_DEAD (reg:SI 6 r6 [orig:425 uc+-3 ] [425])^M
> >             (expr_list:REG_DEAD (reg/f:SI 299)^M
> >                 (expr_list:REG_UNUSED (reg:HI 431)^M
> >                     (expr_list:REG_UNUSED (reg:SI 1 r1)^M
> >                         (expr_list:REG_UNUSED (reg:SI 0 r0)^M
> >                             (nil))))))))^M
> > 
> 
> You should be able to trigger it with a cross compiler at -O2 with
> the attached
> testcase.
> 
> This could well be a target issue.  I haven't tried to debug it.  If
> it's a
> target issue, I'm fully comfortable punting it to the SH folks for
> resolving.

The R0_REGS spill failure is a general problem, in particular with old
reload.  The atomic patterns tend to trigger it in one circumstance or
another.  The IRA change probably just stresses it more.  Perhaps it
will go away with -mlra.

However, LRA on SH still has its own issues, so it can't be generally
enabled by default yet, unfortunately.  See also some of the recent
posts in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93877

Cheers,
Oleg


* Re: Minor regression due to recent IRA changes
  2020-02-29 15:47 ` Oleg Endo
@ 2020-02-29 15:55   ` Jeff Law
  2020-02-29 15:57     ` Oleg Endo
  2020-03-02 16:16     ` Vladimir Makarov
  0 siblings, 2 replies; 16+ messages in thread
From: Jeff Law @ 2020-02-29 15:55 UTC (permalink / raw)
  To: Oleg Endo, vmakarov; +Cc: gcc-patches

On Sun, 2020-03-01 at 00:43 +0900, Oleg Endo wrote:
> 
> > This could well be a target issue.  I haven't tried to debug it.  If
> > it's a
> > target issue, I'm fully comfortable punting it to the SH folks for
> > resolving.
> 
> The R0_REGS spill failure is a general problem, in particular with old
> reload.  The atomic patterns tend to trigger it in one circumstance or
> the other.  The IRA change probably just stresses it more.  Perhaps it
> will go away with -mlra.
> 
> However, LRA on SH still has its own issues, so it can't be generally
> enabled by default yet, unfortunately.  See also some of the recent
> posts in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93877
It's almost certainly the case that the recent IRA changes are going to stress
R0 more.  If I'm reading what Vlad did correctly, one of the tie-breakers it's
using now is to choose the lowest numbered register when all else is equal.  So
R0 on SH is likely going to be more problematic.

I wonder if just reordering the regs on the SH (and adjusting the debug output
to keep that working) would be enough to mitigate some of the R0 problems.
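
Purely as an illustration of the mechanism (this is not SH's actual
definition, and the register numbers below are made up): the knob involved
is the REG_ALLOC_ORDER target macro, which lists hard register numbers in
the order the allocator should prefer them, so pushing r0 towards the back
of the general registers would look roughly like

/* Illustrative sketch only: try r1..r7 before r0 so that r0 stays free
   for the insns that really need it.  */
#define REG_ALLOC_ORDER				\
  { 1, 2, 3, 4, 5, 6, 7, 0,			\
    8, 9, 10, 11, 12, 13, 14, 15 }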

And yes, I saw 93877 fly by too :(

Jeff



* Re: Minor regression due to recent IRA changes
  2020-02-29 15:55   ` Jeff Law
@ 2020-02-29 15:57     ` Oleg Endo
  2020-02-29 16:01       ` Jeff Law
  2020-03-02 16:16     ` Vladimir Makarov
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Endo @ 2020-02-29 15:57 UTC (permalink / raw)
  To: law, vmakarov; +Cc: gcc-patches

On Sat, 2020-02-29 at 08:47 -0700, Jeff Law wrote:
> 
> It's almost certainly the case that the recent IRA changes are going to stress
> R0 more.  If I'm reading what Vlad did correctly, one of the tie-breakers its
> using now is to choose the lowest numbered register when all else is equal.  So
> R0 on SH is likely going to be more problematical.
> 
> I wonder if just reordering the regs on the SH (and adjusting the debug output
> to keep that working) would be enough to mitigate some of the R0 problems.

It could open a can of worms.  Off the top of my head, R0 is used to
hold the function return value, and R0:R1 to return structs with
sizeof > 4 bytes.  So if DImode is allocated to R0, it implicitly uses
R0:R1, AFAIR, doesn't it?  Would that kind of thing cause troubles?

Cheers,
Oleg


* Re: Minor regression due to recent IRA changes
  2020-02-29 15:57     ` Oleg Endo
@ 2020-02-29 16:01       ` Jeff Law
  2020-02-29 16:30         ` Oleg Endo
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2020-02-29 16:01 UTC (permalink / raw)
  To: Oleg Endo, vmakarov; +Cc: gcc-patches

On Sun, 2020-03-01 at 00:55 +0900, Oleg Endo wrote:
> On Sat, 2020-02-29 at 08:47 -0700, Jeff Law wrote:
> > It's almost certainly the case that the recent IRA changes are going to
> > stress
> > R0 more.  If I'm reading what Vlad did correctly, one of the tie-breakers
> > its
> > using now is to choose the lowest numbered register when all else is
> > equal.  So
> > R0 on SH is likely going to be more problematical.
> > 
> > I wonder if just reordering the regs on the SH (and adjusting the debug
> > output
> > to keep that working) would be enough to mitigate some of the R0 problems.
> 
> It could open a can of worms.  Off the top of my head, R0 is used to
> hold the function return value, and R0:R1 to return structs with sizeof
> > 4 bytes.  So if DImode is allocated to R0, it implicitly uses R0:R1,
> AFAIR, doesn't it?  Would that kind of thing cause troubles?
It might.  We might have to move a pair or even a quad if you have modes that
cover r0-r3. It may not be feasible in practice.  I was just thinking off the
top of my head.

jeff


* Re: Minor regression due to recent IRA changes
  2020-02-29 16:01       ` Jeff Law
@ 2020-02-29 16:30         ` Oleg Endo
  2020-02-29 16:47           ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Endo @ 2020-02-29 16:30 UTC (permalink / raw)
  To: law, vmakarov; +Cc: gcc-patches

On Sat, 2020-02-29 at 08:57 -0700, Jeff Law wrote:
> 
> > It could open a can of worms.  Off the top of my head, R0 is used to
> > hold the function return value, and R0:R1 to return structs with sizeof
> > > 4 bytes.  So if DImode is allocated to R0, it implicitly uses R0:R1,
> > 
> > AFAIR, doesn't it?  Would that kind of thing cause troubles?
> 
> It might.  We might have to move a pair or even a quad if you have modes that
> cover r0-r3. It may not be feasible in practice.  I was just thinking off the
> top of my head.
> 

Yeah, for instance 'double _Complex' will be returned in R0-R3 when
compiling for 'without FPU'.  How about adding a target hook or look-up 
table (default 1:1 mapping for other targets)?  Would that be an
option?

Cheers,
Oleg


* Re: Minor regression due to recent IRA changes
  2020-02-29 16:30         ` Oleg Endo
@ 2020-02-29 16:47           ` Jeff Law
  2020-02-29 17:04             ` Oleg Endo
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2020-02-29 16:47 UTC (permalink / raw)
  To: Oleg Endo, vmakarov; +Cc: gcc-patches

On Sun, 2020-03-01 at 01:06 +0900, Oleg Endo wrote:
> On Sat, 2020-02-29 at 08:57 -0700, Jeff Law wrote:
> > > It could open a can of worms.  Off the top of my head, R0 is used to
> > > hold the function return value, and R0:R1 to return structs with sizeof
> > > > 4 bytes.  So if DImode is allocated to R0, it implicitly uses R0:R1,
> > > 
> > > AFAIR, doesn't it?  Would that kind of thing cause troubles?
> > 
> > It might.  We might have to move a pair or even a quad if you have modes
> > that
> > cover r0-r3. It may not be feasible in practice.  I was just thinking off
> > the
> > top of my head.
> > 
> 
> Yeah, for instance 'double _Complex' will be returned in R0-R3 when
> compiling for 'without FPU'.  How about adding a target hook or look-up 
> table (default 1:1 mapping for other targets)?  Would that be an
> option?
I think it's pretty deeply baked in that we can iterate from the first register
in a group to the last.  Given we'd have to move quads, I suspect this isn't
feasible in practice.

It really would have just been a workaround for some of the R0 issues anyway.
I think at its core R0 on the SH probably needs to be treated more like a
temporary than a general register.  But that's probably a huge change, both in
terms of just getting it working right and in terms of addressing the code
quality regressions that it would introduce.

jeff


* Re: Minor regression due to recent IRA changes
  2020-02-29 16:47           ` Jeff Law
@ 2020-02-29 17:04             ` Oleg Endo
  2020-02-29 20:05               ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Endo @ 2020-02-29 17:04 UTC (permalink / raw)
  To: law, vmakarov; +Cc: gcc-patches

On Sat, 2020-02-29 at 09:38 -0700, Jeff Law wrote:
> 
> It really would have just been a workaround for some of the R0 issues anyway. 
> I think at its core R0 on the SH probably needs to be treated more like a
> temporary rather than a general register.  But that's probably a huge change,
> both in terms of just getting it working right and in terms of addressing the
> code quality regressions that would introduce.
> 

I think one of the major issues is that R0 is a constraint in several
addressing modes for memory accesses.  I believe I once had the idea of
hiding R0 from RA ... then insert reg-reg copies (to load R0) after
RA/reload ... and then somehow do back propagation to get rid of the
reg-reg copies again.  Another idea was to run a pre-RA pass to pre-
allocate all R0 things.  But I think it's all just running in sqrt(1)
circles after all.

Cheers,
Oleg


* Re: Minor regression due to recent IRA changes
  2020-02-29 17:04             ` Oleg Endo
@ 2020-02-29 20:05               ` Jeff Law
  2020-03-01  1:37                 ` Oleg Endo
  2020-03-02  9:34                 ` Richard Biener
  0 siblings, 2 replies; 16+ messages in thread
From: Jeff Law @ 2020-02-29 20:05 UTC (permalink / raw)
  To: Oleg Endo, vmakarov; +Cc: gcc-patches

On Sun, 2020-03-01 at 01:47 +0900, Oleg Endo wrote:
> On Sat, 2020-02-29 at 09:38 -0700, Jeff Law wrote:
> > It really would have just been a workaround for some of the R0 issues
> > anyway. 
> > I think at its core R0 on the SH probably needs to be treated more like a
> > temporary rather than a general register.  But that's probably a huge
> > change,
> > both in terms of just getting it working right and in terms of addressing
> > the
> > code quality regressions that would introduce.
> > 
> 
> I think one of the major issues is that R0 is a constraint in several
> addressing modes for memory accesses.  I believe I once had the idea of
> hiding R0 from RA ... then insert reg-reg copies (to load R0) after
> RA/reload ... and then somehow do back propagation to get rid of the
> reg-reg copies again.  Another idea was to run a pre-RA pass to pre-
> allocate all R0 things.  But I think it's all just running in sqrt(1)
> circles after all.
Yup.  That was roughly what I was thinking and roughly the worry I had with
trying to squash out the quality regressions.  But it may ultimately be the
only way to really resolve these issues.

DJ's work on the m32c IIRC might be useful if you do try to chase this stuff
down.  Essentially there weren't really enough registers.  So he had the port
pretend to have more than it really did, then had a post-reload pass to do the
final allocation into the target's actual register file.

jeff


* Re: Minor regression due to recent IRA changes
  2020-02-29 20:05               ` Jeff Law
@ 2020-03-01  1:37                 ` Oleg Endo
  2020-03-05 15:51                   ` Jeff Law
  2020-03-02  9:34                 ` Richard Biener
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Endo @ 2020-03-01  1:37 UTC (permalink / raw)
  To: law, vmakarov; +Cc: gcc-patches

On Sat, 2020-02-29 at 12:35 -0700, Jeff Law wrote:
> 
> Yup.  That was roughly what I was thinking and roughly the worry I had with
> trying to squash out the quality regressions.  But it may ultimately be the
> only way to really resolve these issues.

Another idea would be to let RA see R0, but ignore all the R0
constraints.  Then try fixing up everything afterwards.  If R0 is
removed from the allocatable reg list, there will be one register less
for it to work with and I'd expect some code quality regressions.  But
in order to fix up all the R0 cases after the regular RA/reload, I
believe it will have to re-do a lot of (similar) work that has been
done by the regular RA already.  Things that come instantly to mind
are loops and the use of R0 as an index/base register in memory
addressing ... it just sounds like a lot of duplicate work in general.

> 
> DJ's work on the m32c IIRC might be useful if you do try to chase this stuff
> down.  Essentially there weren't really enough registers.  So he had the port
> pretend to have more than it really did, then had a post-reload pass to do the
> final allocation into the target's actual register file.
> 

AFAIK DJ did the same (or similar) thing for RL78.  IMHO that just
shows that one type of RA/reload does not fit all.  Perhaps it'd be
better to have the option of different RA/reload implementations, which
implement different strategies for different needs and priorities.

Anyway, on SH the R0 problem seems to go away with LRA for the most
part.  I don't know whether anything has been put into LRA specifically
to address such cases, whether it simply falls out of the design, or
whether it's just a mere coincidence.  If it's the latter, I'm not sure
what to expect in the future.  Perhaps it will start breaking again
when changes for other targets are made to LRA.

Cheers,
Oleg


* Re: Minor regression due to recent IRA changes
  2020-02-29 20:05               ` Jeff Law
  2020-03-01  1:37                 ` Oleg Endo
@ 2020-03-02  9:34                 ` Richard Biener
  2020-03-02 16:21                   ` Vladimir Makarov
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Biener @ 2020-03-02  9:34 UTC (permalink / raw)
  To: Jeff Law; +Cc: Oleg Endo, Vladimir N. Makarov, GCC Patches

On Sat, Feb 29, 2020 at 8:35 PM Jeff Law <law@redhat.com> wrote:
>
> On Sun, 2020-03-01 at 01:47 +0900, Oleg Endo wrote:
> > On Sat, 2020-02-29 at 09:38 -0700, Jeff Law wrote:
> > > It really would have just been a workaround for some of the R0 issues
> > > anyway.
> > > I think at its core R0 on the SH probably needs to be treated more like a
> > > temporary rather than a general register.  But that's probably a huge
> > > change,
> > > both in terms of just getting it working right and in terms of addressing
> > > the
> > > code quality regressions that would introduce.
> > >
> >
> > I think one of the major issues is that R0 is a constraint in several
> > addressing modes for memory accesses.  I believe I once had the idea of
> > hiding R0 from RA ... then insert reg-reg copies (to load R0) after
> > RA/reload ... and then somehow do back propagation to get rid of the
> > reg-reg copies again.  Another idea was to run a pre-RA pass to pre-
> > allocate all R0 things.  But I think it's all just running in sqrt(1)
> > circles after all.
> Yup.  That was roughly what I was thinking and roughly the worry I had with
> trying to squash out the quality regressions.  But it may ultimately be the
> only way to really resolve these issues.

One could also simply pessimize R0 for RA via either an existing mechanism
or a new target hook ...
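
Just to make the "existing mechanism" idea concrete (a rough, untested
sketch; the register_priority hook does exist and is used by LRA, but the
names below are purely illustrative and sh may or may not already define
it): the port could give R0 a lower priority than the other registers, e.g.

/* Sketch: the hook returns a priority number; the bigger the number,
   the more preferable the register, so r0 becomes the last choice
   when everything else is equal.  */
static int
sh_register_priority (int hard_regno)
{
  return hard_regno == 0 ? 0 : 1;
}

#undef TARGET_REGISTER_PRIORITY
#define TARGET_REGISTER_PRIORITY sh_register_priority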

> DJ's work on the m32c IIRC might be useful if you do try to chase this stuff
> down.  Essentially there weren't really enough registers.  So he had the port
> pretend to have more than it really did, then had a post-reload pass to do the
> final allocation into the target's actual register file.
>
> jeff
>


* Re: Minor regression due to recent IRA changes
  2020-02-29 15:55   ` Jeff Law
  2020-02-29 15:57     ` Oleg Endo
@ 2020-03-02 16:16     ` Vladimir Makarov
  1 sibling, 0 replies; 16+ messages in thread
From: Vladimir Makarov @ 2020-03-02 16:16 UTC (permalink / raw)
  To: law, Oleg Endo; +Cc: gcc-patches


On 2020-02-29 10:47 a.m., Jeff Law wrote:
> On Sun, 2020-03-01 at 00:43 +0900, Oleg Endo wrote:
>>> This could well be a target issue.  I haven't tried to debug it.  If
>>> it's a
>>> target issue, I'm fully comfortable punting it to the SH folks for
>>> resolving.
>> The R0_REGS spill failure is a general problem, in particular with old
>> reload.  The atomic patterns tend to trigger it in one circumstance or
>> the other.  The IRA change probably just stresses it more.  Perhaps it
>> will go away with -mlra.
>>
>> However, LRA on SH still has its own issues, so it can't be generally
>> enabled by default yet, unfortunately.  See also some of the recent
>> posts in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93877
> It's almost certainly the case that the recent IRA changes are going to stress
> R0 more.  If I'm reading what Vlad did correctly, one of the tie-breakers its
> using now is to choose the lowest numbered register when all else is equal.  So
> R0 on SH is likely going to be more problematical.
The last patch does not do it for targets that require the register
allocation order to be honored.  I'd recommend trying to define the
macro HONOR_REG_ALLOC_ORDER for sh.
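
For reference, a minimal sketch of what that amounts to (not a tested
patch): something along the lines of the following in the sh target
headers.

/* Sketch only: make the allocator stick to REG_ALLOC_ORDER instead of
   applying its own ordering heuristics (such as preferring the lowest
   hard register number on ties).  */
#define HONOR_REG_ALLOC_ORDER 1
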
> I wonder if just reordering the regs on the SH (and adjusting the debug output
> to keep that working) would be enough to mitigate some of the R0 problems.
>
> And yes, I saw 93877 fly by too :(


* Re: Minor regression due to recent IRA changes
  2020-03-02  9:34                 ` Richard Biener
@ 2020-03-02 16:21                   ` Vladimir Makarov
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Makarov @ 2020-03-02 16:21 UTC (permalink / raw)
  To: Richard Biener, Jeff Law; +Cc: Oleg Endo, GCC Patches


On 2020-03-02 4:34 a.m., Richard Biener wrote:
>
> One could also simply pessimize R0 for RA via either an existing mechanism
> or a new target hook ...
>
>
Yes, it could be a good strategy.  I'd recommend trying
HONOR_REG_ALLOC_ORDER first, with and without LRA.

If it does not work, I am ready to accept a reasonable new hook for IRA
and/or LRA.  Some GCC targets are too specific and require special
treatment.



* Re: Minor regression due to recent IRA changes
  2020-03-01  1:37                 ` Oleg Endo
@ 2020-03-05 15:51                   ` Jeff Law
  2020-03-08  6:53                     ` Oleg Endo
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2020-03-05 15:51 UTC (permalink / raw)
  To: Oleg Endo, vmakarov; +Cc: gcc-patches

On Sun, 2020-03-01 at 10:37 +0900, Oleg Endo wrote:
> On Sat, 2020-02-29 at 12:35 -0700, Jeff Law wrote:
> > Yup.  That was roughly what I was thinking and roughly the worry I had with
> > trying to squash out the quality regressions.  But it may ultimately be the
> > only way to really resolve these issues.
> 
> Another idea would be to let RA see R0, but ignore all the R0
> constraints.  Then try fixing up everything afterwards.  If R0 is
> removed from the allocatable reg list, there will be one register less
> for it to work with and I'd expect some code quality regressions.  But
> in order to fix up all the R0 cases after the regular RA/reload, I
> believe it will have to re-do a lot of (similar) work that has been
> done by the regular RA already.  One thing that comes instantly to mind
> are loops and the use of R0 as index/base register in memory addressing
> ... it just sounds like a lot of duplicate work in general.
> 
> > DJ's work on the m32c IIRC might be useful if you do try to chase this stuff
> > down.  Essentially there weren't really enough registers.  So he had the port
> > pretend to have more than it really did, then had a post-reload pass to do
> > the
> > final allocation into the target's actual register file.
> > 
> 
> AFAIK DJ did the same (or similar) thing for RL78.  IMHO that just
> shows that one type of RA/reload does not fit all.  Perhaps it'd be
> better to have the option of different RA/reload implementations, which
> implement different strategies for different needs and priorities.
> 
> Anyway, on SH the R0 problem seems to go away with LRA for the most
> part.  I don't know if anything has been put in LRA specifically to
> address such cases, or it works by general definition of the design, or
> it's just a mere coincidence.  If it's the latter case, I'm not sure
> what to expect in the future.  Perhaps it will start breaking again if
> changes for other targets are being made to LRA.
FWIW I've got an sh4/sh4eb bootstrap and regression test running with
HONOR_REG_ALLOC_ORDER defined.  As Vlad mentioned, that may be a viable
workaround.

Jeff
> 


* Re: Minor regression due to recent IRA changes
  2020-03-05 15:51                   ` Jeff Law
@ 2020-03-08  6:53                     ` Oleg Endo
  2020-03-08 16:19                       ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Endo @ 2020-03-08  6:53 UTC (permalink / raw)
  To: law, vmakarov; +Cc: gcc-patches

On Thu, 2020-03-05 at 08:51 -0700, Jeff Law wrote:
> 
> FWIW I've got an sh4/sh4eb bootstrap and regression test running with
> HONOR_REG_ALLOC_ORDER defined.  As Vlad mentioned, that may be a
> viable workaround.
> 

I've had a look at the good old CSiBE code size results and poked at
some of the cases.  Overall, it seems to help code quality when
HONOR_REG_ALLOC_ORDER is defined on SH.

sum:  3383449 -> 3379629    -3820 / -0.112903 %
avg: -212.222222 / -0.271573 %
max: flex-2.5.31  253514 -> 253718        +204 / +0.080469 %
min: bzip2-1.0.2   67202 -> 65938        -1264 / -1.880896 %


However, even with HONOR_REG_ALLOC_ORDER defined, the simple test case
from PR 81426 https://gcc.gnu.org/bugzilla/attachment.cgi?id=47159
still fails to compile without -mlra (use the options -m4
-matomic-model=soft-gusa on a regular non-linux sh-elf cross compiler).

How about the bootstrap, Jeff?  Did it help anything?

Cheers,
Oleg




* Re: Minor regression due to recent IRA changes
  2020-03-08  6:53                     ` Oleg Endo
@ 2020-03-08 16:19                       ` Jeff Law
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Law @ 2020-03-08 16:19 UTC (permalink / raw)
  To: Oleg Endo, vmakarov; +Cc: gcc-patches

On Sun, 2020-03-08 at 15:53 +0900, Oleg Endo wrote:
> On Thu, 2020-03-05 at 08:51 -0700, Jeff Law wrote:
> > FWIW I've got an sh4/sh4eb bootstrap and regression test running with
> > HONOR_REG_ALLOC_ORDER defined.  As Vlad mentioned, that may be a
> > viable workaround.
> > 
> 
> I've had a look at the good old CSiBE code size results and poked at
> some of the cases.  Overall, it seems to help code quality when
> HONOR_REG_ALLOC_ORDER is defined on SH.
> 
> sum:  3383449 -> 3379629    -3820 / -0.112903 %
> avg: -212.222222 / -0.271573 %
> max: flex-2.5.31  253514 -> 253718        +204 / +0.080469 %
> min: bzip2-1.0.2   67202 -> 65938        -1264 / -1.880896 %
> 
> 
> However, even with HONOR_REG_ALLOC_ORDER defined, the simple test case
> from PR 81426 https://gcc.gnu.org/bugzilla/attachment.cgi?id=47159
> fails to compile without -mlra (use options -m4 -matomic-model=soft-gusa on
> regular non-linux sh-elf cross compiler).
> 
> How about the bootstrap, Jeff?  Did it help anything?
Bootstrapped just fine. It neither regressed nor fixed anything in the GCC
testsuite -- so it doesn't appear to address the c-torture/compile/sync-1
regression.

Jeff



end of thread

Thread overview: 16+ messages
2020-02-28 20:24 Minor regression due to recent IRA changes Jeff Law
2020-02-29 15:47 ` Oleg Endo
2020-02-29 15:55   ` Jeff Law
2020-02-29 15:57     ` Oleg Endo
2020-02-29 16:01       ` Jeff Law
2020-02-29 16:30         ` Oleg Endo
2020-02-29 16:47           ` Jeff Law
2020-02-29 17:04             ` Oleg Endo
2020-02-29 20:05               ` Jeff Law
2020-03-01  1:37                 ` Oleg Endo
2020-03-05 15:51                   ` Jeff Law
2020-03-08  6:53                     ` Oleg Endo
2020-03-08 16:19                       ` Jeff Law
2020-03-02  9:34                 ` Richard Biener
2020-03-02 16:21                   ` Vladimir Makarov
2020-03-02 16:16     ` Vladimir Makarov
