public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/106415] New: loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k
@ 2022-07-22 17:32 undefinedopcode2 at gmail dot com
  2022-07-22 18:50 ` [Bug tree-optimization/106415] " undefinedopcode2 at gmail dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: undefinedopcode2 at gmail dot com @ 2022-07-22 17:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106415

            Bug ID: 106415
           Summary: loop-ivopts prevents correct usage of dbra with 16-bit
                    loop counters on m68k
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: undefinedopcode2 at gmail dot com
  Target Milestone: ---

Created attachment 53338
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53338&action=edit
C file that reproduces the problem.

When targeting m68k and compiling certain loops with 16-bit counters that
should trivially generate a DBRA instruction, GCC's optimization passes end up
converting the IV to 32-bit, which  requires extra logic to check the upper
half. More specifically, these are loops where the number of iterations is
known at compile time.

This additional code is completely useless since we know the loop count fits in
16 bits.

I am using GCC 11.2.0 hosted on ARM64 macOS and targeting m68k. All code
snippets were compiled with `-O3 -std=c99 -march=68000 -mtune=68000`.

Consider the following function:
void dbra_test1(short i) {
    do {
        foo(i);
    } while(--i != -1);
}

As expected, the generated body is a tiny loop consisting solely of call setup,
the call itself, call cleanup, and a DBRA:
.L2:
        movew %d2,%a0
        movel %a0,%sp@-
        jsr %a2@
        addql #4,%sp
        dbra %d2,.L2

Now consider this function, where we change the initial value of the loop count
to be a constant:
void dbra_test2(void) {
    short i = 15;
    do {
        foo(i);
    } while(--i != -1);
}

GCC generates the following code for the body of the loop:
.L7:
        movel %d2,%sp@-
        jsr %a2@
        addql #4,%sp
        dbra %d2,.L7
        clrw %d2
        subql #1,%d2
        jcc .L7

Note the extraneous clr/subq/jcc.

During ivcanon, GCC transforms the second loop to run from 16 to 0 instead of
15 to -1. Later during ivopts, it transforms back into 15 to -1 form, but
promotes the variable from short to int. Future transformations are no longer
able to optimize around the short variable, and we end up with extraneous
checks inserted during codegen.

I've attached a simple file that reproduces the problem. GCC 2.95.3 performed
the operation correctly, but it's been broken since at least 4.3.2, possibly
earlier.

Thanks
--UD2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/106415] loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k
  2022-07-22 17:32 [Bug tree-optimization/106415] New: loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k undefinedopcode2 at gmail dot com
@ 2022-07-22 18:50 ` undefinedopcode2 at gmail dot com
  2022-07-25 16:51 ` [Bug target/106415] " pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: undefinedopcode2 at gmail dot com @ 2022-07-22 18:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106415

--- Comment #1 from Undefined Opcode <undefinedopcode2 at gmail dot com> ---
Adding `-fno-tree-loop-ivcanon -fno-ivopts` to the compiler flags ensures the
second example function gets a properly optimized DBRA loop.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/106415] loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k
  2022-07-22 17:32 [Bug tree-optimization/106415] New: loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k undefinedopcode2 at gmail dot com
  2022-07-22 18:50 ` [Bug tree-optimization/106415] " undefinedopcode2 at gmail dot com
@ 2022-07-25 16:51 ` pinskia at gcc dot gnu.org
  2022-07-25 17:27 ` undefinedopcode2 at gmail dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-25 16:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106415

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |target

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect this is a target cost model issue ...

You can look at the ivopts dump to see the costs there.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/106415] loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k
  2022-07-22 17:32 [Bug tree-optimization/106415] New: loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k undefinedopcode2 at gmail dot com
  2022-07-22 18:50 ` [Bug tree-optimization/106415] " undefinedopcode2 at gmail dot com
  2022-07-25 16:51 ` [Bug target/106415] " pinskia at gcc dot gnu.org
@ 2022-07-25 17:27 ` undefinedopcode2 at gmail dot com
  2022-07-26  5:01 ` undefinedopcode2 at gmail dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: undefinedopcode2 at gmail dot com @ 2022-07-25 17:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106415

--- Comment #3 from Undefined Opcode <undefinedopcode2 at gmail dot com> ---
Created attachment 53348
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53348&action=edit
dump of buggy optimization pass

(In reply to Andrew Pinski from comment #2)
> I suspect this is a target cost model issue ...
> 
> You can look at the ivopts dump to see the costs there.

Funny, I was just about to reply to the thread about this. It looks like it's
choosing candidate 7. I've also included the candidate 3 info for completeness.
Candidate 3:
  Incr POS: orig biv
  IV struct:
    Type:       short int
    Base:       15
    Step:       -1
    Biv:        N
    Overflowness wrto loop niter:       No-overflow
...
<trimmed>
...
Candidate 7:
  Var befor: ivtmp.19
  Var after: ivtmp.19
  Incr POS: before exit test
  IV struct:
    Type:       unsigned int
    Base:       15
    Step:       4294967295
    Biv:        N
    Overflowness wrto loop niter:       Overflow
...
<trimmed>
...
<Candidate Costs>:
  cand  cost
  0     5
  1     5
  2     5
  3     4
  4     5
  5     5
  6     5
  7     5
  8     5

It's odd to me that it's choosing 7 over 3, since 3 is clearly lower cost. It
also spits out "group candidate costs", but I'm not sure what those actually
represent (sorry, this is my first time poking through GCC optimization
internals).

I've attached a copy of test.c.174t.ivopts if someone else can make more sense
of it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/106415] loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k
  2022-07-22 17:32 [Bug tree-optimization/106415] New: loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k undefinedopcode2 at gmail dot com
                   ` (2 preceding siblings ...)
  2022-07-25 17:27 ` undefinedopcode2 at gmail dot com
@ 2022-07-26  5:01 ` undefinedopcode2 at gmail dot com
  2022-07-26  5:59 ` linkw at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: undefinedopcode2 at gmail dot com @ 2022-07-26  5:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106415

--- Comment #4 from Undefined Opcode <undefinedopcode2 at gmail dot com> ---
As an experiment, I added the following logic to m68k_rtx_costs():
      case PLUS:
      // <omitted a few lines related to LEA>
      if(TUNE_68000_10 && !(mode == QImode || mode == HImode))
        {
          *total = 400; // make non-byte/word ADD outrageously expensive
          return true;
        }

Sure enough, the costs for all non-short IV shot up. However, it's still
choosing candidate 7, same as before:
<Candidate Costs>:
  cand  cost
force_expr_to_var_cost size costs:
  integer 2
  symbol 4
  address 4
  other 16

force_expr_to_var_cost speed costs:
  integer 2
  symbol 4
  address 4
  other 16

  0     401
  1     5
  2     5
  3     4
  4     5
  5     5
  6     5
  7     401
  8     401

Later, when we get to group costs, the results aren't what I'd expect:
<Group-candidate Costs>:
Group 0:
  cand  cost    compl.  inv.expr.       inv.vars
  0     400     0       NIL;    NIL;
  7     0       0       NIL;    NIL;
  8     400     0       NIL;    NIL;

Group 1:
  cand  cost    compl.  inv.expr.       inv.vars
  0     0       0       NIL;    NIL;
  1     0       0       NIL;    NIL;
  2     0       0       NIL;    NIL;
  3     0       0       NIL;    NIL;
  4     0       0       NIL;    NIL;
  5     0       0       NIL;    NIL;
  6     -1      0       NIL;    NIL;
  7     0       0       NIL;    NIL;
  8     0       0       NIL;    NIL;

I'm not entirely sure what's going on here. Why does candidate 7 suddenly cost
0?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/106415] loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k
  2022-07-22 17:32 [Bug tree-optimization/106415] New: loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k undefinedopcode2 at gmail dot com
                   ` (3 preceding siblings ...)
  2022-07-26  5:01 ` undefinedopcode2 at gmail dot com
@ 2022-07-26  5:59 ` linkw at gcc dot gnu.org
  2022-07-27 18:28 ` undefinedopcode2 at gmail dot com
  2022-08-01 23:44 ` undefinedopcode2 at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: linkw at gcc dot gnu.org @ 2022-07-26  5:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106415

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org

--- Comment #5 from Kewen Lin <linkw at gcc dot gnu.org> ---
At the top of tree-ssa-loop-ivopts.cc file, there are some useful comments
describing the costs for iv candidate itself and group, it may help.

<Candidate Costs>:
  cand  cost
  0     5
  1     5
  2     5
  3     4
  4     5
  5     5
  6     5
  7     5
  8     5

This part is for "variable costs".

<Group-candidate Costs>:
Group 0:
  cand  cost    compl.  inv.expr.       inv.vars
  0     400     0       NIL;    NIL;
  7     0       0       NIL;    NIL;
  8     400     0       NIL;    NIL;

This part is for "group/use costs".

There would be no dumping for a candidate when it has infinite cost for the
group, so the above means only candidate 0, 7 and 8 are valid to be used for
group 0.

Final cost looks like:

    cost: 7 (complexity 0)
    reg_cost: 2
    cand_cost: 5
    cand_group_cost: 0 (complexity 0)
    candidates: 7
     group:0 --> iv_cand:7, cost=(0,0)
     group:1 --> iv_cand:7, cost=(0,0)
    invariant variables:
    invariant expressions:

----

By quick checking, it looks the inf cost for the pair (cand 3, group 0) is due
to:

  /* Check if we have enough precision to express the values of use.  */
  if (TYPE_PRECISION (utype) > TYPE_PRECISION (ctype))
    return infinite_cost;

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/106415] loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k
  2022-07-22 17:32 [Bug tree-optimization/106415] New: loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k undefinedopcode2 at gmail dot com
                   ` (4 preceding siblings ...)
  2022-07-26  5:59 ` linkw at gcc dot gnu.org
@ 2022-07-27 18:28 ` undefinedopcode2 at gmail dot com
  2022-08-01 23:44 ` undefinedopcode2 at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: undefinedopcode2 at gmail dot com @ 2022-07-27 18:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106415

--- Comment #6 from Undefined Opcode <undefinedopcode2 at gmail dot com> ---
(In reply to Kewen Lin from comment #5)
> At the top of tree-ssa-loop-ivopts.cc file, there are some useful comments
> describing the costs for iv candidate itself and group, it may help.
> 
> <Candidate Costs>:
>   cand  cost
>   0     5
>   1     5
>   2     5
>   3     4
>   4     5
>   5     5
>   6     5
>   7     5
>   8     5
> 
> This part is for "variable costs".
> 
> <Group-candidate Costs>:
> Group 0:
>   cand	cost	compl.	inv.expr.	inv.vars
>   0	400	0	NIL;	NIL;
>   7	0	0	NIL;	NIL;
>   8	400	0	NIL;	NIL;
> 
> This part is for "group/use costs".
> 
> There would be no dumping for a candidate when it has infinite cost for the
> group, so the above means only candidate 0, 7 and 8 are valid to be used for
> group 0.
> 
> Final cost looks like:
> 
>     cost: 7 (complexity 0)
>     reg_cost: 2
>     cand_cost: 5
>     cand_group_cost: 0 (complexity 0)
>     candidates: 7
>      group:0 --> iv_cand:7, cost=(0,0)
>      group:1 --> iv_cand:7, cost=(0,0)
>     invariant variables:
>     invariant expressions:
> 
> ----
> 
> By quick checking, it looks the inf cost for the pair (cand 3, group 0) is
> due to:
> 
>   /* Check if we have enough precision to express the values of use.  */
>   if (TYPE_PRECISION (utype) > TYPE_PRECISION (ctype))
>     return infinite_cost;

Some debug prints confirm this is indeed what's happening, the pass thinks
candidate 3 is trying to fit a 32-bit value into 16-bits and bails. What's not
clear to me is why. Does it have some notion of -1 requiring 32-bits to store
even though it'll fit just fine in 16 for our purposes?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/106415] loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k
  2022-07-22 17:32 [Bug tree-optimization/106415] New: loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k undefinedopcode2 at gmail dot com
                   ` (5 preceding siblings ...)
  2022-07-27 18:28 ` undefinedopcode2 at gmail dot com
@ 2022-08-01 23:44 ` undefinedopcode2 at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: undefinedopcode2 at gmail dot com @ 2022-08-01 23:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106415

--- Comment #7 from Undefined Opcode <undefinedopcode2 at gmail dot com> ---
(In reply to Undefined Opcode from comment #6)
A little more context:
Group 1:
  cand  cost    compl.  inv.expr.       inv.vars
  0     0       0       NIL;    NIL;
  1     0       0       NIL;    NIL;
  2     0       0       NIL;    NIL;
  3     0       0       NIL;    NIL;
  4     0       0       NIL;    NIL;
  5     0       0       NIL;    NIL;
  6     -1      0       NIL;    NIL;
  7     0       0       NIL;    NIL;
  8     0       0       NIL;    NIL;

I'm not sure what a negative cost means, but it's also odd to me that the other
candidates all have the same cost. The candidates requiring 32-bit operations
should be strictly more expensive than those operating on 16-bit data, and
indeed they do during the initial candidate costs.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-08-01 23:44 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-22 17:32 [Bug tree-optimization/106415] New: loop-ivopts prevents correct usage of dbra with 16-bit loop counters on m68k undefinedopcode2 at gmail dot com
2022-07-22 18:50 ` [Bug tree-optimization/106415] " undefinedopcode2 at gmail dot com
2022-07-25 16:51 ` [Bug target/106415] " pinskia at gcc dot gnu.org
2022-07-25 17:27 ` undefinedopcode2 at gmail dot com
2022-07-26  5:01 ` undefinedopcode2 at gmail dot com
2022-07-26  5:59 ` linkw at gcc dot gnu.org
2022-07-27 18:28 ` undefinedopcode2 at gmail dot com
2022-08-01 23:44 ` undefinedopcode2 at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).