public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/102436] New: [11/12 Regression] Lost Load/Store Motion
@ 2021-09-21 20:09 law at gcc dot gnu.org
  2021-09-22  7:18 ` [Bug tree-optimization/102436] " rguenth at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: law at gcc dot gnu.org @ 2021-09-21 20:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102436

            Bug ID: 102436
           Summary: [11/12 Regression] Lost Load/Store Motion
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
  Target Milestone: ---

Created attachment 51492
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51492&action=edit
Testcase

So consider this loop (-O2, lim2 dump, trunk, x86_64):



;;   basic block 3, loop depth 1
;;    pred:       2
;;                10
  # target_8 = PHI <target_13(D)(2), target_17(10)>
  _4 = board[target_8];
  if (_4 == 13)
    goto <bb 4>; [94.50%]
  else
    goto <bb 7>; [5.50%]
;;    succ:       4
;;                7

;;   basic block 4, loop depth 1
;;    pred:       3
  if (captures.32_5 == 0)
    goto <bb 5>; [33.00%]
  else
    goto <bb 6>; [67.00%]
;;    succ:       5
;;                6

;;   basic block 5, loop depth 1
;;    pred:       4
  numb_moves.1_21 = numb_moves;
  _22 = (long unsigned int) numb_moves.1_21;
  _23 = _22 * 24;
  _24 = (struct move_s *) _23;
  _24->from = gfrom.30_1;
  _24->target = target_8;
  _24->captured = 13;
  _24->castled = 0;
  _24->promoted = 0;
  _24->ep = 0;
  _26 = numb_moves.1_21 + 1;
  numb_moves = _26;
;;    succ:       6

;;   basic block 6, loop depth 1
;;    pred:       4
;;                5
  target_17 = target_8 + offset_14;
  _7 = board[target_17];
  if (_7 != 0)
    goto <bb 10>; [94.50%]
  else
    goto <bb 9>; [5.50%]
;;    succ:       10
;;                9

;;   basic block 10, loop depth 1
;;    pred:       6
  goto <bb 3>; [100.00%]
;;    succ:       3

In particular note the load from and store to numb_moves in block #5 within the
loop.  I don't immediately see an aliasing issue that would prevent LSM.  The
bigger problem is control flow, obviously the load/store may not be executed,
but I thought our LIM/LSM code handled that correctly.

If we look at gcc-10 we get something like this:


;;   basic block 3, loop depth 1
;;    pred:       2
;;                10
  # target_9 = PHI <target_14(D)(2), target_19(10)>
  # numb_moves_lsm.43_6 = PHI <numb_moves_lsm.43_34(2),
numb_moves_lsm.43_2(10)>
  # numb_moves_lsm_flag.44_20 = PHI <numb_moves_lsm_flag.44_35(2),
numb_moves_lsm_flag.44_18(10)>
  _4 = board[target_9];
  if (_4 == 13)
    goto <bb 4>; [94.50%]
  else
    goto <bb 7>; [5.50%]
;;    succ:       4
;;                7

;;   basic block 4, loop depth 1
;;    pred:       3
  if (captures.32_5 == 0)
    goto <bb 5>; [33.00%]
  else
    goto <bb 6>; [67.00%]
;;    succ:       5
;;                6

;;   basic block 5, loop depth 1
;;    pred:       4
  numb_moves.1_21 = numb_moves_lsm.43_6;
  _22 = (long unsigned int) numb_moves.1_21;
  _23 = _22 * 24;
  _24 = (struct move_s *) _23;
  _24->from = gfrom.30_1;
  _24->target = target_9;
  _24->captured = 13;
  _24->castled = 0;
  _24->promoted = 0;
  _24->ep = 0;
  _26 = numb_moves.1_21 + 1;
  numb_moves_lsm.43_37 = _26;
  numb_moves_lsm_flag.44_38 = 1;
;;    succ:       6

;;   basic block 6, loop depth 1
;;    pred:       4
;;                5
  # numb_moves_lsm.43_2 = PHI <numb_moves_lsm.43_6(4), numb_moves_lsm.43_37(5)>
  # numb_moves_lsm_flag.44_18 = PHI <numb_moves_lsm_flag.44_20(4),
numb_moves_lsm_flag.44_38(5)>
  target_19 = target_9 + offset_15;
  _8 = board[target_19];
  if (_8 != 0)
    goto <bb 10>; [94.50%]
  else
    goto <bb 11>; [5.50%]
;;    succ:       10
;;                11

[ ... ]
;;   basic block 10, loop depth 1
;;    pred:       6
  goto <bb 3>; [100.00%]
;;    succ:       3


Obviously with the load before the loop and the store after.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/102436] [11/12 Regression] Lost Load/Store Motion
  2021-09-21 20:09 [Bug tree-optimization/102436] New: [11/12 Regression] Lost Load/Store Motion law at gcc dot gnu.org
@ 2021-09-22  7:18 ` rguenth at gcc dot gnu.org
  2021-11-16 14:57 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-22  7:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102436

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Target Milestone|---                         |11.3
           Priority|P3                          |P2
           Keywords|                            |missed-optimization
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2021-09-22
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Memory reference 3: numb_moves
Memory reference 4: _24->from
...
Querying dependency of refs 3 and 4: dependent.
Querying SM WAW dependencies of ref 3 in loop 1: dependent

the issue is that we require conditional executed stores to be independent
on all other stores as we cannot re-issue other stores on exit in the proper
order.

Now, in this case the dependent stores are executed under the same condition
and in fact ordered in a way that we don't have to re-issue any dependent
store.

We're failing to handle this special case after the store-motion re-write that
fixed the TBAA issues.

Smaller testcase where we can just issue the conditional store to 'p':

unsigned p;
void foo (float *q)
{
  for (int i = 0; i < 256; ++i)
    {
      if (p)
        {
          unsigned a = p;
          *(q++) = 1.;
          p = a + 1;
        }
    }
}

the following are what's very much more difficult to handle
(we have to issue a conditional sequence of two stores, and remember the
location the non-invariant store stored to _and_ verify we can re-emit that
out-of-order, and we have to remember the value stored):

unsigned p;
void foo (float *q)
{
  for (int i = 0; i < 256; ++i)
    {
      if (p)
        {
          unsigned a = p;
          p = a + 1;
          *(q++) = 1.;
        }
    }
}

a bit easier (the store we have to re-issue is always executed after the
last conditional store):

unsigned p;
void foo (float *q)
{
  for (int i = 0; i < 256; ++i)
    {
      if (p)
        {
          unsigned a = p;
          p = a + 1;
        }
      *(q++) = 1.;
    }
}

impossible / invalid:

unsigned p;
void foo (float *q)
{
  for (int i = 0; i < 256; ++i)
    {
      *(q++) = 1.;
      if (p)
        {
          unsigned a = p;
          p = a + 1;
        }
    }
}

I will see how difficult it is to teach the already interwinded code the
"trivial" case and whether the bit easier case falls out naturally.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/102436] [11/12 Regression] Lost Load/Store Motion
  2021-09-21 20:09 [Bug tree-optimization/102436] New: [11/12 Regression] Lost Load/Store Motion law at gcc dot gnu.org
  2021-09-22  7:18 ` [Bug tree-optimization/102436] " rguenth at gcc dot gnu.org
@ 2021-11-16 14:57 ` rguenth at gcc dot gnu.org
  2021-11-19  8:35 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-11-16 14:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102436

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
So what can probably be special-cased is when all stores in the loop happen in
the same basic-block.  I do have a start for this but need to massage it a bit
more.  Once there's more than one block controlled by a different conditional
the order of stores is lost and so they have to be independent (but in
principle only across BBs, not all-to-all, but only in case the order is the
same in all BBs ...).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/102436] [11/12 Regression] Lost Load/Store Motion
  2021-09-21 20:09 [Bug tree-optimization/102436] New: [11/12 Regression] Lost Load/Store Motion law at gcc dot gnu.org
  2021-09-22  7:18 ` [Bug tree-optimization/102436] " rguenth at gcc dot gnu.org
  2021-11-16 14:57 ` rguenth at gcc dot gnu.org
@ 2021-11-19  8:35 ` cvs-commit at gcc dot gnu.org
  2021-11-19  8:37 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-19  8:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102436

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:0fc859f5efcb4624a8b4ffdbf34d63972af179a8

commit r12-5394-g0fc859f5efcb4624a8b4ffdbf34d63972af179a8
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Nov 18 13:40:32 2021 +0100

    tree-optimization/102436 - restore loop store motion

    This restores a case of conditional store motion we fail to handle
    after the rewrite.  We can recognize the special case of all
    stores in a loop happening in a single conditionally executed
    block which ensures stores are not re-ordered by executing them
    in different loop iterations.  Separating out the case avoids
    complicating the already complex main path.

    2021-11-18  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/102436
            * tree-ssa-loop-im.c (execute_sm_if_changed): Add mode
            to just create the if structure and return the then block.
            (execute_sm): Add flag to indicate the var will re-use
            another flag var.
            (hoist_memory_references): Support a single conditional
            block with all stores as special case.

            * gcc.dg/torture/20211118-1.c: New testcase.
            * gcc.dg/tree-ssa/ssa-lim-18.c: Likewise.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/102436] [11/12 Regression] Lost Load/Store Motion
  2021-09-21 20:09 [Bug tree-optimization/102436] New: [11/12 Regression] Lost Load/Store Motion law at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-11-19  8:35 ` cvs-commit at gcc dot gnu.org
@ 2021-11-19  8:37 ` rguenth at gcc dot gnu.org
  2021-11-19 15:28 ` [Bug tree-optimization/102436] [11 " law at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-11-19  8:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102436

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |10.3.1, 12.0
      Known to fail|                            |11.1.0, 11.2.0

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk, I will likely not backport it but will keep the bug open for
now.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/102436] [11 Regression] Lost Load/Store Motion
  2021-09-21 20:09 [Bug tree-optimization/102436] New: [11/12 Regression] Lost Load/Store Motion law at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-11-19  8:37 ` rguenth at gcc dot gnu.org
@ 2021-11-19 15:28 ` law at gcc dot gnu.org
  2022-04-21  7:50 ` rguenth at gcc dot gnu.org
  2023-05-29 10:05 ` jakub at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: law at gcc dot gnu.org @ 2021-11-19 15:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102436

--- Comment #5 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Sounds reasonable (not backporting, but holding bug open for now).  I'll
probably do some testing with it internally, so if you end up wanting to
revisit the backporting question, reach out I may have useful data.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/102436] [11 Regression] Lost Load/Store Motion
  2021-09-21 20:09 [Bug tree-optimization/102436] New: [11/12 Regression] Lost Load/Store Motion law at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-11-19 15:28 ` [Bug tree-optimization/102436] [11 " law at gcc dot gnu.org
@ 2022-04-21  7:50 ` rguenth at gcc dot gnu.org
  2023-05-29 10:05 ` jakub at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-21  7:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102436

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.3                        |11.4

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 11.3 is being released, retargeting bugs to GCC 11.4.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/102436] [11 Regression] Lost Load/Store Motion
  2021-09-21 20:09 [Bug tree-optimization/102436] New: [11/12 Regression] Lost Load/Store Motion law at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2022-04-21  7:50 ` rguenth at gcc dot gnu.org
@ 2023-05-29 10:05 ` jakub at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-05-29 10:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102436

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.4                        |11.5

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 11.4 is being released, retargeting bugs to GCC 11.5.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-05-29 10:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-21 20:09 [Bug tree-optimization/102436] New: [11/12 Regression] Lost Load/Store Motion law at gcc dot gnu.org
2021-09-22  7:18 ` [Bug tree-optimization/102436] " rguenth at gcc dot gnu.org
2021-11-16 14:57 ` rguenth at gcc dot gnu.org
2021-11-19  8:35 ` cvs-commit at gcc dot gnu.org
2021-11-19  8:37 ` rguenth at gcc dot gnu.org
2021-11-19 15:28 ` [Bug tree-optimization/102436] [11 " law at gcc dot gnu.org
2022-04-21  7:50 ` rguenth at gcc dot gnu.org
2023-05-29 10:05 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).