public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [00/23] Make fwprop use an on-the-side RTL SSA representation
@ 2020-11-13  8:10 Richard Sandiford
  2020-11-13  8:11 ` [01/23] vec: Silence clang warning Richard Sandiford
                   ` (23 more replies)
  0 siblings, 24 replies; 88+ messages in thread
From: Richard Sandiford @ 2020-11-13  8:10 UTC (permalink / raw)
  To: gcc-patches

Just after GCC 10 stage 1 closed (oops), I posted a patch to add a new
combine pass.  One of its main aims was to allow instructions to move
around where necessary in order to make a combination possible.
It also tried to parallelise instructions that use the same resource.

That pass contained its own code for maintaining limited def-use chains.
When I posted the patch, Segher asked why we wanted yet another piece
of pass-specific code to do that.  Although I had specific reasons
(which I explained at the time) I've gradually come round to agreeing
that that was a flaw.

This series of patches is the result of a Covid-time project to add
a more general, pass-agnostic framework.  There are two parts:
adding the framework itself, and using it to make fwprop.c faster.

The framework part
------------------

The framework provides an optional, on-the-side SSA view of existing
RTL instructions.  Each instruction gets a list of definitions and a
list of uses, with each use having a single definition.  Phi nodes
handle cases in which there are multiple possible definitions of a
register on entry to a basic block.  There are also routines for
updating instructions while keeping the SSA representation intact.

The aim is only to provide a different view of existing RTL instructions.
Unlike gimple, and unlike (IIRC) the old RTL SSA project from way back,
the new framework isn't a “native” SSA representation.  This means that
all inputs to a phi node for a register R are also definitions of
register R; no move operation is “hidden” in the phi node.

Like gimple, the framework treats memory as a single unified resource.

A more in-depth summary is contained in the doc patch, but some
other random notes:

* At the moment, the SSA information is local to one pass, but it might
  be good to maintain it between passes in future.

* The SSA code groups blocks into extended basic blocks, with the
  EBBs rather than individual blocks having phi nodes.  

* The framework also provides live range information for registers
  within an extended basic block and allows instructions to move within
  their EBB.  It might be useful to allow further movement in future;
  I just don't have a use case for it yet.

* One advantage of the new infrastructure is that it gives
  recog_for_combine-like behaviour: if recog wants to add clobbers
  of things like the flags register, the SSA code will make sure
  that the flags register is free.

* All current queries and updates have amortised sublinear complexity.
  Some updates are done lazily in order to avoid an upfront linear cost.

* I've tried to optimise the code for both memory footprint and
  compile time.  The first part involves quite a bit of overloading
  of pointers and various other kinds of reuse, so most of the new data
  structures use private member variables and public accessor functions.
  I know that style isn't universally popular, but I think it's
  justified here.  Things could easily go wrong if passes tried
  to operate directly on the underlying data structures.

* Debug instructions get SSA information too, on a best-effort basis.
  Providing complete information would be significantly more expensive.

* I wasn't sure for new C++ code whether to stick to the old C /* … */
  comments, or whether to switch to //.  In the end I went for //,
  on the basis that:

  - The ranger code already does this.

  - // is certainly more idiomatic in C++.

  - // is in the lisp tradition of per-line comments and it matches the
    ;; used in .md files.  I feel sure that GCC would have been written
    using // from the outset if that had been possible.

  The patches only do this for new files.  The aim is to ensure that
  each file is at least self-consistent.

Using RTL SSA to make fwprop faster
-----------------------------------

In order to show the thing in action, I tried to port fwprop.c
to use RTL SSA while preserving the pass's current heuristics as
much as possible.

To get an extreme measurement of speed, I made each fwprop pass
run 5000 times, calling:

  df_finish_pass (false);

after each iteration.  Usually only the first iteration would actually
do any optimisation, the other iterations would simply test the cost of
the instruction processing.  In the case of the “old” pass, this included:

  - df_analyze (including solving the notes and md problems)
  - the dominator walk to build a list of single definitions

In the case of the “new” pass, this included:

  - df_analyze (with no additional problems)
  - building the SSA representation

On an --enable-checking=release compiler, the post-patch version was 23%
faster than the pre-patch version when compiling simplify-rtx.ii at -O.

When compiling simplify-rtx.ii at -O normally (without the hack above),
the compile-time improvement is ~0.5% (which was outside the noise).
The assembly output was unchanged.

Testing
-------

Tested so far on aarch64-linux-gnu, arm-linux-gnueabihf and
x86_64-linux-gnu.  I'll test on powerpc64le-linux-gnu too.

I also tried comparing code with the old and new fwprop.c implementations.
When testing the “old” fwprop.c implementation I applied the attached
patch to avoid a couple of quirks that would otherwise skew the results:

(1) The code that handled LO_SUM propagations had:

	  /* OP1 is likely not a legitimate address, otherwise there would have
	     been no LO_SUM.  We want it to disappear if it is invalid, return
	     false in that case.  */
	  return memory_address_p (mode, tem);

    But this early exit occurs before any replacement has been made,
    so the pass never substituted into LO_SUMs, even on a true return.

(2) use_killed_between didn't take advantage of the fact that
    frame_pointer_rtx and arg_pointer_rtx are function invariants.
    Sometimes it could prove this indirectly, but not always.

With those tweaks, the “old” and “new” passes produced the same assembly
output for some spot-checked GCC files, such as simplify-rtx.ii, optabs.ii,
etc., compiled with -O2 -g.

I also tried building glibc for aarch64-linux-gnu with both versions.
There were some cases in which INDEX+BASE addresses were canonicalised
as BASE+INDEX, but there were no other differences.

I also tried compiling gcc.dg, g++.dg and gcc.c-torture at -O2
-ftree-vectorize on at least one target per CPU directory.  The same
kind of address canonicalisation differences showed up here too,
but for most targets there were only a handful (<=10) cases in
which the new file had more or fewer lines.  Most of the differences
were improvements.

As a side-effect of the above, I tried building with gcc 10.2.1, gcc 7.4.0
and gcc 5.4.0.  I also tried gcc 4.8.5 and clang 10 for compatibility
purposes.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2022-08-16  7:59 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-13  8:10 [00/23] Make fwprop use an on-the-side RTL SSA representation Richard Sandiford
2020-11-13  8:11 ` [01/23] vec: Silence clang warning Richard Sandiford
2020-11-25 19:58   ` Jeff Law
2020-11-13  8:12 ` [02/23] rtlanal: Remove noop_move_p REG_EQUAL condition Richard Sandiford
2020-11-25 20:00   ` Jeff Law
2020-11-13  8:12 ` [03/23] reginfo: Add a global_reg_set Richard Sandiford
2020-11-25 20:01   ` Jeff Law
2020-11-13  8:13 ` [04/23] Move iterator_range to a new iterator-utils.h file Richard Sandiford
2020-11-25 20:02   ` Jeff Law
2020-11-13  8:13 ` [05/23] Add more iterator utilities Richard Sandiford
2020-11-25 20:12   ` Jeff Law
2020-11-13  8:14 ` [06/23] Add an RAII class for managing obstacks Richard Sandiford
2020-11-25 20:15   ` Jeff Law
2020-11-13  8:14 ` [07/23] Add a class that multiplexes two pointer types Richard Sandiford
2020-11-25 20:23   ` Jeff Law
2020-11-26 16:15     ` Richard Sandiford
2020-11-30  1:28       ` Jeff Law
2020-11-25 23:33   ` Martin Sebor
2020-11-26 17:06     ` Richard Sandiford
2020-11-27 18:12       ` Richard Sandiford
2020-11-28  0:17       ` Martin Sebor
2020-12-17  0:17         ` Richard Sandiford
2020-12-17 14:21           ` Tom Tromey
2020-12-17 15:38             ` Richard Sandiford
2020-12-17 15:44               ` Nathan Sidwell
2021-01-04 15:32                 ` Jeff Law
2020-11-13  8:15 ` [08/23] Add an alternative splay tree implementation Richard Sandiford
2020-12-02 20:36   ` Jeff Law
2020-12-17  0:29     ` Richard Sandiford
2021-01-04 15:27       ` Jeff Law
2021-01-01  8:25   ` Andreas Schwab
2021-01-04 14:53     ` Richard Sandiford
2021-01-04 15:02       ` Andreas Schwab
2021-01-04 15:42         ` Richard Sandiford
2021-01-05 12:13           ` Richard Biener
2020-11-13  8:15 ` [09/23] Add a cut-down version of std::span (array_slice) Richard Sandiford
2020-11-30 19:56   ` Jeff Law
2022-08-03 15:13   ` Martin Jambor
2022-08-03 15:31     ` Richard Sandiford
2022-08-10 16:03   ` Martin Jambor
2022-08-11  6:58     ` Richard Biener
2022-08-16  7:59       ` Richard Sandiford
2020-11-13  8:16 ` [10/23] Tweak the way that is_a is implemented Richard Sandiford
2020-12-02  5:15   ` Jeff Law
2020-11-13  8:16 ` [11/23] Split update_cfg_for_uncondjump out of combine Richard Sandiford
2020-11-30  6:14   ` Jeff Law
2020-11-13  8:17 ` [12/23] Export print-rtl.c:print_insn_with_notes Richard Sandiford
2020-11-25 20:24   ` Jeff Law
2020-11-13  8:18 ` [13/23] recog: Split out a register_asm_p function Richard Sandiford
2020-11-25 20:24   ` Jeff Law
2020-11-13  8:18 ` [14/23] simplify-rtx: Put simplify routines into a class Richard Sandiford
2020-11-30 19:54   ` Jeff Law
2020-11-13  8:19 ` [15/23] recog: Add a validate_change_xveclen function Richard Sandiford
2020-11-30 20:03   ` Jeff Law
2020-11-13  8:19 ` [16/23] recog: Add a way of temporarily undoing changes Richard Sandiford
2020-11-25 20:27   ` Jeff Law
2020-12-17  0:22     ` Richard Sandiford
2020-11-13  8:20 ` [17/23] recog: Add a class for propagating into insns Richard Sandiford
2020-12-03 22:32   ` Jeff Law
2020-11-13  8:20 ` [18/23] recog: Add an RAII class for undoing insn changes Richard Sandiford
2020-11-25 20:27   ` Jeff Law
2020-11-13  8:20 ` [19/23] rtlanal: Add some new helper classes Richard Sandiford
2020-12-13 17:30   ` Jeff Law
2020-12-14 16:37     ` Richard Sandiford
2020-12-14 20:02       ` Jeff Law
2020-11-13  8:21 ` [20/23] rtlanal: Add simple_regno_set Richard Sandiford
2020-11-25 20:31   ` Jeff Law
2020-12-17  0:47     ` Richard Sandiford
2021-01-04 15:28       ` Jeff Law
2020-11-13  8:22 ` [21/23] doc: Add documentation for rtl-ssa Richard Sandiford
2020-11-30  6:26   ` Jeff Law
2020-11-13  8:23 ` [PATCH 22/23] Add rtl-ssa Richard Sandiford
2020-12-16  3:31   ` Jeff Law
2020-12-17  0:33     ` Richard Sandiford
2020-12-19 20:01       ` Jeff Law
2020-11-13  8:24 ` [PATCH 23/23] fwprop: Rewrite to use RTL SSA Richard Sandiford
2020-12-16  3:52   ` Jeff Law
2020-12-17  0:34     ` Richard Sandiford
2020-11-25 19:58 ` [00/23] Make fwprop use an on-the-side RTL SSA representation Jeff Law
2020-11-26 16:03   ` Richard Sandiford
2020-11-27 15:56     ` Michael Matz
2020-11-27 16:31       ` Richard Sandiford
2020-11-30 21:13         ` Jeff Law
2020-12-01  0:03           ` Michael Matz
2020-12-01 10:15             ` Richard Sandiford
2020-12-02  0:25             ` Jeff Law
2020-11-30  6:45     ` Jeff Law
2020-11-30 14:12       ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).