public inbox for binutils@sourceware.org
* [PATCH 00/16] RFC: Embedding as and ld inside gcc driver and into libgccjit
@ 2015-06-01 20:50 David Malcolm
From: David Malcolm @ 2015-06-01 20:50 UTC
  To: gcc-patches, binutils; +Cc: David Malcolm

[Crossposting to both gcc-patches and binutils lists, since this
patch kit touches both source trees].

Binutils devs: GCC 5 gained a way to build GCC as a shared library,
libgccjit.so.

I've been experimenting with ways of optimizing libgccjit, and the
following patch kit (touching both gcc and binutils) achieves a 5x
speedup of
  gcc/testsuite/jit.dg/test-benchmark.c
on this x86_64 box (Fedora 20).

The benchmark constructs IR for a simple function in memory, compiles
it, and runs it, 100 times in a row, in the hope of simulating the
workload of an interpreter/VM/language runtime, where bytecode
functions gradually become "hot" (e.g. interpretation count exceeds
a threshold) and are compiled to machine code, all within one
process.
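
The "hot function" trigger described above can be sketched as a
per-function call counter. This is only an illustration of the workload
being simulated, not code from the benchmark or from libgccjit: the
names vm_function, vm_note_call and HOT_THRESHOLD are hypothetical, and
the point where a real runtime would build IR and call into the JIT is
marked with a comment.

```c
#include <stdbool.h>

/* Hypothetical threshold: interpret a function this many times before
   handing it to the JIT.  */
#define HOT_THRESHOLD 10

/* Hypothetical per-function bookkeeping kept by an interpreter.  */
struct vm_function
{
  unsigned call_count;  /* number of interpreted calls so far */
  bool compiled;        /* true once machine code exists */
};

/* Record one interpreted call of FN.  Return true if this call pushed
   FN over the threshold and so triggered a compile.  */
static bool
vm_note_call (struct vm_function *fn)
{
  if (fn->compiled)
    return false;

  if (++fn->call_count > HOT_THRESHOLD)
    {
      /* A real runtime would construct IR for FN here and compile it
         in-process; this sketch only records that it happened.  */
      fn->compiled = true;
      return true;
    }

  return false;
}
```

Each compile triggered this way pays the full cost of a compilation
inside the running process, which is why the per-compile overhead
matters for this kind of client.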

gcc's backend code emits .s files, and libgccjit currently uses pex to
invoke the gcc driver to turn each .s file into a .so file (the driver
in turn invokes "as" and "ld").

These invocations dominate the time taken by libgccjit, so the patch
series attempts to time them, and to move them in-process; doing
so largely eliminates their cost.

Here are the performance gains:

jit.dg/test-benchmark.c, 100 iterations at optlevel 0:
 Without embedded driver:      wallclock of 5.300s (0.053s per iteration)
 With embedded driver:         wallclock of 4.630s (0.046s per iteration)
 With embedded driver & gas:   wallclock of 3.510s (0.035s per iteration)
 With embedded driver&as&ld:   wallclock of 2.130s (0.021s per iteration)
 As above, hacking up ld args: wallclock of 1.030s (0.010s per iteration)

i.e. about 5x speedup.
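
For reference, the arithmetic behind that figure can be reproduced with
a few lines of C. The wallclock values are copied from the table above;
the helper names (speedup, print_speedups) are made up for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

#define ITERATIONS 100

/* Configurations and wallclock totals (seconds), copied from the
   table above.  */
static const char *const config[] = {
  "Without embedded driver",
  "With embedded driver",
  "With embedded driver & gas",
  "With embedded driver&as&ld",
  "As above, hacking up ld args",
};
static const double wallclock_s[] = { 5.300, 4.630, 3.510, 2.130, 1.030 };

/* Ratio of the baseline time to the improved time.  */
static double
speedup (double baseline_s, double improved_s)
{
  return baseline_s / improved_s;
}

/* Print the per-iteration time and the speedup of each configuration
   relative to the first (unpatched) one.  */
static void
print_speedups (void)
{
  size_t n = sizeof wallclock_s / sizeof wallclock_s[0];
  for (size_t i = 0; i < n; i++)
    printf ("%-30s %.3fs per iteration, %.2fx vs. baseline\n",
            config[i], wallclock_s[i] / ITERATIONS,
            speedup (wallclock_s[0], wallclock_s[i]));
}
```

Calling print_speedups () from main shows the last configuration at
5.300 / 1.030, i.e. a little over 5x relative to the first.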

There are some memory leaks, FIXMEs, etc, and it hasn't been fully
tested yet, but I thought it was time to post this for discussion.

The patch kit also generalizes gcc's timevar mechanism in such a way
that it can be used both by jit client code, and by "as" and "ld".  An
example of a combined report on the accumulated timings of 100
iterations of jit.dg/test-benchmark.c at optlevel 0:

Execution times (seconds)
Client items:
 test_jit                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 create_code             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 compile                 :   0.21 (30%) usr   0.13 (45%) sys   0.25 (25%) wall   14939 kB (74%) ggc
 verify_code             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
GCC items:
 phase setup             :   0.15 (22%) usr   0.02 ( 7%) sys   0.15 (15%) wall   10661 kB (53%) ggc
 phase parsing           :   0.02 ( 3%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall     653 kB ( 3%) ggc
 callgraph construction  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall     242 kB ( 1%) ggc
 callgraph optimization  :   0.01 ( 1%) usr   0.01 ( 3%) sys   0.01 ( 1%) wall     142 kB ( 1%) ggc
 cfg construction        :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      17 kB ( 0%) ggc
 cfg cleanup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 df live regs            :   0.02 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      23 kB ( 0%) ggc
 register information    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 parser (global)         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     199 kB ( 1%) ggc
 tree eh                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 tree CFG construction   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     196 kB ( 1%) ggc
 tree operand scan       :   0.00 ( 0%) usr   0.01 ( 3%) sys   0.00 ( 0%) wall     100 kB ( 0%) ggc
 out of ssa              :   0.00 ( 0%) usr   0.02 ( 7%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 expand                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     398 kB ( 2%) ggc
 loop init               :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      67 kB ( 0%) ggc
 integrated RA           :   0.07 (10%) usr   0.02 ( 7%) sys   0.02 ( 2%) wall    2468 kB (12%) ggc
 LRA virtuals elimination:   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      56 kB ( 0%) ggc
 machine dep reorg       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 shorten branches        :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 final                   :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     216 kB ( 1%) ggc
 initialize rtl          :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      12 kB ( 0%) ggc
 rest of compilation     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall     232 kB ( 1%) ggc
 unaccounted todo        :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 replay of JIT client activity:   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     309 kB ( 2%) ggc
 driver                  :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 driver: setup           :   0.04 ( 6%) usr   0.00 ( 0%) sys   0.06 ( 6%) wall       0 kB ( 0%) ggc
 driver: do spec on infiles:   0.01 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 driver: run linker      :   0.00 ( 0%) usr   0.01 ( 3%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 driver: embedded assembler:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 driver: embedded linker :   0.04 ( 6%) usr   0.02 ( 7%) sys   0.04 ( 4%) wall       0 kB ( 0%) ggc
 load JIT result         :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
Embedded 'as':
 gas_main                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 before pass             :   0.03 ( 4%) usr   0.02 ( 7%) sys   0.13 (13%) wall       0 kB ( 0%) ggc
 perform_an_assembly_pass:   0.06 ( 9%) usr   0.01 ( 3%) sys   0.06 ( 6%) wall       0 kB ( 0%) ggc
 after pass              :   0.04 ( 6%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall       0 kB ( 0%) ggc
 cleanup                 :   0.02 ( 3%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall       0 kB ( 0%) ggc
Embedded 'ld':
 ld_internal_main: init  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ldmain.c: lang_final    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ldmain.c: lang_process  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 lang_process: 1st half  :   0.00 ( 0%) usr   0.02 ( 7%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 open_output             :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 open_input_bfds         :   0.01 ( 1%) usr   0.02 ( 7%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 lang_input_statement_enum:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 open_input_bfds:load_symbols:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 load_symbols: ldfile_open_file:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ldlang_add_file         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 load_symbols: bfd_link_add_symbols:   0.02 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 lang_process: 2nd half  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 4%) wall       0 kB ( 0%) ggc
 ldmain.c: ldwrite       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall       0 kB ( 0%) ggc
 ld_main cleanup         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :   0.69             0.29             0.99              20298 kB

Thoughts?

-- 
1.8.5.3

Thread overview: 27+ messages
2015-06-01 20:50 [PATCH 00/16] RFC: Embedding as and ld inside gcc driver and into libgccjit David Malcolm
2015-06-01 20:50 ` [PATCH 01/16] gcc: Generalization of timevar API; add gcc_jit_timer interface David Malcolm
2015-06-01 20:50 ` [PATCH 05/16] gcc: driver: add g_driver singleton David Malcolm
2015-06-01 20:51 ` [PATCH 15/16] gcc: Use libgas and libld within the driver David Malcolm
2015-06-02  8:32   ` Richard Biener
2015-06-02 11:07     ` Trevor Saunders
2015-06-02 12:12       ` Richard Biener
2015-06-01 20:51 ` [PATCH 10/16] binutils: Introduce "gas_main", and state-purging with "gas" subdir David Malcolm
2015-06-01 20:55 ` [PATCH 09/16] libiberty.h: Provide CTIMER_PUSH/POP David Malcolm
2015-06-01 21:30   ` DJ Delorie
2015-06-02 21:52     ` Jeff Law
2015-06-01 20:55 ` [PATCH 06/16] gcc: driver: add timevars for as, collect2, ld David Malcolm
2015-06-01 20:55 ` [PATCH 08/16] libiberty.h: Provide a CLEAR_VAR macro David Malcolm
2015-06-01 21:47   ` DJ Delorie
2015-06-02  1:23     ` David Malcolm
2015-06-02  1:39       ` DJ Delorie
2015-06-01 20:55 ` [PATCH 07/16] binutils: bfd: Implement bfd_uninit David Malcolm
2015-06-01 20:56 ` [PATCH 04/16] gcc: Don't keep reinitializing RTL David Malcolm
2015-06-01 20:58 ` [PATCH 14/16] gcc: Add CTIMER_PUSH/POP to gcc's copy of libiberty David Malcolm
2015-06-01 21:33   ` DJ Delorie
2015-06-01 20:58 ` [PATCH 11/16] binutils: gas/Makefile.am: Add libgas.la David Malcolm
2015-06-01 20:58 ` [PATCH 02/16] gcc: Embed the driver in-process within libgccjit David Malcolm
2015-06-03  6:00   ` Bert Wesarg
2015-06-01 20:59 ` [PATCH 12/16] binutils: Introduce "ld_main" and state-purging within "ld" subdir David Malcolm
2015-06-01 20:59 ` [PATCH 03/16] gcc: Use timevars within driver David Malcolm
2015-06-01 21:00 ` [PATCH 16/16] gcc: Hack up the arguments to the linker David Malcolm
2015-06-01 21:00 ` [PATCH 13/16] ld/Makefile.am: Introduce a libld.la David Malcolm
