From: Richard Biener
Date: Wed, 16 Jan 2019 12:44:00 -0000
Subject: Re: Parallelize the compilation using Threads
To: Giuliano Belinassi
Cc: GCC Development, kernel-usp@googlegroups.com, gold@ime.usp.br, Alfredo Goldman, Gregory.Mounie@imag.fr

On Tue, Jan 15, 2019 at 10:45 PM Giuliano Belinassi wrote:
>
> Hi
>
> I've managed to compile gimple-match.c with -ftime-report, and "phase opt and
> generate" seems to be what takes most of the compilation time. This is captured
> by the "TV_PHASE_OPT_GEN" timevar, and all its occurrences seem to be in
> toplev.c and lto.c.

TV_PHASE_OPT_GEN covers nearly everything besides parsing.  Thus all of the
entries below the "phase *" rows are covered by one of the phases.  It would
probably be nice to split up TV_PHASE_OPT_GEN into GIMPLE, IPA and RTL
optimization phases.
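
As a rough sketch of what such a split could look like (TV_PHASE_GIMPLE_OPT,
TV_PHASE_IPA_OPT and TV_PHASE_RTL_GEN are made-up names for illustration,
not timevars that exist in timevar.def today):

  /* gcc/timevar.def -- hypothetical finer-grained phase timevars.  */
  DEFTIMEVAR (TV_PHASE_GIMPLE_OPT , "phase GIMPLE optimization")
  DEFTIMEVAR (TV_PHASE_IPA_OPT    , "phase IPA optimization")
  DEFTIMEVAR (TV_PHASE_RTL_GEN    , "phase RTL generation")

  /* Wherever TV_PHASE_OPT_GEN is started and stopped today (toplev.c and
     lto.c, as noted above), the IPA, GIMPLE and RTL parts would be
     bracketed separately, e.g.  */
  timevar_start (TV_PHASE_IPA_OPT);
  /* ... run the IPA passes ...  */
  timevar_stop (TV_PHASE_IPA_OPT);

-ftime-report would then show separate "phase" rows instead of one bucket
holding 95% of the time.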

> Any ideas of which part such that this variable captures is
> the most costly? Also, is that percentage in "GGC" column the amount of time
> inside the Garbage Collector?

The percentage for the GGC column is the percentage of total GGC memory,
not time.  See timevar.c:print_row.

The most costly part of opt-and-generate is the various verifiers.  See
the note printed at the bottom:

> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --enable-checking=release to disable checks.

You can get a clearer picture when you configure GCC with
--enable-checking=release.  For a quick start, passing -fno-checking will
disable the most costly bits already.

Richard.

>
> Time variable : usr  sys  wall  GGC
> phase setup : 0.01 ( 0%)  0.01 ( 0%)  0.02 ( 0%)  1473 kB ( 0%)
> phase parsing : 3.74 ( 4%)  1.43 (30%)  5.17 ( 5%)  294287 kB (16%)
> phase lang. deferred : 0.08 ( 0%)  0.03 ( 1%)  0.11 ( 0%)  7582 kB ( 0%)
> phase opt and generate : 94.10 (95%)  3.26 (67%)  97.46 (93%)  1543477 kB (82%)
> phase last asm : 0.89 ( 1%)  0.09 ( 2%)  0.98 ( 1%)  39802 kB ( 2%)
> phase finalize : 0.00 ( 0%)  0.01 ( 0%)  0.50 ( 0%)  0 kB ( 0%)
> |name lookup : 0.42 ( 0%)  0.12 ( 2%)  0.46 ( 0%)  6162 kB ( 0%)
> |overload resolution : 0.37 ( 0%)  0.13 ( 3%)  0.42 ( 0%)  18172 kB ( 1%)
> garbage collection : 2.99 ( 3%)  0.03 ( 1%)  3.02 ( 3%)  0 kB ( 0%)
> dump files : 0.11 ( 0%)  0.01 ( 0%)  0.16 ( 0%)  0 kB ( 0%)
> callgraph construction : 0.35 ( 0%)  0.01 ( 0%)  0.24 ( 0%)  61143 kB ( 3%)
> callgraph optimization : 0.21 ( 0%)  0.01 ( 0%)  0.17 ( 0%)  175 kB ( 0%)
> ipa function summary : 0.12 ( 0%)  0.00 ( 0%)  0.14 ( 0%)  2216 kB ( 0%)
> ipa dead code removal : 0.04 ( 0%)  0.01 ( 0%)  0.00 ( 0%)  0 kB ( 0%)
> ipa devirtualization : 0.00 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  0 kB ( 0%)
> ipa cp : 0.33 ( 0%)  0.01 ( 0%)  0.39 ( 0%)  9073 kB ( 0%)
> ipa inlining heuristics : 0.48 ( 0%)  0.00 ( 0%)  0.48 ( 0%)  6175 kB ( 0%)
> ipa function splitting : 0.10 ( 0%)  0.01 ( 0%)  0.07 ( 0%)  9111 kB ( 0%)
> ipa comdats : 0.01 ( 0%)  0.00 ( 0%)  0.00 ( 0%)  0 kB ( 0%)
> ipa various optimizations : 0.03 ( 0%)  0.03 ( 1%)  0.01 ( 0%)  480 kB ( 0%)
> ipa reference : 0.01 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  0 kB ( 0%)
> ipa profile : 0.01 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  0 kB ( 0%)
> ipa pure const : 0.13 ( 0%)  0.00 ( 0%)  0.12 ( 0%)  8 kB ( 0%)
> ipa icf : 0.08 ( 0%)  0.00 ( 0%)  0.08 ( 0%)  6 kB ( 0%)
> ipa SRA : 1.26 ( 1%)  0.28 ( 6%)  1.78 ( 2%)  165814 kB ( 9%)
> ipa free lang data : 0.01 ( 0%)  0.00 ( 0%)  0.00 ( 0%)  0 kB ( 0%)
> ipa free inline summary : 0.00 ( 0%)  0.00 ( 0%)  0.03 ( 0%)  0 kB ( 0%)
> cfg construction : 0.09 ( 0%)  0.00 ( 0%)  0.09 ( 0%)  7926 kB ( 0%)
> cfg cleanup : 1.84 ( 2%)  0.00 ( 0%)  1.73 ( 2%)  13673 kB ( 1%)
> CFG verifier : 6.05 ( 6%)  0.12 ( 2%)  6.80 ( 7%)  0 kB ( 0%)
> trivially dead code : 0.32 ( 0%)  0.01 ( 0%)  0.38 ( 0%)  0 kB ( 0%)
> df scan insns : 0.23 ( 0%)  0.00 ( 0%)  0.30 ( 0%)  28 kB ( 0%)
> df multiple defs : 0.13 ( 0%)  0.00 ( 0%)  0.20 ( 0%)  0 kB ( 0%)
> df reaching defs : 0.52 ( 1%)  0.00 ( 0%)  0.55 ( 1%)  0 kB ( 0%)
> df live regs : 2.70 ( 3%)  0.02 ( 0%)  3.08 ( 3%)  425 kB ( 0%)
> df live&initialized regs : 1.28 ( 1%)  0.00 ( 0%)  1.13 ( 1%)  0 kB ( 0%)
> df must-initialized regs : 0.14 ( 0%)  0.00 ( 0%)  0.16 ( 0%)  0 kB ( 0%)
> df use-def / def-use chains : 0.32 ( 0%)  0.00 ( 0%)  0.26 ( 0%)  0 kB ( 0%)
> df reg dead/unused notes : 0.96 ( 1%)  0.01 ( 0%)  0.89 ( 1%)  11726 kB ( 1%)
> register information : 0.29 ( 0%)  0.00 ( 0%)  0.21 ( 0%)  0 kB ( 0%)
> alias analysis : 0.54 ( 1%)  0.00 ( 0%)  0.53 ( 1%)  17487 kB ( 1%)
> alias stmt walking : 1.10 ( 1%)  0.08 ( 2%)  1.22 ( 1%)  118 kB ( 0%)
> register scan : 0.08 ( 0%)  0.01 ( 0%)  0.08 ( 0%)  118 kB ( 0%)
> rebuild jump labels : 0.12 ( 0%)  0.01 ( 0%)  0.11 ( 0%)  0 kB ( 0%)
> preprocessing : 0.29 ( 0%)  0.43 ( 9%)  0.65 ( 1%)  37409 kB ( 2%)
> parser (global) : 0.39 ( 0%)  0.39 ( 8%)  0.94 ( 1%)  92661 kB ( 5%)
> parser struct body : 0.07 ( 0%)  0.00 ( 0%)  0.08 ( 0%)  6159 kB ( 0%)
> parser enumerator list : 0.01 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  3342 kB ( 0%)
> parser function body : 2.37 ( 2%)  0.43 ( 9%)  2.82 ( 3%)  119124 kB ( 6%)
> parser inl. func. body : 0.18 ( 0%)  0.05 ( 1%)  0.16 ( 0%)  10354 kB ( 1%)
> parser inl. meth. body : 0.04 ( 0%)  0.01 ( 0%)  0.03 ( 0%)  2986 kB ( 0%)
> template instantiation : 0.17 ( 0%)  0.08 ( 2%)  0.26 ( 0%)  15801 kB ( 1%)
> constant expression evaluation : 0.06 ( 0%)  0.05 ( 1%)  0.07 ( 0%)  516 kB ( 0%)
> early inlining heuristics : 0.13 ( 0%)  0.00 ( 0%)  0.08 ( 0%)  19547 kB ( 1%)
> inline parameters : 0.14 ( 0%)  0.01 ( 0%)  0.22 ( 0%)  3372 kB ( 0%)
> integration : 1.00 ( 1%)  0.23 ( 5%)  1.22 ( 1%)  132386 kB ( 7%)
> tree gimplify : 0.36 ( 0%)  0.02 ( 0%)  0.31 ( 0%)  63162 kB ( 3%)
> tree eh : 0.03 ( 0%)  0.00 ( 0%)  0.04 ( 0%)  4173 kB ( 0%)
> tree CFG construction : 0.07 ( 0%)  0.00 ( 0%)  0.07 ( 0%)  20805 kB ( 1%)
> tree CFG cleanup : 1.40 ( 1%)  0.14 ( 3%)  1.57 ( 2%)  3995 kB ( 0%)
> tree tail merge : 0.17 ( 0%)  0.01 ( 0%)  0.16 ( 0%)  7251 kB ( 0%)
> tree VRP : 1.94 ( 2%)  0.08 ( 2%)  1.83 ( 2%)  40527 kB ( 2%)
> tree Early VRP : 0.27 ( 0%)  0.03 ( 1%)  0.30 ( 0%)  3298 kB ( 0%)
> tree copy propagation : 0.14 ( 0%)  0.00 ( 0%)  0.08 ( 0%)  427 kB ( 0%)
> tree PTA : 0.61 ( 1%)  0.03 ( 1%)  0.53 ( 1%)  3861 kB ( 0%)
> tree PHI insertion : 0.01 ( 0%)  0.02 ( 0%)  0.03 ( 0%)  8529 kB ( 0%)
> tree SSA rewrite : 0.23 ( 0%)  0.03 ( 1%)  0.43 ( 0%)  24334 kB ( 1%)
> tree SSA other : 0.10 ( 0%)  0.01 ( 0%)  0.10 ( 0%)  538 kB ( 0%)
> tree SSA incremental : 0.79 ( 1%)  0.07 ( 1%)  0.88 ( 1%)  11828 kB ( 1%)
> tree operand scan : 1.33 ( 1%)  0.30 ( 6%)  1.51 ( 1%)  56249 kB ( 3%)
> dominator optimization : 1.92 ( 2%)  0.07 ( 1%)  1.90 ( 2%)  31786 kB ( 2%)
> backwards jump threading : 0.20 ( 0%)  0.02 ( 0%)  0.16 ( 0%)  8676 kB ( 0%)
> tree SRA : 0.17 ( 0%)  0.01 ( 0%)  0.09 ( 0%)  6050 kB ( 0%)
> isolate eroneous paths : 0.01 ( 0%)  0.00 ( 0%)  0.04 ( 0%)  1319 kB ( 0%)
> tree CCP : 0.67 ( 1%)  0.08 ( 2%)  0.62 ( 1%)  4190 kB ( 0%)
> tree PHI const/copy prop : 0.10 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  132 kB ( 0%)
> tree split crit edges : 0.12 ( 0%)  0.00 ( 0%)  0.15 ( 0%)  10236 kB ( 1%)
> tree reassociation : 0.14 ( 0%)  0.00 ( 0%)  0.08 ( 0%)  168 kB ( 0%)
> tree PRE : 0.74 ( 1%)  0.04 ( 1%)  0.76 ( 1%)  16728 kB ( 1%)
> tree FRE : 0.69 ( 1%)  0.04 ( 1%)  0.60 ( 1%)  5370 kB ( 0%)
> tree code sinking : 0.06 ( 0%)  0.01 ( 0%)  0.06 ( 0%)  9670 kB ( 1%)
> tree linearize phis : 0.10 ( 0%)  0.00 ( 0%)  0.09 ( 0%)  699 kB ( 0%)
> tree backward propagate : 0.03 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  0 kB ( 0%)
> tree forward propagate : 0.52 ( 1%)  0.04 ( 1%)  0.48 ( 0%)  3055 kB ( 0%)
> tree phiprop : 0.05 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  0 kB ( 0%)
> tree conservative DCE : 0.27 ( 0%)  0.03 ( 1%)  0.43 ( 0%)  1557 kB ( 0%)
> tree aggressive DCE : 0.21 ( 0%)  0.04 ( 1%)  0.23 ( 0%)  2565 kB ( 0%)
> tree buildin call DCE : 0.00 ( 0%)  0.00 ( 0%)  0.04 ( 0%)  0 kB ( 0%)
> tree DSE : 0.18 ( 0%)  0.01 ( 0%)  0.18 ( 0%)  274 kB ( 0%)
> PHI merge : 0.07 ( 0%)  0.00 ( 0%)  0.06 ( 0%)  3170 kB ( 0%)
> tree loop optimization : 0.00 ( 0%)  0.00 ( 0%)  0.04 ( 0%)  0 kB ( 0%)
> loopless fn : 0.01 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  0 kB ( 0%)
> tree loop invariant motion : 0.03 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  0 kB ( 0%)
> tree canonical iv : 0.01 ( 0%)  0.00 ( 0%)  0.00 ( 0%)  58 kB ( 0%)
> complete unrolling : 0.00 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  361 kB ( 0%)
> tree iv optimization : 0.00 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  128 kB ( 0%)
> tree copy headers : 0.02 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  414 kB ( 0%)
> tree SSA uncprop : 0.06 ( 0%)  0.00 ( 0%)  0.09 ( 0%)  0 kB ( 0%)
> tree NRV optimization : 0.01 ( 0%)  0.00 ( 0%)  0.05 ( 0%)  14 kB ( 0%)
> tree SSA verifier : 8.44 ( 9%)  0.26 ( 5%)  8.77 ( 8%)  0 kB ( 0%)
> tree STMT verifier : 12.57 (13%)  0.35 ( 7%)  13.03 (12%)  0 kB ( 0%)
> tree switch conversion : 0.00 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  5 kB ( 0%)
> tree switch lowering : 0.02 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  1194 kB ( 0%)
> gimple CSE sin/cos : 0.01 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  0 kB ( 0%)
> gimple widening/fma detection : 0.06 ( 0%)  0.00 ( 0%)  0.03 ( 0%)  2 kB ( 0%)
> tree strlen optimization : 0.03 ( 0%)  0.00 ( 0%)  0.05 ( 0%)  0 kB ( 0%)
> callgraph verifier : 0.93 ( 1%)  0.07 ( 1%)  0.99 ( 1%)  0 kB ( 0%)
> dominance frontiers : 0.14 ( 0%)  0.00 ( 0%)  0.07 ( 0%)  0 kB ( 0%)
> dominance computation : 1.98 ( 2%)  0.05 ( 1%)  2.17 ( 2%)  0 kB ( 0%)
> control dependences : 0.03 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  0 kB ( 0%)
> out of ssa : 0.11 ( 0%)  0.00 ( 0%)  0.11 ( 0%)  253 kB ( 0%)
> expand vars : 0.12 ( 0%)  0.00 ( 0%)  0.12 ( 0%)  5803 kB ( 0%)
> expand : 0.68 ( 1%)  0.02 ( 0%)  0.75 ( 1%)  129150 kB ( 7%)
> post expand cleanups : 0.09 ( 0%)  0.00 ( 0%)  0.03 ( 0%)  1400 kB ( 0%)
> varconst : 0.01 ( 0%)  0.01 ( 0%)  0.01 ( 0%)  13 kB ( 0%)
> lower subreg : 0.02 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  63 kB ( 0%)
> forward prop : 0.32 ( 0%)  0.01 ( 0%)  0.34 ( 0%)  7384 kB ( 0%)
> CSE : 1.03 ( 1%)  0.02 ( 0%)  0.95 ( 1%)  4656 kB ( 0%)
> dead code elimination : 0.23 ( 0%)  0.00 ( 0%)  0.22 ( 0%)  0 kB ( 0%)
> dead store elim1 : 0.40 ( 0%)  0.00 ( 0%)  0.34 ( 0%)  5665 kB ( 0%)
> dead store elim2 : 0.60 ( 1%)  0.00 ( 0%)  0.65 ( 1%)  9079 kB ( 0%)
> loop analysis : 0.01 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  0 kB ( 0%)
> loop init : 1.31 ( 1%)  0.05 ( 1%)  1.64 ( 2%)  5802 kB ( 0%)
> loop invariant motion : 0.02 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  19 kB ( 0%)
> loop fini : 0.02 ( 0%)  0.01 ( 0%)  0.04 ( 0%)  0 kB ( 0%)
> CPROP : 1.27 ( 1%)  0.01 ( 0%)  1.14 ( 1%)  30881 kB ( 2%)
> PRE : 0.61 ( 1%)  0.00 ( 0%)  0.59 ( 1%)  1920 kB ( 0%)
> CSE 2 : 0.57 ( 1%)  0.01 ( 0%)  0.58 ( 1%)  2822 kB ( 0%)
> branch prediction : 0.08 ( 0%)  0.01 ( 0%)  0.10 ( 0%)  887 kB ( 0%)
> combiner : 1.15 ( 1%)  0.00 ( 0%)  1.28 ( 1%)  35520 kB ( 2%)
> if-conversion : 0.24 ( 0%)  0.00 ( 0%)  0.22 ( 0%)  5851 kB ( 0%)
> integrated RA : 2.29 ( 2%)  0.03 ( 1%)  2.37 ( 2%)  54041 kB ( 3%)
> LRA non-specific : 0.97 ( 1%)  0.01 ( 0%)  1.04 ( 1%)  5294 kB ( 0%)
> LRA virtuals elimination : 0.44 ( 0%)  0.00 ( 0%)  0.39 ( 0%)  6089 kB ( 0%)
> LRA reload inheritance : 0.17 ( 0%)  0.00 ( 0%)  0.27 ( 0%)  5783 kB ( 0%)
> LRA create live ranges : 1.07 ( 1%)  0.00 ( 0%)  1.09 ( 1%)  1004 kB ( 0%)
> LRA hard reg assignment : 0.11 ( 0%)  0.00 ( 0%)  0.09 ( 0%)  0 kB ( 0%)
> LRA rematerialization : 0.20 ( 0%)  0.00 ( 0%)  0.20 ( 0%)  0 kB ( 0%)
> reload : 0.02 ( 0%)  0.00 ( 0%)  0.03 ( 0%)  0 kB ( 0%)
> reload CSE regs : 0.90 ( 1%)  0.01 ( 0%)  0.80 ( 1%)  13780 kB ( 1%)
> ree : 0.13 ( 0%)  0.00 ( 0%)  0.10 ( 0%)  589 kB ( 0%)
> thread pro- & epilogue : 0.51 ( 1%)  0.01 ( 0%)  0.57 ( 1%)  2328 kB ( 0%)
> if-conversion 2 : 0.08 ( 0%)  0.00 ( 0%)  0.08 ( 0%)  319 kB ( 0%)
> combine stack adjustments : 0.04 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  0 kB ( 0%)
> peephole 2 : 0.12 ( 0%)  0.00 ( 0%)  0.18 ( 0%)  1242 kB ( 0%)
> hard reg cprop : 0.57 ( 1%)  0.00 ( 0%)  0.49 ( 0%)  189 kB ( 0%)
> scheduling 2 : 2.53 ( 3%)  0.03 ( 1%)  2.53 ( 2%)  5740 kB ( 0%)
> machine dep reorg : 0.08 ( 0%)  0.00 ( 0%)  0.07 ( 0%)  0 kB ( 0%)
> reorder blocks : 0.74 ( 1%)  0.01 ( 0%)  0.69 ( 1%)  6926 kB ( 0%)
> shorten branches : 0.20 ( 0%)  0.00 ( 0%)  0.16 ( 0%)  0 kB ( 0%)
> final : 0.85 ( 1%)  0.01 ( 0%)  0.97 ( 1%)  115151 kB ( 6%)
> symout : 1.17 ( 1%)  0.11 ( 2%)  1.25 ( 1%)  202121 kB (11%)
> variable tracking : 0.77 ( 1%)  0.01 ( 0%)  0.81 ( 1%)  45792 kB ( 2%)
> var-tracking dataflow : 1.30 ( 1%)  0.01 ( 0%)  1.24 ( 1%)  926 kB ( 0%)
> var-tracking emit : 1.43 ( 1%)  0.01 ( 0%)  1.42 ( 1%)  57281 kB ( 3%)
> tree if-combine : 0.06 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  417 kB ( 0%)
> uninit var analysis : 0.03 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  0 kB ( 0%)
> straight-line strength reduction : 0.04 ( 0%)  0.00 ( 0%)  0.03 ( 0%)  525 kB ( 0%)
> store merging : 0.04 ( 0%)  0.00 ( 0%)  0.03 ( 0%)  492 kB ( 0%)
> initialize rtl : 0.01 ( 0%)  0.00 ( 0%)  0.04 ( 0%)  12 kB ( 0%)
> address lowering : 0.04 ( 0%)  0.00 ( 0%)  0.02 ( 0%)  2 kB ( 0%)
> early local passes : 0.02 ( 0%)  0.01 ( 0%)  0.00 ( 0%)  0 kB ( 0%)
> unaccounted optimizations : 0.01 ( 0%)  0.00 ( 0%)  0.00 ( 0%)  0 kB ( 0%)
> rest of compilation : 1.29 ( 1%)  0.01 ( 0%)  1.11 ( 1%)  5063 kB ( 0%)
> remove unused locals : 0.25 ( 0%)  0.04 ( 1%)  0.25 ( 0%)  37 kB ( 0%)
> address taken : 0.11 ( 0%)  0.10 ( 2%)  0.25 ( 0%)  0 kB ( 0%)
> verify loop closed : 0.00 ( 0%)  0.00 ( 0%)  0.01 ( 0%)  0 kB ( 0%)
> verify RTL sharing : 5.24 ( 5%)  0.05 ( 1%)  5.37 ( 5%)  0 kB ( 0%)
> rebuild frequencies : 0.04 ( 0%)  0.00 ( 0%)  0.06 ( 0%)  621 kB ( 0%)
> repair loop structures : 0.17 ( 0%)  0.00 ( 0%)  0.24 ( 0%)  0 kB ( 0%)
> TOTAL : 98.82  4.83  104.24  1886632 kB
> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --enable-checking=release to disable checks.
>
> real    1m54.934s
> user    1m48.938s
> sys     0m5.196s
>
>
> Thank you
> Giuliano.
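
Summing just the verifier rows in the report above shows how much of that
"phase opt and generate" time is the extra checking:

  CFG verifier          6.05
  tree SSA verifier     8.44
  tree STMT verifier   12.57
  callgraph verifier    0.93
  verify RTL sharing    5.24
  --------------------------
                       33.23 s   of 98.82 s user time, roughly a third

most of which should disappear with --enable-checking=release or -fno-checking.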

>
> On 01/14, Richard Biener wrote:
> > On Mon, Jan 14, 2019 at 12:41 PM Giuliano Belinassi wrote:
> > >
> > > Hi,
> > >
> > > I am currently studying the GIMPLE IR documentation and thinking about a
> > > way easily gather the timing information. I was thinking about
> > > adding this feature to gcc to show/dump the elapsed time on GIMPLE. Does
> > > this makes sense? Is this already implemented somewhere? Where is a good
> > > way to start it?
> >
> > There's -ftime-report which more-or-less tells you the time spent in the
> > individual passes.  I think there's no overall group to count GIMPLE
> > optimizers vs. RTL optimizers though.
> >
> > > Richard Biener: I would like to know What is your nickname in IRC :)
> >
> > It's richi.
> >
> > Richard.
> >
> > > Thank you,
> > > Giuliano.
> > >
> > > On 12/17, Richard Biener wrote:
> > > > On Wed, Dec 12, 2018 at 4:46 PM Giuliano Augusto Faulin Belinassi wrote:
> > > > >
> > > > > Hi, I have some news. :-)
> > > > >
> > > > > I replicated the Martin Liška experiment [1] on a 64-cores machine for
> > > > > gcc [2] and Linux kernel [3] (Linux kernel was fully parallelized),
> > > > > and I am excited to dive into this problem. As a result, I want to
> > > > > propose GSoC project on this issue, starting with something like:
> > > > >     1- Systematically create a benchmark for easily information
> > > > > gathering. Martin Liška already made the first version of it, but I
> > > > > need to improve it.
> > > > >     2- Find and document the global states (Try to reduce the gcc's
> > > > > global states as well).
> > > > >     3- Define the parallelization strategy.
> > > > >     4- First parallelization attempt.
> > > > >
> > > > > I also proposed this issue as a research project to my advisor and he
> > > > > supported me on this idea. So I can work for at least one year on
> > > > > this, and other things related to it.
> > > > >
> > > > > Would anyone be willing to mentor me on this?
> > > >
> > > > As the one who initially suggested the project I'm certainly willing
> > > > to mentor you on this.
> > > >
> > > > Richard.
> > > >
> > > > > [1] https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
> > > > > [2] https://www.ime.usp.br/~belinass/64cores-experiment.svg
> > > > > [3] https://www.ime.usp.br/~belinass/64cores-kernel-experiment.svg
> > > > > On Mon, Nov 19, 2018 at 8:53 AM Richard Biener wrote:
> > > > > >
> > > > > > On Fri, Nov 16, 2018 at 8:00 PM Giuliano Augusto Faulin Belinassi wrote:
> > > > > > >
> > > > > > > Hi! Sorry for the late reply again :P
> > > > > > >
> > > > > > > On Thu, Nov 15, 2018 at 8:29 AM Richard Biener wrote:
> > > > > > > >
> > > > > > > > On Wed, Nov 14, 2018 at 10:47 PM Giuliano Augusto Faulin Belinassi wrote:
> > > > > > > > >
> > > > > > > > > As a brief introduction, I am a graduate student that got interested
> > > > > > > > >
> > > > > > > > > in the "Parallelize the compilation using threads"(GSoC 2018 [1]). I
> > > > > > > > > am a newcomer in GCC, but already have sent some patches, some of
> > > > > > > > > them have already been accepted [2].
> > > > > > > > >
> > > > > > > > > I brought this subject up in IRC, but maybe here is a proper place to
> > > > > > > > > discuss this topic.
> > > > > > > > >
> > > > > > > > > From my point of view, parallelizing GCC itself will only speed up the
> > > > > > > > > compilation of projects which have a big file that creates a
> > > > > > > > > bottleneck in the whole project compilation (note: by big, I mean the
> > > > > > > > > amount of code to generate).
> > > > > > > >
> > > > > > > > That's true.  During GCC bootstrap there are some of those (see PR84402).
> > > > > > > >
> > > > > > > > One way to improve parallelism is to use link-time optimization where
> > > > > > > > even single source files can be split up into multiple link-time units.  But
> > > > > > > > then there's the serial whole-program analysis part.
> > > > > > >
> > > > > > > Did you mean this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 ?
> > > > > > > That is a lot of data :-)
> > > > > > >
> > > > > > > It seems that 'phase opt and generate' is the most time-consuming
> > > > > > > part. Is that the 'GIMPLE optimization pipeline' you were talking
> > > > > > > about in this thread:
> > > > > > > https://gcc.gnu.org/ml/gcc/2018-03/msg00202.html
> > > > > >
> > > > > > It's everything that comes after the frontend parsing bits, thus this
> > > > > > includes in particular RTL optimization and early GIMPLE optimizations.
> > > > > >
> > > > > > > > > Additionally, I know that GCC must not
> > > > > > > > > change the project layout, but from the software engineering perspective,
> > > > > > > > > this may be a bad smell that indicates that the file should be broken
> > > > > > > > > into smaller files. Finally, the Makefiles will take care of the
> > > > > > > > > parallelization task.
> > > > > > > >
> > > > > > > > What do you mean by GCC must not change the project layout?  GCC
> > > > > > > > happily re-orders functions and link-time optimization will reorder
> > > > > > > > TUs (well, linking may as well).
> > > > > > >
> > > > > > > That was a response to a comment made on IRC:
> > > > > > >
> > > > > > > On Thu, Nov 15, 2018 at 9:44 AM Jonathan Wakely wrote:
> > > > > > > >I think this is in response to a comment I made on IRC. Giuliano said
> > > > > > > >that if a project has a very large file that dominates the total build
> > > > > > > >time, the file should be split up into smaller pieces. I said "GCC
> > > > > > > >can't restructure people's code. it can only try to compile it
> > > > > > > >faster". We weren't referring to code transformations in the compiler
> > > > > > > >like re-ordering functions, but physically refactoring the source
> > > > > > > >code.
> > > > > > >
> > > > > > > Yes. But from one of the attachments from PR84402, it seems that such
> > > > > > > files exist on GCC,
> > > > > > > https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
> > > > > > >
> > > > > > > > > My questions are:
> > > > > > > > >
> > > > > > > > > 1. Is there any project compilation that will significantly be improved
> > > > > > > > > if GCC runs in parallel? Do someone has data about something related
> > > > > > > > > to that? How about the Linux Kernel? If not, I can try to bring some.
> > > > > > > >
> > > > > > > > We do not have any data about this apart from experiments with
> > > > > > > > splitting up source files for PR84402.
> > > > > > > >
> > > > > > > > > 2. Did I correctly understand the goal of the parallelization? Can
> > > > > > > > > anyone provide extra details to me?
> > > > > > > >
> > > > > > > > You may want to search the mailing list archives since we had a
> > > > > > > > student application (later revoked) for the task with some discussion.
> > > > > > > >
> > > > > > > > In my view (I proposed the thing) the most interesting parts are
> > > > > > > > getting GCCs global state documented and reduced.  The parallelization
> > > > > > > > itself is an interesting experiment but whether there will be any
> > > > > > > > substantial improvement for builds that can already benefit from make
> > > > > > > > parallelism remains a question.
> > > > > > >
> > > > > > > As I agree that documenting GCC's global states is good for the
> > > > > > > community and the development of GCC, I really don't think this a good
> > > > > > > motivation for parallelizing a compiler from a research standpoint.
> > > > > >
> > > > > > True ;)  Note that my suggestions to the other GSoC student were
> > > > > > purely based on where it's easiest to experiment with parallelization
> > > > > > and not where it would be most beneficial.
> > > > > >
> > > > > > > There must be something or someone that could take advantage of the
> > > > > > > fine-grained parallelism. But that data from PR84402 seems to have the
> > > > > > > answer to it. :-)
> > > > > > >
> > > > > > > On Thu, Nov 15, 2018 at 4:07 PM Szabolcs Nagy wrote:
> > > > > > > >
> > > > > > > > On 15/11/18 10:29, Richard Biener wrote:
> > > > > > > > > In my view (I proposed the thing) the most interesting parts are
> > > > > > > > > getting GCCs global state documented and reduced.  The parallelization
> > > > > > > > > itself is an interesting experiment but whether there will be any
> > > > > > > > > substantial improvement for builds that can already benefit from make
> > > > > > > > > parallelism remains a question.
> > > > > > > >
> > > > > > > > in the common case (project with many small files, much more than
> > > > > > > > core count) i'd expect a regression:
> > > > > > > >
> > > > > > > > if gcc itself tries to parallelize that introduces inter thread
> > > > > > > > synchronization and potential false sharing in gcc (e.g. malloc
> > > > > > > > locks) that does not exist with make parallelism (glibc can avoid
> > > > > > > > some atomic instructions when a process is single threaded).
> > > > > > >
> > > > > > > That is what I am mostly worried about. Or the most costly part is not
> > > > > > > parallelizable at all. Also, I would expect a regression on very small
> > > > > > > files, which probably could be avoided implementing this feature as a
> > > > > > > flag?
> > > > > >
> > > > > > I think the issue should be avoided by avoiding fine-grained parallelism.
> > > > > > Which might be somewhat hard given there are core data structures that
> > > > > > are shared (the memory allocator for a start).
> > > > > >
> > > > > > The other issue I am more worried about is that we probably have to
> > > > > > interact with make somehow so that we do not end up with 64 threads
> > > > > > when one does -j8 on a 8 core machine.  That's basically the same
> > > > > > issue we run into with -flto and its threaded WPA writeout or recursive
> > > > > > invocation of make.
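
To make the -j interaction concrete: GNU make hands out job slots through the
jobserver it advertises in MAKEFLAGS, which is what -flto=jobserver already
relies on, and a threaded cc1 would have to speak the same protocol. Below is
a minimal standalone sketch of a jobserver client, purely illustrative and not
existing GCC code; it assumes the classic pipe-based protocol:

  /* Sketch of a GNU make jobserver client: each concurrent worker beyond
     the first needs a one-byte token from the jobserver pipe and must give
     it back when it is done.  */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  int
  main (void)
  {
    const char *makeflags = getenv ("MAKEFLAGS");
    int rfd = -1, wfd = -1;

    /* make >= 4.2 passes --jobserver-auth=R,W, older makes --jobserver-fds=R,W.
       (make 4.4 may pass a fifo:PATH form instead, not handled here.)  */
    const char *opt = makeflags ? strstr (makeflags, "--jobserver-auth=") : NULL;
    if (!opt && makeflags)
      opt = strstr (makeflags, "--jobserver-fds=");
    if (!opt || sscanf (strchr (opt, '=') + 1, "%d,%d", &rfd, &wfd) != 2)
      return 0;  /* No jobserver: act as if -j1 and stay single-threaded.  */

    char token;
    if (read (rfd, &token, 1) == 1)   /* blocks until make grants a slot */
      {
        /* ... run one extra worker thread here ... */
        write (wfd, &token, 1);       /* hand the slot back to make */
      }
    return 0;
  }

One wrinkle: make only keeps those descriptors open for commands it considers
recursive (marked with '+' or invoking $(MAKE)), so the driver would have to
arrange for that as well.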

> > > > > >
> > > > > > >
> > > > > > > On Fri, Nov 16, 2018 at 11:05 AM Martin Jambor wrote:
> > > > > > > >
> > > > > > > > Hi Giuliano,
> > > > > > > >
> > > > > > > > On Thu, Nov 15 2018, Richard Biener wrote:
> > > > > > > > > You may want to search the mailing list archives since we had a
> > > > > > > > > student application (later revoked) for the task with some discussion.
> > > > > > > >
> > > > > > > > Specifically, the whole thread beginning with
> > > > > > > > https://gcc.gnu.org/ml/gcc/2018-03/msg00179.html
> > > > > > > >
> > > > > > > > Martin
> > > > > > >
> > > > > > > Yes, I will research this carefully ;-)
> > > > > > >
> > > > > > > Thank you