From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk1-xa44.google.com (mail-vk1-xa44.google.com [IPv6:2607:f8b0:4864:20::a44]) by sourceware.org (Postfix) with ESMTPS id 2F29E384404A for ; Wed, 22 Jul 2020 10:39:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 2F29E384404A Received: by mail-vk1-xa44.google.com with SMTP id g22so446888vke.9 for ; Wed, 22 Jul 2020 03:39:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=gEpAfQglMwblvfZGNBUsJosoHRauobsetWSsqJheLBI=; b=SDIVMxwuxQ89Nn6Cnn78VvZ99cewgRnfIkr2OegPyQOwxzYXZ0JDzu0fJMliMptGeB i9c3MmIY1ow7Bl8KqMxomgIIj/d9VVF3ceCLyQG3kNVmdjNV86UZ+bjnmWaK3Hu5jPon hcHF6N6bvyhqKvtGubw/KSYYguEZ+TMu3z94uPfsvPpCL9HUqqQoUUjg8V7XLaOn23P2 RnBcLKRa3Yf9GdTFt/O6mgzxKgL7pU7ghA8RH7VGCe+RX3JiUxyLoAHvB0rKLGm0iMDq Qw1aPPLQpi5FghcTmQTpg4jP8qxI/6N3Yd1fXTw8NJMd/TAPZjzozFS8WuA4DFId/pON yc3w== X-Gm-Message-State: AOAM531o2tFva0zL8qO5DQE19JOhDnL4rHJix3Rp10qQOgzp43nfmG6L ik7l1CYBCIv8dGe8dqMq9dd5qgkBF4rXvZFWP6c= X-Google-Smtp-Source: ABdhPJy3WebRMgr/KFDXw36wFYnz/T0atFrLxc4kg4UWrtS8LYlkDie4yJ+4tLRCcImUzjfqmxusExxvbzENcCUv3bY= X-Received: by 2002:a1f:a282:: with SMTP id l124mr18456035vke.28.1595414384681; Wed, 22 Jul 2020 03:39:44 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Andrew Pinski Date: Wed, 22 Jul 2020 03:39:33 -0700 Message-ID: Subject: Re: [PATCH 2/2] Aarch64: Add branch diluter pass To: Andrea Corallo Cc: GCC Patches , nd , Richard Earnshaw Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jul 2020 10:39:46 -0000 On Wed, Jul 22, 2020 at 3:10 AM Andrea Corallo wro= te: > > Hi all, > > this second patch implements the AArch64 specific back-end pass > 'branch-dilution' controllable by the followings command line options: > > -mbranch-dilution > > --param=3Daarch64-branch-dilution-granularity=3D{num} > > --param=3Daarch64-branch-dilution-max-branches=3D{num} > > Some cores known to be able to benefit from this pass have been given > default tuning values for their granularity and max-branches. Each > affected core has a very specific granule size and associated max-branch > limit. This is a microarchitecture specific optimization. Typical > usage should be -mbranch-dilution with a specified -mcpu. Cores with a > granularity tuned to 0 will be ignored. Options are provided for > experimentation. Can you give a simple example of what this patch does? Also your testcases seem too sensitive to other optimizations which could happen. E.g. the call to "branch (i)" could be pulled out of the switch statement. Or even the "*i +=3D N;" could be moved to one Basic block and the switch becomes just one if statement. > Observed performance improvements on Neoverse N1 SPEC CPU 2006 where > up to ~+3% (xalancbmk) and ~+1.5% (sjeng). Average code size increase > for all the testsuite proved to be ~0.4%. Also does this improve any non-SPEC benchmarks or has it only been benchmarked with SPEC? A few comments about the patch itself: > --- a/gcc/config.gcc > +++ b/gcc/config.gcc > @@ -321,7 +321,7 @@ aarch64*-*-*) > c_target_objs=3D"aarch64-c.o" > cxx_target_objs=3D"aarch64-c.o" > d_target_objs=3D"aarch64-d.o" > - extra_objs=3D"aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o = aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-built= ins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-colli= sion-avoidance.o aarch64-bti-insert.o" > + extra_objs=3D"aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o = aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-built= ins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-colli= sion-avoidance.o aarch64-bti-insert.o aarch64-branch-dilution.o" > target_gtfiles=3D"\$(srcdir)/config/aarch64/aarch64-builtins.c \$(srcdir= )/config/aarch64/aarch64-sve-builtins.h \$(srcdir)/config/aarch64/aarch64-s= ve-builtins.cc" > target_has_targetm_common=3Dyes > ;; I think it is time to change how extra_objs is done and split it across a few lines; this should be done in a different patch, it will simplify future additions later on. +unsigned max_branch =3D 0; +unsigned granule_size =3D 0; These two really should be part of the insn_granule class. Having global variables is not a good idea with respect to threading of GCC. + dump_printf (MSG_NOTE, + "%d. %s%s%s > NEXT =3D (%d), PREV =3D (%d) -- UID: %d\n", + insn->index, GET_RTX_NAME (GET_CODE (insn->rtx)), + any_uncondjump_p (insn->rtx) ? " (ubranch)" : "", + insn->is_nop ? " (nop)" : "", + insn->next ? insn->next->index : -1, + insn->prev ? insn->prev->index : -1, INSN_UID (insn->rtx)); This part should really be of a method of insn_info instead. is_branch (insn->rtx) Why not: insn->is_branch () This simplifies the code and really shows what is being done. + if (is_branch (prev_real_nondebug_insn (insn->rtx)) + && is_branch (next_real_nondebug_insn (insn->rtx))) Maybe: + if (is_branch (insn->prev_real_nondebug_insn ()) + && is_branch (insn->next_real_nondebug_insn ())) + while (current_insn && !current_insn->is_branch) + { + current_insn =3D current_insn->next; + } Why not: current_insn =3D current_insn->next_branch_insn (); There are many more examples of where you can improve like the above; that is the way you define insn_info can be improved and push some of the implementation back into the insn_info definition. Thanks, Andrew > > * Algorithm and Heuristic > > The pass takes a very simple 'sliding window' approach to the problem. > We crawl through each instruction (starting at the first branch) and > keep track of the number of branches within the current "granule" (or > window). When this exceeds the max-branch value, the pass will dilute > the current granule, inserting nops to push out some of the branches. > The heuristic will favor unconditonal branches (for performance > reasons), or branches that are between two other branches (in order to > decrease the likelihood of another dilution call being needed). > > Each branch type required a different method for nop insertion due to > RTL/basic_block restrictions: > > - Returning calls do not end a basic block so can be handled by > emitting a generic nop. > > - Unconditional branches must be the end of a basic block, and nops > cannot be outside of a basic block. Thus the need for FILLER_INSN, > which allows placement outside of a basic block > > - and translates to a nop. > > - For most conditional branches we've taken a simple approach and only > handle the fallthru edge for simplicity, which we do by inserting a > "nop block" of nops on the fallthru edge, mapping that back to the > original destination block. > > - asm gotos and pcsets are going to be tricky to analyze from a > dilution perspective so are ignored at present. > > * Testing > > The two patches has been tested together on top of current master on > aarch64-unknown-linux-gnu as follow: > > - Successful compilation of 3 stage bootstrap with the > pass forced on (for stage 2, 3) > > - No additional compilation failures (SPEC CPU 2006 and SPEC CPU 2017) > > - No 'make check' regressions > > > Regards > > Andrea > > gcc/ChangeLog > > 2020-07-17 Andrea Corallo > Carey Williams > > * config.gcc (extra_objs): Add aarch64-branch-dilution.o. > * config/aarch64/aarch64-branch-dilution.c: New file. > * config/aarch64/aarch64-passes.def (branch-dilution): Register > pass. > * config/aarch64/aarch64-protos.h (struct tune_params): Declare > tuning parameters bdilution_gsize and bdilution_maxb. > (make_pass_branch_dilution): New declaration. > * config/aarch64/aarch64.c (generic_tunings, cortexa35_tunings) > (cortexa53_tunings, cortexa57_tunings, cortexa72_tunings) > (cortexa73_tunings, exynosm1_tunings, thunderxt88_tunings) > (thunderx_tunings, tsv110_tunings, xgene1_tunings) > (qdf24xx_tunings, saphira_tunings, thunderx2t99_tunings) > (neoversen1_tunings): Provide default tunings for bdilution_gsize > and bdilution_maxb. > * config/aarch64/aarch64.md (filler_insn): Define new insn. > * config/aarch64/aarch64.opt (-mbranch-dilution) > (--param=3Daarch64-branch-dilution-granularity) > (--param=3Daarch64-branch-dilution-max-branches): Add new options= . > * config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule > for aarch64-branch-dilution.c. > * doc/invoke.texi (-mbranch-dilution) > (--param=3Daarch64-branch-dilution-granularity) > (--param=3Daarch64-branch-dilution-max-branches): Document branch > dilution options. > > gcc/testsuite/ChangeLog > > 2020-07-17 Andrea Corallo > Carey Williams > > * gcc.target/aarch64/branch-dilution-off.c: New file. > * gcc.target/aarch64/branch-dilution-on.c: New file.