From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 112057 invoked by alias); 27 May 2015 10:12:05 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 112039 invoked by uid 89); 27 May 2015 10:12:04 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-ob0-f178.google.com Received: from mail-ob0-f178.google.com (HELO mail-ob0-f178.google.com) (209.85.214.178) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 27 May 2015 10:12:01 +0000 Received: by obbgf1 with SMTP id gf1so3615483obb.2 for ; Wed, 27 May 2015 03:11:59 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.60.176.37 with SMTP id cf5mr25716078oec.19.1432721519751; Wed, 27 May 2015 03:11:59 -0700 (PDT) Received: by 10.76.115.167 with HTTP; Wed, 27 May 2015 03:11:59 -0700 (PDT) In-Reply-To: References: Date: Wed, 27 May 2015 10:52:00 -0000 Message-ID: Subject: Re: conditional lim From: Richard Biener To: Evgeniya Maenkova Cc: GCC Patches Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2015-05/txt/msg02406.txt.bz2 On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova wrote: > Hi, Richard > > Thanks for review starting. > > Do you see any major issues with this patch (i.e. algorithms and ideas > that should be completely replaced, effectively causing the re-write > of most code)? > > To decide if there are major issues in the patch, perhaps, you need > additional clarifications from me? Could you point at the places where > additional explanations could save you most effort? > > Your answers to these questions are looking the first priority ones. > You wrote about several issues in the code, which are looking as easy > (or almost easy ;) to fix(inline functions, unswitch-loops flag, > comments, etc). But, I think you agree, let=E2=80=99s first decide about = the > major issues (I mean, whether we continue with this patch or starting > new one, this will save a lot of time for both of us). I didn't get an overall idea on how the patch works, that is, how it integr= ates with the existing algorithm. If you can elaborate on that a bit that would be helpful. I think the code-generation part needs some work (whether by following my idea with re-using copy_bbs or whether by basically re-implementing it is up to debate). How does your code handle for () { if (cond1) { if (cond2) invariant; if (cond3) invariant; } } ? Optimally we'd have before the loop exactly the same if () structure (thus if (cond1) is shared). Richard. > Thanks, > > Evgeniya > > On Tue, May 26, 2015 at 2:31 PM, Richard Biener > wrote: >> On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova >> wrote: >>> Hi, >>> >>> Could you please review my patch for predicated lim? >>> >>> Let me note some details about it: >>> >>> >>> >>> 1) Phi statements are still moved only if they have 1 or 2 >>> arguments. However, phi statements could be move under conditions (as >>> it=E2=80=99s done for the other statements). Probably, phi statement m= otion >>> with 3 + arguments could be implemented in the next patch after >>> predicated lim. >>> >>> 2) Patch has limitations/features like (it was ok to me to >>> implement it such way, maybe I=E2=80=99m not correct. ): >>> >>> a) Loop1 >>> >>> { >>> >>> If (a) >>> >>> Loop2 >>> >>> { >>> >>> Stmt - Invariant for Loop1 >>> >>> } >>> >>> } >>> >>> In this case Stmt will be moved only out of Loop2, because of if= (a). >>> >>> b) Or >>> >>> Loop1 >>> >>> { >>> >>> =E2=80=A6 >>> >>> If (cond1) >>> >>> If (cond2) >>> >>> If (cond3) >>> >>> Stmt; >>> >>> } >>> >>> Stmt will be moved out only if cond1 is always executed in Loop1. >>> >>> c) It took me a long time to write all of these code, so there >>> might be other peculiarities which I forgot to mention. :) >>> >>> Let=E2=80=99s discuss these ones as you will rev= iew my patch. >>> >>> 3) Patch consists of 9 files: >>> >>> a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, >>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 changed tests: >>> >>> - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as >>> predicated lim moves 2 more statements out of the loop; >>> >>> - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 with condi= tional >>> lim recip optimization in this test doesn=E2=80=99t work (the correspon= ding >>> value is below threshold as I could see in the code for recip, 1<3). >>> So to have recip working in this test I changed test a little bit. >>> >>> b) gcc/tree-ssa-loop-im.c =E2=80=93 the patched lim per se >>> >>> c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, >>> >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, >>> >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c >>> >>> the tests for conditional lim. >>> >>> 4) Patch testing: >>> >>> a) make =E2=80=93k check (no difference in results for me for the = clean >>> build and the patched one, >>> >>> - Revision: 222849, >>> >>> - uname -a >>> >>> Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct >>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux >>> >>> b) Bootstrap. >>> >>> It goes well now, however to fix it I have made a temporary hack in >>> the lim code. And with this fix patch definitely shouldn=E2=80=99t be >>> committed. >>> >>> I did so, as I would like to discuss this issue first. >>> >>> The problem is: I got stage2-stage3 comparison failure on the single >>> file (tree-vect-data-refs.o). After some investigation I understood >>> that tree-vect-data-refs.o differs being compiled with and without >>> =E2=80=98-g=E2=80=99 option (yes, more exactly on stage 2 this is =E2= =80=98-g =E2=80=93O2 =E2=80=93gtoggle=E2=80=99, >>> and for stage 3 this is =E2=80=98-g =E2=80=93O2=E2=80=99. But to simpli= fy things I can >>> reproduce this difference on the same build (even not bootstrapped), >>> with option =E2=80=98 =E2=80=93g=E2=80=99 and without it). >>> >>> Of course, the file compiled with =E2=80=93g option will contain debug >>> information and will differ from the corresponding file without debug >>> information. I mean there is the difference reported by >>> contrib/compare-debug (I mean we talk about stripped files). >>> >>> Then I compared assemblers and lim dump files and made a conclusion >>> the difference is: >>> >>> There is statement _484=3Dsomething, which is conditionally moved out of >>> loop X. In non debug cases no usages of _484 are left inside the loop >>> X. In debug case, there is the single use in debug statement. So for >>> debug case my patch generates phi statement for _484 to make it >>> available inside the loop X. For non debug case, no such phi statement >>> generated as there is no uses of _484. >>> >>> There is one more statement with lhs=3D_493, with exactly this pattern >>> of use. But this is not important for our discussion. >>> >>> >>> >>> So, in my opinion it=E2=80=99s ok to generate additional phi node for d= ebug >>> case. But I=E2=80=99m not a compiler expert and maybe there is some >>> requirement that debug and non-debug versions should differ only by >>> debug statements, I don=E2=80=99t know. >> >> As Andi said code generation needs to be the same. >> >>> My temporary hack fix is just removing of such debug statements (no >>> debug statements, no problems J). >> >> That's fine and generally what is done in other passes. >> >>> The alternatives I see are: >>> >>> - Move debug statements outside the loop (not good in my opini= on); >>> >>> - Somehow reset values in debug statements (not good in my >>> opinion, if I could provide correct information for them); >> >> You could re-set them via gimple_debug_bind_reset_value which >> will tell the user that at this point the value is optimized out >> (that's slightly >> better than removing the debug stmts). >> >>> - Generate phi for debug statements and fix bootstrap (say, >>> let=E2=80=99s have some list of files, which we have special check for.= I mean >>> for my case, the difference is not in stage2 and stage3, it=E2=80=99s i= n debug >>> and non-debug). I like this variant. However, as I don=E2=80=99t see su= ch list >>> of special files in the whole gcc, I think I might be missing >>> something. >> >> The policy ;) >> >>> What do you think? >> >> Now about the patch. Generally there seem to be spurious whitespace-only >> or formatting differences - try to avoid these. >> >> All new functiuons need a comment explaining them and their function >> parameters. >> >> +static cond_data_map *get_cond_data_map (struct loop * level) >> +{ >> >> per GCC coding conventions this should be formatted like >> >> static cond_data_map * >> get_cond_data_map (struct loop * level) >> { >> >> +#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\ >> + BB->aux =3D (void *) (XCNEW (struct move_cond));\ >> + BB_MOVE_COND (BB)->type=3DALWAYS_EXECUTED_IN;\ >> + BB_MOVE_COND (BB)->loop=3DVAL; >> >> this kind of side-effects in macros are bad - better turn them into >> inline functions. >> >> - if (flag_unswitch_loops >> - && gimple_code (stmt) =3D=3D GIMPLE_COND) >> - { >> - /* If we perform unswitching, force the operands of the invariant >> - condition to be moved out of the loop. */ >> - return MOVE_POSSIBLE; >> - } >> + if (gimple_code (stmt) =3D=3D GIMPLE_COND) >> + return MOVE_POSSIBLE; >> >> any reason you removed the guard on flag_unswitch_loops? We may >> want to add a new flag, certainly this transform might not be suitable >> for -O[12s] for example. Like if not all stmts inside a conditional can >> be moved the transform will increase code-size. >> >> You are doing a (lot of?) refactoring which makes review of the patch >> harder. For example >> >> +bool calculate_tgt_level (gimple stmt, struct loop *outermost, >> + bool cond_executed, basic_block bb) >> +{ >> + struct lim_aux_data *lim_dat >> ... >> + >> + if (lim_data->cost >=3D LIM_EXPENSIVE) >> + set_profitable_level (stmt); >> + return true; >> +} >> >> ... >> >> - >> - lim_data =3D init_lim_data (stmt); >> - lim_data->always_executed_in =3D outermost; >> - >> - if (!determine_max_movement (stmt, false)) >> - { >> - lim_data->max_loop =3D NULL; >> - continue; >> - } >> - >> - if (dump_file && (dump_flags & TDF_DETAILS)) >> - { >> - print_gimple_stmt (dump_file, stmt, 2, 0); >> - fprintf (dump_file, " invariant up to level %d, cost %d.\n\= n", >> - loop_depth (lim_data->max_loop), >> - lim_data->cost); >> - } >> - >> - if (lim_data->cost >=3D LIM_EXPENSIVE) >> - set_profitable_level (stmt); >> + if (!calculate_tgt_level (stmt, outermost, cond_executed, bb)) >> + continue; >> >> where it is not obvious why moving set_profitable_level should be >> in a function named calculate_tgt_level. In fact determine_max_movement >> was supposed to be the abstraction already. >> >> Without yet getting an idea of the overall algorithm (I'm really missing= such >> a thing...) it seems to me that the current handling of doing invariant = motion >> of PHIs should handle the analysis phase just fine (you might be adding >> more advanced cases in this patch, but I'd rather do that as a followup = for >> initial support). >> >> The code-gen part looks somewhat ad-hoc to me. In principle if we have >> a list of PHIs to hoist we can materialize the needed part of the origin= al >> basic-block structure on the pre-header edge together with the controlli= ng >> conditionals. The loop pre-header would be the always-executed BB >> and the needed part would always start with the loop header and be a >> single-entry multiple-exit region. So ideally we could use sth like >> gimple_duplicate_sese_region with the BB copying worker copy_bbs >> refactored/replaced with something only copying controlling conditionals >> and statements/PHIs that a callback identifies (tree-loop-distribution.c >> would also like sth like that - currently it copies whole loops and remo= ves >> stmts it didn't intend to copy...). >> >> So basically invariantness_dom_walker would record a vector of PHIs >> to hoist (or we'd construct that at move_computations time). >> If there is any control-flow to hoist then we'd do an alternative >> move_computations by "copying" the needed CFG and covered stmts. >> In theory doing it the loop-distribution way (simply copy the whole >> loop body and then remove anything not needed) would also be possible >> though that wouldn't be the very best idea obviously ;) >> >> Thanks, >> Richard. >> >>> >>> >>> Thanks, >>> >>> >>> >>> Evgeniya > > > > -- > Thanks, > > Evgeniya > > perfstories.wordpress.com