From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 108248 invoked by alias); 26 May 2015 13:11:04 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 108156 invoked by uid 89); 26 May 2015 13:11:03 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.9 required=5.0 tests=AWL,BAYES_50,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-la0-f51.google.com Received: from mail-la0-f51.google.com (HELO mail-la0-f51.google.com) (209.85.215.51) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 26 May 2015 13:10:58 +0000 Received: by laat2 with SMTP id t2so67225469laa.1 for ; Tue, 26 May 2015 06:10:54 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.112.48.68 with SMTP id j4mr22386854lbn.60.1432645854569; Tue, 26 May 2015 06:10:54 -0700 (PDT) Received: by 10.112.1.100 with HTTP; Tue, 26 May 2015 06:10:54 -0700 (PDT) In-Reply-To: References: Date: Tue, 26 May 2015 13:38:00 -0000 Message-ID: Subject: Re: conditional lim From: Evgeniya Maenkova To: Richard Biener Cc: GCC Patches Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2015-05/txt/msg02330.txt.bz2 Hi, Richard Thanks for review starting. Do you see any major issues with this patch (i.e. algorithms and ideas that should be completely replaced, effectively causing the re-write of most code)? To decide if there are major issues in the patch, perhaps, you need additional clarifications from me? Could you point at the places where additional explanations could save you most effort? Your answers to these questions are looking the first priority ones. You wrote about several issues in the code, which are looking as easy (or almost easy ;) to fix(inline functions, unswitch-loops flag, comments, etc). But, I think you agree, let=E2=80=99s first decide about the major issues (I mean, whether we continue with this patch or starting new one, this will save a lot of time for both of us). Thanks, Evgeniya On Tue, May 26, 2015 at 2:31 PM, Richard Biener wrote: > On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova > wrote: >> Hi, >> >> Could you please review my patch for predicated lim? >> >> Let me note some details about it: >> >> >> >> 1) Phi statements are still moved only if they have 1 or 2 >> arguments. However, phi statements could be move under conditions (as >> it=E2=80=99s done for the other statements). Probably, phi statement mo= tion >> with 3 + arguments could be implemented in the next patch after >> predicated lim. >> >> 2) Patch has limitations/features like (it was ok to me to >> implement it such way, maybe I=E2=80=99m not correct. ): >> >> a) Loop1 >> >> { >> >> If (a) >> >> Loop2 >> >> { >> >> Stmt - Invariant for Loop1 >> >> } >> >> } >> >> In this case Stmt will be moved only out of Loop2, because of if = (a). >> >> b) Or >> >> Loop1 >> >> { >> >> =E2=80=A6 >> >> If (cond1) >> >> If (cond2) >> >> If (cond3) >> >> Stmt; >> >> } >> >> Stmt will be moved out only if cond1 is always executed in Loop1. >> >> c) It took me a long time to write all of these code, so there >> might be other peculiarities which I forgot to mention. :) >> >> Let=E2=80=99s discuss these ones as you will revi= ew my patch. >> >> 3) Patch consists of 9 files: >> >> a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, >> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 changed tests: >> >> - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as >> predicated lim moves 2 more statements out of the loop; >> >> - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 with condit= ional >> lim recip optimization in this test doesn=E2=80=99t work (the correspond= ing >> value is below threshold as I could see in the code for recip, 1<3). >> So to have recip working in this test I changed test a little bit. >> >> b) gcc/tree-ssa-loop-im.c =E2=80=93 the patched lim per se >> >> c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, >> >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, >> >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c >> >> the tests for conditional lim. >> >> 4) Patch testing: >> >> a) make =E2=80=93k check (no difference in results for me for the c= lean >> build and the patched one, >> >> - Revision: 222849, >> >> - uname -a >> >> Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct >> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux >> >> b) Bootstrap. >> >> It goes well now, however to fix it I have made a temporary hack in >> the lim code. And with this fix patch definitely shouldn=E2=80=99t be >> committed. >> >> I did so, as I would like to discuss this issue first. >> >> The problem is: I got stage2-stage3 comparison failure on the single >> file (tree-vect-data-refs.o). After some investigation I understood >> that tree-vect-data-refs.o differs being compiled with and without >> =E2=80=98-g=E2=80=99 option (yes, more exactly on stage 2 this is =E2= =80=98-g =E2=80=93O2 =E2=80=93gtoggle=E2=80=99, >> and for stage 3 this is =E2=80=98-g =E2=80=93O2=E2=80=99. But to simplif= y things I can >> reproduce this difference on the same build (even not bootstrapped), >> with option =E2=80=98 =E2=80=93g=E2=80=99 and without it). >> >> Of course, the file compiled with =E2=80=93g option will contain debug >> information and will differ from the corresponding file without debug >> information. I mean there is the difference reported by >> contrib/compare-debug (I mean we talk about stripped files). >> >> Then I compared assemblers and lim dump files and made a conclusion >> the difference is: >> >> There is statement _484=3Dsomething, which is conditionally moved out of >> loop X. In non debug cases no usages of _484 are left inside the loop >> X. In debug case, there is the single use in debug statement. So for >> debug case my patch generates phi statement for _484 to make it >> available inside the loop X. For non debug case, no such phi statement >> generated as there is no uses of _484. >> >> There is one more statement with lhs=3D_493, with exactly this pattern >> of use. But this is not important for our discussion. >> >> >> >> So, in my opinion it=E2=80=99s ok to generate additional phi node for de= bug >> case. But I=E2=80=99m not a compiler expert and maybe there is some >> requirement that debug and non-debug versions should differ only by >> debug statements, I don=E2=80=99t know. > > As Andi said code generation needs to be the same. > >> My temporary hack fix is just removing of such debug statements (no >> debug statements, no problems J). > > That's fine and generally what is done in other passes. > >> The alternatives I see are: >> >> - Move debug statements outside the loop (not good in my opinio= n); >> >> - Somehow reset values in debug statements (not good in my >> opinion, if I could provide correct information for them); > > You could re-set them via gimple_debug_bind_reset_value which > will tell the user that at this point the value is optimized out > (that's slightly > better than removing the debug stmts). > >> - Generate phi for debug statements and fix bootstrap (say, >> let=E2=80=99s have some list of files, which we have special check for. = I mean >> for my case, the difference is not in stage2 and stage3, it=E2=80=99s in= debug >> and non-debug). I like this variant. However, as I don=E2=80=99t see suc= h list >> of special files in the whole gcc, I think I might be missing >> something. > > The policy ;) > >> What do you think? > > Now about the patch. Generally there seem to be spurious whitespace-only > or formatting differences - try to avoid these. > > All new functiuons need a comment explaining them and their function > parameters. > > +static cond_data_map *get_cond_data_map (struct loop * level) > +{ > > per GCC coding conventions this should be formatted like > > static cond_data_map * > get_cond_data_map (struct loop * level) > { > > +#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\ > + BB->aux =3D (void *) (XCNEW (struct move_cond));\ > + BB_MOVE_COND (BB)->type=3DALWAYS_EXECUTED_IN;\ > + BB_MOVE_COND (BB)->loop=3DVAL; > > this kind of side-effects in macros are bad - better turn them into > inline functions. > > - if (flag_unswitch_loops > - && gimple_code (stmt) =3D=3D GIMPLE_COND) > - { > - /* If we perform unswitching, force the operands of the invariant > - condition to be moved out of the loop. */ > - return MOVE_POSSIBLE; > - } > + if (gimple_code (stmt) =3D=3D GIMPLE_COND) > + return MOVE_POSSIBLE; > > any reason you removed the guard on flag_unswitch_loops? We may > want to add a new flag, certainly this transform might not be suitable > for -O[12s] for example. Like if not all stmts inside a conditional can > be moved the transform will increase code-size. > > You are doing a (lot of?) refactoring which makes review of the patch > harder. For example > > +bool calculate_tgt_level (gimple stmt, struct loop *outermost, > + bool cond_executed, basic_block bb) > +{ > + struct lim_aux_data *lim_dat > ... > + > + if (lim_data->cost >=3D LIM_EXPENSIVE) > + set_profitable_level (stmt); > + return true; > +} > > ... > > - > - lim_data =3D init_lim_data (stmt); > - lim_data->always_executed_in =3D outermost; > - > - if (!determine_max_movement (stmt, false)) > - { > - lim_data->max_loop =3D NULL; > - continue; > - } > - > - if (dump_file && (dump_flags & TDF_DETAILS)) > - { > - print_gimple_stmt (dump_file, stmt, 2, 0); > - fprintf (dump_file, " invariant up to level %d, cost %d.\n\n= ", > - loop_depth (lim_data->max_loop), > - lim_data->cost); > - } > - > - if (lim_data->cost >=3D LIM_EXPENSIVE) > - set_profitable_level (stmt); > + if (!calculate_tgt_level (stmt, outermost, cond_executed, bb)) > + continue; > > where it is not obvious why moving set_profitable_level should be > in a function named calculate_tgt_level. In fact determine_max_movement > was supposed to be the abstraction already. > > Without yet getting an idea of the overall algorithm (I'm really missing = such > a thing...) it seems to me that the current handling of doing invariant m= otion > of PHIs should handle the analysis phase just fine (you might be adding > more advanced cases in this patch, but I'd rather do that as a followup f= or > initial support). > > The code-gen part looks somewhat ad-hoc to me. In principle if we have > a list of PHIs to hoist we can materialize the needed part of the original > basic-block structure on the pre-header edge together with the controlling > conditionals. The loop pre-header would be the always-executed BB > and the needed part would always start with the loop header and be a > single-entry multiple-exit region. So ideally we could use sth like > gimple_duplicate_sese_region with the BB copying worker copy_bbs > refactored/replaced with something only copying controlling conditionals > and statements/PHIs that a callback identifies (tree-loop-distribution.c > would also like sth like that - currently it copies whole loops and remov= es > stmts it didn't intend to copy...). > > So basically invariantness_dom_walker would record a vector of PHIs > to hoist (or we'd construct that at move_computations time). > If there is any control-flow to hoist then we'd do an alternative > move_computations by "copying" the needed CFG and covered stmts. > In theory doing it the loop-distribution way (simply copy the whole > loop body and then remove anything not needed) would also be possible > though that wouldn't be the very best idea obviously ;) > > Thanks, > Richard. > >> >> >> Thanks, >> >> >> >> Evgeniya --=20 Thanks, Evgeniya perfstories.wordpress.com