From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 21785 invoked by alias); 27 May 2015 13:07:54 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 21764 invoked by uid 89); 27 May 2015 13:07:52 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-lb0-f180.google.com Received: from mail-lb0-f180.google.com (HELO mail-lb0-f180.google.com) (209.85.217.180) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 27 May 2015 13:07:42 +0000 Received: by lbbqq2 with SMTP id qq2so6735678lbb.3 for ; Wed, 27 May 2015 06:07:39 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.112.159.162 with SMTP id xd2mr27182541lbb.67.1432732059470; Wed, 27 May 2015 06:07:39 -0700 (PDT) Received: by 10.112.1.100 with HTTP; Wed, 27 May 2015 06:07:39 -0700 (PDT) In-Reply-To: References: Date: Wed, 27 May 2015 13:36:00 -0000 Message-ID: Subject: Re: conditional lim From: Evgeniya Maenkova To: Richard Biener Cc: GCC Patches Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2015-05/txt/msg02421.txt.bz2 On Wed, May 27, 2015 at 2:11 PM, Richard Biener wrote: > On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova > wrote: >> Hi, Richard >> >> Thanks for review starting. >> >> Do you see any major issues with this patch (i.e. algorithms and ideas >> that should be completely replaced, effectively causing the re-write >> of most code)? >> >> To decide if there are major issues in the patch, perhaps, you need >> additional clarifications from me? Could you point at the places where >> additional explanations could save you most effort? >> >> Your answers to these questions are looking the first priority ones. >> You wrote about several issues in the code, which are looking as easy >> (or almost easy ;) to fix(inline functions, unswitch-loops flag, >> comments, etc). But, I think you agree, let=E2=80=99s first decide about= the >> major issues (I mean, whether we continue with this patch or starting >> new one, this will save a lot of time for both of us). > > I didn't get an overall idea on how the patch works, that is, how it inte= grates > with the existing algorithm. If you can elaborate on that a bit that wou= ld > be helpful. > Hi, Sure, I'll write you some notes in several days. > I think the code-generation part needs some work (whether by following > my idea with re-using copy_bbs or whether by basically re-implementing > it is up to debate). How does your code handle > > for () > { > if (cond1) > { > if (cond2) > invariant; > if (cond3) > invariant; > } > } > > ? Optimally we'd have before the loop exactly the same if () structure > (thus if (cond1) is shared). If both invariants are going out of the same loop (i mean tgt_level), then if structure will be the same. for1() for () { if (cond1) { if (cond2) invariant1; if (cond3) invariant2; } } will be transformed to for1() if (cond1) { if (cond2) invariant1; if (cond3) invariant2; } } for () { if (cond1) { if (cond2); if (cond3); } } (I don't cleanup empty if's in lim code). If these invarians are moved in different loops then for1 for2() for() { if (cond1) { if (cond2) invariant1; if (cond3) invariant2; } } will be transformed to: for1 { if (cond1) if (cond2) invariant1; for2() { if (cond1) if (cond3) invariant2; for() { if (cond1) { if (cond2); if (cond3); } } } } Of course, there could be some bugs, but the idea was as mentioned above. This transformation was looking logical to me. What do you think? Thanks, Evgeniya > > Richard. > > >> Thanks, >> >> Evgeniya >> >> On Tue, May 26, 2015 at 2:31 PM, Richard Biener >> wrote: >>> On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova >>> wrote: >>>> Hi, >>>> >>>> Could you please review my patch for predicated lim? >>>> >>>> Let me note some details about it: >>>> >>>> >>>> >>>> 1) Phi statements are still moved only if they have 1 or 2 >>>> arguments. However, phi statements could be move under conditions (as >>>> it=E2=80=99s done for the other statements). Probably, phi statement = motion >>>> with 3 + arguments could be implemented in the next patch after >>>> predicated lim. >>>> >>>> 2) Patch has limitations/features like (it was ok to me to >>>> implement it such way, maybe I=E2=80=99m not correct. ): >>>> >>>> a) Loop1 >>>> >>>> { >>>> >>>> If (a) >>>> >>>> Loop2 >>>> >>>> { >>>> >>>> Stmt - Invariant for Loop1 >>>> >>>> } >>>> >>>> } >>>> >>>> In this case Stmt will be moved only out of Loop2, because of i= f (a). >>>> >>>> b) Or >>>> >>>> Loop1 >>>> >>>> { >>>> >>>> =E2=80=A6 >>>> >>>> If (cond1) >>>> >>>> If (cond2) >>>> >>>> If (cond3) >>>> >>>> Stmt; >>>> >>>> } >>>> >>>> Stmt will be moved out only if cond1 is always executed in Loop1. >>>> >>>> c) It took me a long time to write all of these code, so there >>>> might be other peculiarities which I forgot to mention. :) >>>> >>>> Let=E2=80=99s discuss these ones as you will re= view my patch. >>>> >>>> 3) Patch consists of 9 files: >>>> >>>> a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, >>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 changed tests: >>>> >>>> - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as >>>> predicated lim moves 2 more statements out of the loop; >>>> >>>> - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 with cond= itional >>>> lim recip optimization in this test doesn=E2=80=99t work (the correspo= nding >>>> value is below threshold as I could see in the code for recip, 1<3). >>>> So to have recip working in this test I changed test a little bit. >>>> >>>> b) gcc/tree-ssa-loop-im.c =E2=80=93 the patched lim per se >>>> >>>> c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, >>>> >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, >>>> >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c >>>> >>>> the tests for conditional lim. >>>> >>>> 4) Patch testing: >>>> >>>> a) make =E2=80=93k check (no difference in results for me for the= clean >>>> build and the patched one, >>>> >>>> - Revision: 222849, >>>> >>>> - uname -a >>>> >>>> Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct >>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux >>>> >>>> b) Bootstrap. >>>> >>>> It goes well now, however to fix it I have made a temporary hack in >>>> the lim code. And with this fix patch definitely shouldn=E2=80=99t be >>>> committed. >>>> >>>> I did so, as I would like to discuss this issue first. >>>> >>>> The problem is: I got stage2-stage3 comparison failure on the single >>>> file (tree-vect-data-refs.o). After some investigation I understood >>>> that tree-vect-data-refs.o differs being compiled with and without >>>> =E2=80=98-g=E2=80=99 option (yes, more exactly on stage 2 this is =E2= =80=98-g =E2=80=93O2 =E2=80=93gtoggle=E2=80=99, >>>> and for stage 3 this is =E2=80=98-g =E2=80=93O2=E2=80=99. But to simpl= ify things I can >>>> reproduce this difference on the same build (even not bootstrapped), >>>> with option =E2=80=98 =E2=80=93g=E2=80=99 and without it). >>>> >>>> Of course, the file compiled with =E2=80=93g option will contain debug >>>> information and will differ from the corresponding file without debug >>>> information. I mean there is the difference reported by >>>> contrib/compare-debug (I mean we talk about stripped files). >>>> >>>> Then I compared assemblers and lim dump files and made a conclusion >>>> the difference is: >>>> >>>> There is statement _484=3Dsomething, which is conditionally moved out = of >>>> loop X. In non debug cases no usages of _484 are left inside the loop >>>> X. In debug case, there is the single use in debug statement. So for >>>> debug case my patch generates phi statement for _484 to make it >>>> available inside the loop X. For non debug case, no such phi statement >>>> generated as there is no uses of _484. >>>> >>>> There is one more statement with lhs=3D_493, with exactly this pattern >>>> of use. But this is not important for our discussion. >>>> >>>> >>>> >>>> So, in my opinion it=E2=80=99s ok to generate additional phi node for = debug >>>> case. But I=E2=80=99m not a compiler expert and maybe there is some >>>> requirement that debug and non-debug versions should differ only by >>>> debug statements, I don=E2=80=99t know. >>> >>> As Andi said code generation needs to be the same. >>> >>>> My temporary hack fix is just removing of such debug statements (no >>>> debug statements, no problems J). >>> >>> That's fine and generally what is done in other passes. >>> >>>> The alternatives I see are: >>>> >>>> - Move debug statements outside the loop (not good in my opin= ion); >>>> >>>> - Somehow reset values in debug statements (not good in my >>>> opinion, if I could provide correct information for them); >>> >>> You could re-set them via gimple_debug_bind_reset_value which >>> will tell the user that at this point the value is optimized out >>> (that's slightly >>> better than removing the debug stmts). >>> >>>> - Generate phi for debug statements and fix bootstrap (say, >>>> let=E2=80=99s have some list of files, which we have special check for= . I mean >>>> for my case, the difference is not in stage2 and stage3, it=E2=80=99s = in debug >>>> and non-debug). I like this variant. However, as I don=E2=80=99t see s= uch list >>>> of special files in the whole gcc, I think I might be missing >>>> something. >>> >>> The policy ;) >>> >>>> What do you think? >>> >>> Now about the patch. Generally there seem to be spurious whitespace-on= ly >>> or formatting differences - try to avoid these. >>> >>> All new functiuons need a comment explaining them and their function >>> parameters. >>> >>> +static cond_data_map *get_cond_data_map (struct loop * level) >>> +{ >>> >>> per GCC coding conventions this should be formatted like >>> >>> static cond_data_map * >>> get_cond_data_map (struct loop * level) >>> { >>> >>> +#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\ >>> + BB->aux =3D (void *) (XCNEW (struct move_cond));\ >>> + BB_MOVE_COND (BB)->type=3DALWAYS_EXECUTED_IN;\ >>> + BB_MOVE_COND (BB)->loop=3DVAL; >>> >>> this kind of side-effects in macros are bad - better turn them into >>> inline functions. >>> >>> - if (flag_unswitch_loops >>> - && gimple_code (stmt) =3D=3D GIMPLE_COND) >>> - { >>> - /* If we perform unswitching, force the operands of the invariant >>> - condition to be moved out of the loop. */ >>> - return MOVE_POSSIBLE; >>> - } >>> + if (gimple_code (stmt) =3D=3D GIMPLE_COND) >>> + return MOVE_POSSIBLE; >>> >>> any reason you removed the guard on flag_unswitch_loops? We may >>> want to add a new flag, certainly this transform might not be suitable >>> for -O[12s] for example. Like if not all stmts inside a conditional can >>> be moved the transform will increase code-size. >>> >>> You are doing a (lot of?) refactoring which makes review of the patch >>> harder. For example >>> >>> +bool calculate_tgt_level (gimple stmt, struct loop *outermost, >>> + bool cond_executed, basic_block bb) >>> +{ >>> + struct lim_aux_data *lim_dat >>> ... >>> + >>> + if (lim_data->cost >=3D LIM_EXPENSIVE) >>> + set_profitable_level (stmt); >>> + return true; >>> +} >>> >>> ... >>> >>> - >>> - lim_data =3D init_lim_data (stmt); >>> - lim_data->always_executed_in =3D outermost; >>> - >>> - if (!determine_max_movement (stmt, false)) >>> - { >>> - lim_data->max_loop =3D NULL; >>> - continue; >>> - } >>> - >>> - if (dump_file && (dump_flags & TDF_DETAILS)) >>> - { >>> - print_gimple_stmt (dump_file, stmt, 2, 0); >>> - fprintf (dump_file, " invariant up to level %d, cost %d.\n= \n", >>> - loop_depth (lim_data->max_loop), >>> - lim_data->cost); >>> - } >>> - >>> - if (lim_data->cost >=3D LIM_EXPENSIVE) >>> - set_profitable_level (stmt); >>> + if (!calculate_tgt_level (stmt, outermost, cond_executed, bb)) >>> + continue; >>> >>> where it is not obvious why moving set_profitable_level should be >>> in a function named calculate_tgt_level. In fact determine_max_movement >>> was supposed to be the abstraction already. >>> >>> Without yet getting an idea of the overall algorithm (I'm really missin= g such >>> a thing...) it seems to me that the current handling of doing invariant= motion >>> of PHIs should handle the analysis phase just fine (you might be adding >>> more advanced cases in this patch, but I'd rather do that as a followup= for >>> initial support). >>> >>> The code-gen part looks somewhat ad-hoc to me. In principle if we have >>> a list of PHIs to hoist we can materialize the needed part of the origi= nal >>> basic-block structure on the pre-header edge together with the controll= ing >>> conditionals. The loop pre-header would be the always-executed BB >>> and the needed part would always start with the loop header and be a >>> single-entry multiple-exit region. So ideally we could use sth like >>> gimple_duplicate_sese_region with the BB copying worker copy_bbs >>> refactored/replaced with something only copying controlling conditionals >>> and statements/PHIs that a callback identifies (tree-loop-distribution.c >>> would also like sth like that - currently it copies whole loops and rem= oves >>> stmts it didn't intend to copy...). >>> >>> So basically invariantness_dom_walker would record a vector of PHIs >>> to hoist (or we'd construct that at move_computations time). >>> If there is any control-flow to hoist then we'd do an alternative >>> move_computations by "copying" the needed CFG and covered stmts. >>> In theory doing it the loop-distribution way (simply copy the whole >>> loop body and then remove anything not needed) would also be possible >>> though that wouldn't be the very best idea obviously ;) >>> >>> Thanks, >>> Richard. >>> >>>> >>>> >>>> Thanks, >>>> >>>> >>>> >>>> Evgeniya >> >> >> >> -- >> Thanks, >> >> Evgeniya >> >> perfstories.wordpress.com --=20 Thanks, Evgeniya perfstories.wordpress.com