From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 34382 invoked by alias); 22 Jun 2019 01:48:58 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 34374 invoked by uid 89); 22 Jun 2019 01:48:58 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-13.3 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy=apologies, H*f:sk:8703960, H*i:sk:8703960 X-HELO: mail-lf1-f45.google.com Received: from mail-lf1-f45.google.com (HELO mail-lf1-f45.google.com) (209.85.167.45) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 22 Jun 2019 01:48:55 +0000 Received: by mail-lf1-f45.google.com with SMTP id a25so6295345lfg.2 for ; Fri, 21 Jun 2019 18:48:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=Q8iZ5QVyZKChBJVCB6ZRUPywKBV67Ea59R/xeRPq4Rc=; b=Q3oK6VPpvqBh3FypP6NrDUt1C2EeZKCo/rcIVdrbQwY0VJwAUr+6DwJNkluLDLgpJS tq0+5/9BaUqpsqvVYVDGAQaSHk+hVlivciQKVr5Yshu5OpqJi300wrSSXkoUaeTPJUn7 wxjKrDAvhT7GZH8A5F7YdS/fRY5Sjt+NVZEh308dONLhRwIBTnRjor2h9MCflkm17BwQ QWrMTsZUDL0mf/IxatnbKyFbDcaWArscluSdnwDR1YIVvWPvSL+hp8patdm3F2tsQWhC iqKFmhmS+KvmL60EN0+eHnNBo9SllrW+D1XBszpwAVZim0DprQhEwPCG4epnPsrLMxI2 9wEg== MIME-Version: 1.0 References: <87039602-15ee-f6a0-6566-3d6445ce1294@redhat.com> In-Reply-To: <87039602-15ee-f6a0-6566-3d6445ce1294@redhat.com> From: Kugan Vivekanandarajah Date: Sat, 22 Jun 2019 01:48:00 -0000 Message-ID: Subject: Re: On-Demand range technology [1/5] - Executive Summary To: Andrew MacLeod Cc: GCC , Jeff Law , Aldy Hernandez Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2019-06/txt/msg00257.txt.bz2 Hi Andrew, Thanks for looking into it and my apologies for not being clear. My proposal was to use value ranges when expanding gimple to RTL and eliminate redundant zero/sign extensions. I.e., if we know the value generated by some gimple operation is already in the (zero/sign) extended from based on our VR analysis, we could skip the SUBREG and ZERO/SIGN_EXTEND (or set SRP_SIGNED_AND_UNSIGNED and likes). However, the problem is, RTL operations are done in PROMOTE_MODE precision while gimple value range is in natural types. This can be a problem when type wraps (and shows up mainly for targets where PROMOTE_MODE is DImode like alpha). For example, as Uros pointed out with the reverted patch, for alpha-linux w= e had FAIL: libgomp.fortran/simd7.f90 -O2 execution test FAIL: libgomp.fortran/simd7.f90 -Os execution test The reason being that values wrap and in VR calculation we only records the type precision (which is what matters for gimple) but in order to eliminate the zero/sign extension we need the full precision in the PROMOTE_MODE. Extract from the testcase failing: _343 =3D ivtmp.179_52 + 2147483645; [0x80000004, 0x800000043] _344 =3D _343 * 2; [0x8, 0x86] _345 =3D (integer(kind=3D4)) _344; [0x8, 0x86] With the above VR of [0x8, 0x86] (in promoted precision which is [0x100000008, 0x100000086]), my patch was setting SRP_SIGNED_AND_UNSIGNED which was wrong and causing the error (eliminating and extension which was not redundant). If we had the VR in promoted precision, the patch would be correct and used to eliminate redundant zero/sign extensions. Please let me know if my explanation is not clear and I will show it with more examples. Thanks, Kugan On Fri, 21 Jun 2019 at 23:27, Andrew MacLeod wrote: > > On 6/19/19 11:04 PM, Kugan Vivekanandarajah wrote: > > Hi Andrew, > > Thanks for working on this. > > Enable elimination of zext/sext with VRP patch had to be reverted in > (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00672.html) due to the > need for value ranges in PROMOTED_MODE precision for at least 1 test > case for alpha. > > Playing with ranger suggest that it is not possible to get value > ranges in PROMOTED_MODE precision on demand. Or is there any way we > can use on-demand ranger here? > > Thanks, > Kugan > > > > I took a look at the thread, but I think I'm still missing some context. > > Lets go back to the beginning. What is an example of the case we arent g= etting that you want to get? > > I'm going to guess to start :-) > > short foo(unsigned char c) > { > c =3D c & (unsigned char)0x0F; > if( c > 7 ) > return((short)(c - 5)); > else > return(( short )c); > } > > > > A run of this thru the ranger shows me code that looks like (on x86 anywa= y): > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D BB 2 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > c_4(D) [0, 255] unsigned char > : > c_5 =3D c_4(D) & 15; > _9 =3D c_4(D) & 8; > if (_9 !=3D 0) > goto ; [INV] > else > goto ; [INV] > > c_5 : [0, 15] unsigned char > _9 : [0, 0][8, 8] unsigned char > 2->3 (T)*c_5 : [0, 15] unsigned char > 2->3 (T) _9 : [8, 8] unsigned char > 2->4 (F)*c_5 : [0, 15] unsigned char > 2->4 (F) _9 : [0, 0] unsigned char > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D BB 3 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > c_5 [0, 15] unsigned char > : > _1 =3D (unsigned short) c_5; > _2 =3D _1 + 65531; > _7 =3D (short int) _2; > // predicted unlikely by early return (on trees) predictor. > goto ; [INV] > > _1 : [0, 15] unsigned short > _2 : [0, 10][65531, 65535] unsigned short > _7 : [-5, 10] short int > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D BB 4 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > c_5 [0, 15] unsigned char > : > _6 =3D (short int) c_5; > // predicted unlikely by early return (on trees) predictor. > > > I think I see. we aren't adjusting the range of c_5 going into blocks 3 = and 4. its obvious from the original source where the code says < 7, but o= nce its been "bitmasked" that info becomes obfuscated. > . > If you were to see a range in bb3 of c_5 =3D [8,15], and a range in bb4 o= f c_4 =3D [0,7], would that solve your problem? > > so in bb3 , we'd then see ranges that look like: > > _1 : [8, 15] unsigned short > _2 : [3, 10] unsigned short > _7 : [3, 10] short int > > and then later on we'd see there is no negative/wrap value, and you could= could just drop the extension then? > > SO. > > yes. this is fixable, but is it easy? :-) > > We're in the process of replacing the range extraction code with the rang= e-ops/gori-computes components from the ranger. This is the part which fi= gures ranges out from individual statements on exit to a block.. > > We have implemented mostly the same functionality of VRP to start, and on= e of the things we have not done is enhanced the bitmask tracking side of i= t. > The gori-computes component which performs backwards calculations does vi= rtually nothing with masks right now. > > lets look at this hunk > > c_5 =3D c_4(D) & 15; > _9 =3D c_4(D) & 8; > if (_9 !=3D 0) > > on the true side of this branch, we start with _9 having a range of [0,0]= [8,8] > > [0,0][8,8] =3D c_4(D) & 8 > > unfortunately we dont know much about c_4, its effectively VARYING at thi= s point. We do know c_5 has a range of [0,15]. if that statement were > _9 =3D c_5 & 8 > this problem would be quite trivial A quick tweak to the windback code = for bitwise_and would generate the correct results because we';d see > [1,1] =3D [0,15] & 8 and we'd come up with [8, 15] for c_5 no problem. > > but its not. > > I do have some "re-evalaution" code in the ranger which is only in the p= roof of concept stages which might solve this problem down the road. > > what the re-evlaution code does is look at the any changes to ranges whic= h are in the definition chain of the ssa-name you are looking at. > going back to the example again: > c_5 =3D c_4(D) & 15; > _9 =3D c_4(D) & 8; > if (_9 !=3D 0) > _1 =3D (unsigned short) c_5; <<--- here. > > When we ask for the range of c_5 here, we calculate the range normally an= d come up with [0,15]. > The re-evalution code then takes a look and notices that c_5 is also depe= ndent on the value of c_4 , and queries if the range of c_4 changed at this= point. > Well, we can know the value of c_4 must have the 0x8 bit set in this bloc= k. This can be fed into a "re-evaluation" of c_5, and also then know that= c_5 must also have the 0x8 bit set. > THis would then reduce the calculated range from the original [0,15] to a= range of [8, 15] > > So the mechanisms within the new code are sort of there, but they have no= t been flushed out. THis will require > a) the re-evaluation code to be completed. Im not sure if it can be ap= plied within EVRP (It seems it should be possible) or whether it requires = the ranger. > b) some level of integration of bitfields with the range info so we can = also track bits set and cleared along with the range, and integrate them wi= th the calculations. > > SO assuming I have been working on the correct problem here, No. we dont = get this now. > > Once we get the range-ops and gori-computes code integrated and working w= ith EVRP and the bitfield stuff, we can again take a look and see where we = stand. I will keep it in mind as a practical example of how we might want = to combine the information. > > So revisit this in September maybe? > > Or have i completely missed the mark of the problem you are trying to sol= ve? :-) > > Andrew > > PS > > Your original question is > > > Playing with ranger suggest that it is not possible to get value > ranges in PROMOTED_MODE precision on demand. Or is there any way we > can use on-demand ranger here? > > > I'm not sure i follow completely why, but in theory it would be possible = to add code to the ranger to walk a definition list calculating ranges in w= hatever precision you want > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D BB 2 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > c_4(D) [0, 255] unsigned char > : > c_5 =3D c_4(D) & 15; > _9 =3D c_4(D) & 8; > if (_9 !=3D 0) > goto ; [INV] > else > goto ; [INV] > > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D BB 3 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > c_5 [0, 15] unsigned char > : > _1 =3D (unsigned short) c_5; > _2 =3D _1 + 65531; > _7 =3D (short int) _2; > > Im not sure how that would help? the root of the problem would appear t= o be the range of c_5 is not properly calculated to be [8,15]. no matter w= hat precision you use, the [0,4] + 65531 part of the range is still going t= o produce an undesireable set upper bit, and still require the zext/sext? > >