From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 25029 invoked by alias); 15 Dec 2017 16:14:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 25019 invoked by uid 89); 15 Dec 2017 16:14:20 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-11.4 required=5.0 tests=BAYES_00,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,HTML_MESSAGE,KAM_NUMSUBJECT,KAM_SHORT,SPF_PASS,T_RP_MATCHES_RCVD,UNPARSEABLE_RELAY autolearn=ham version=3.3.2 spammy= X-HELO: userp2120.oracle.com Received: from userp2120.oracle.com (HELO userp2120.oracle.com) (156.151.31.85) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 15 Dec 2017 16:14:17 +0000 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.21/8.16.0.21) with SMTP id vBFGCARp109859; Fri, 15 Dec 2017 16:14:13 GMT Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by userp2120.oracle.com with ESMTP id 2evhp9g1gh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 15 Dec 2017 16:14:10 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id vBFG85dx028651 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 15 Dec 2017 16:08:05 GMT Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id vBFG84mS012796; Fri, 15 Dec 2017 16:08:05 GMT Received: from dhcp-10-154-126-215.vpn.oracle.com (/10.154.126.215) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Fri, 15 Dec 2017 08:08:04 -0800 From: Qing Zhao Message-Id: Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026 Date: Fri, 15 Dec 2017 16:14:00 -0000 In-Reply-To: <20171214204542.GB2353@tucnak> Cc: gcc-patches , jeff Law , richard Biener To: Jakub Jelinek References: <20171214204542.GB2353@tucnak> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8745 signatures=668648 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=3 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1712150228 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2017-12/txt/msg01063.txt.bz2 Hi, Jakub, thanks a lot for your detailed review. > On Dec 14, 2017, at 2:45 PM, Jakub Jelinek wrote: >=20 > On Thu, Dec 14, 2017 at 01:45:21PM -0600, Qing Zhao wrote: >> 2017-12-11 Qing Zhao > >=20 > No " " in ChangeLog entries please. this is an error when I pasted this part from my terminal to mail editor, n= ot in my real code. will double check next time when sending out email. >=20 >> --- a/gcc/tree-ssa-strlen.c >> +++ b/gcc/tree-ssa-strlen.c >> @@ -2541,6 +2541,198 @@ handle_builtin_memcmp (gimple_stmt_iterator *gsi) >> return false; >> } >>=20 >> +/* Given an index to the strinfo vector, compute the string length for = the >> + corresponding string. Return -1 when unknown. */ >> +=20 >> +static HOST_WIDE_INT=20 >> +compute_string_length (int idx) >> +{ >> + HOST_WIDE_INT string_leni =3D -1;=20 >> + gcc_assert (idx !=3D 0); >> + >> + if (idx < 0) >> + string_leni =3D ~idx; >> + else >> + { >> + strinfo *si =3D get_strinfo (idx); >> + if (si) >> + { >> + tree const_string_len =3D get_string_length (si); >> + string_leni >> + =3D (const_string_len && tree_fits_uhwi_p (const_string_len) >> + ? tree_to_uhwi(const_string_len) : -1);=20 >=20 > So, you are returning a signed HWI, then clearly tree_fits_uhwi_p and > tree_to_uhwi are inappropriate, you should have used tree_fits_shwi_p > and tree_to_shwi. Space after function name is missing too. > And, as you start by initializing string_leni to -1, there is no > point to write it this way rather than > if (const_string_len && tree_fits_shwi_p (const_string_len)) > string_leni =3D tree_to_shwi (const_string_len); originally it returned an unsigned HWI. but later I changed it to return = a signed one since I need a negative value to represent the UNKNOWN state.=20 I will fix this. >=20 >> + } >> + } >=20 > Maybe also do > if (string_leni < 0) > return -1; Yes, this might be safer. >=20 >> + return string_leni; >=20 > unless the callers just look for negative value as unusable. >=20 >> + tree len =3D gimple_call_arg (stmt, 2); >> + if (tree_fits_uhwi_p (len)) >> + length =3D tree_to_uhwi (len); >=20 > Similarly to above, you are mixing signed and unsigned HWIs too much. same reason as above :-), I will fix this. >=20 >> + if (gimple_code (ustmt) =3D=3D GIMPLE_ASSIGN) >=20 > if (is_gimple_assign (ustmt)) >=20 > Usually we use use_stmt instead of ustmt. Okay. >=20 >> + {=20=20=20=20 >> + gassign *asgn =3D as_a (ustmt); >=20 > No need for the gassign and ugly as_a, gimple_assign_rhs_code > as well as gimple_assign_rhs2 can be called on gimple * too. this part of the code I just copied from the routine =E2=80=9Chandle_builti= n_memcpy=E2=80=9D and no change. I will change it as you suggested. >> + tree_code code =3D gimple_assign_rhs_code (asgn); >> + if ((code !=3D EQ_EXPR && code !=3D NE_EXPR) >> + || !integer_zerop (gimple_assign_rhs2 (asgn))) >> + return true; >> + } >> + else if (gimple_code (ustmt) =3D=3D GIMPLE_COND) >> + { >> + tree_code code =3D gimple_cond_code (ustmt); >> + if ((code !=3D EQ_EXPR && code !=3D NE_EXPR) >> + || !integer_zerop (gimple_cond_rhs (ustmt))) >> + return true; >=20 > There is another case you are missing, assign stmt with > gimple_assign_rhs_code COND_EXPR, where gimple_assign_rhs1 is > tree with TREE_CODE EQ_EXPR or NE_EXPR with TREE_OPERAND (rhs1, 1) > integer_zerop. a little confused here: in the current code: . the first case is: result =3D strcmp() !=3D 0 . the second case is: if (strcmp() !=3D 0) so, the missing case you mentioned above is: result =3D if (strcmp() !=3D 0)=20 or something else? >=20 >> + /* When both arguments are known, and their strlens are unequal, we c= an=20 >> + safely fold the call to a non-zero value for strcmp; >> + othewise, do nothing now. */ >> + if (idx1 !=3D 0 && idx2 !=3D 0) >> + { >> + HOST_WIDE_INT const_string_leni1 =3D -1; >> + HOST_WIDE_INT const_string_leni2 =3D -1; >> + const_string_leni1 =3D compute_string_length (idx1); >> + const_string_leni2 =3D compute_string_length (idx2); >=20 > Why do you initialize the vars when you immediately overwrite it? just a habit to declare a variable with initialization :-). > Just do > HOST_WIDE_INT const_string_leni1 =3D compute_string_length (idx1); I can change it like this. > etc. >=20 >> + /* When one of args is constant string. */ >> + tree var_string; >> + HOST_WIDE_INT const_string_leni =3D -1; >> +=20=20 >> + if (idx1) >> + { >> + const_string_leni =3D compute_string_length (idx1); >> + var_string =3D arg2; >> + }=20 >> + else if (idx2) >> + { >> + const_string_leni =3D compute_string_length (idx2); >> + var_string =3D arg1; >> + }=20 >=20 > Haven't you checked earlier that one of idx1 and idx2 is non-zero? Yes.=20=20 it=E2=80=99s guaranteed that there is one and ONLY one of idx1 and idx2 is= non-zero when getting here.=20 > If so, then the else if (idx2) will just might confuse -Wuninitialized, Okay. > if you just use else, you don't need to initialize const_string_leni > either. I think that const_string_leni still need to be initialized in this case, b= ecause when idx2 is non-zero,=20=20 const_string_leni is initialized to compute_string_length (idx2).=20 >=20 >> + /* Try to get the min and max string length for var_string, the max l= ength is >> + the size of the array - 1, recorded in size[1]. */=20 >> + get_range_strlen (var_string, size); >> + if (size[1] && tree_fits_uhwi_p (size[1])) >> + var_sizei =3D tree_to_uhwi (size[1]) + 1; >=20 > This is something that looks problematic to me. get_range_strlen returns > some conservative upper bound on the string length, which is fine if > var_string points to say a TREE_STATIC variable where you know the alloca= ted > size, or automatic variable. But if somebody passes you a pointer to a > structure and the source doesn't contain aggregate copying for it, not su= re > if you can take for granted that all the bytes are readable after the '\0' > in the string. Hopefully at least for flexible array members and arrays = in > such positions get_range_strlen will not provide the upper bound, but even > in other cases it doesn't feel safe to me. this is the part that took me most of the time during the implementation.=20 I have considered the following 3 approaches to decide the size of the vari= able array: A. use =E2=80=9Ccompute_builtin_object_size=E2=80=9D in tree-object-size.h= to decide the size of the object. However, even with the simplest case, it cannot provide the infor= mation.=20 B. use =E2=80=9Cget_range_strlen=E2=80=9D in gimple-fold.h to decide the s= ize of the object. however,=20 it cannot provide valid info for simple array, either.=20 C. write my own routine to decide the size of the object. I gave up C first because I think that this new routine is very likely a pa= rtially redundant routine as A or B, adding a partially redundant functionality into the code base might= not be good. Then I gave up A because I think that updating A will have more impact than= updating B. when reading B=E2=80=99s comments as following: /* Determine the minimum and maximum value or string length that ARG refers to and store each in the first two elements of MINMAXLEN. For expressions that point to strings of unknown lengths that are character arrays, use the upper bound of the array as the maximum length. For example, given an expression like 'x ? array : "xyz"' and array declared as 'char array[8]', MINMAXLEN[0] will be set to 3 and MINMAXLEN[1] to 7, the longest string that could be stored in array. Return true if the range of the string lengths has been obtained from the upper bound of an array at the end of a struct. Such an array may hold a string that's longer than its upper bound due to it being used as a poor-man's flexible array member. */ bool get_range_strlen (tree arg, tree minmaxlen[2]) { } In the above, it clearly said: "For expressions that point to strings of unknown lengths that are character arrays, use the upper bound of the array as the maximum length.=E2=80=9D So, I feel that this serves my purpose well.=20 then I updated the routine get_range_strlen to make it handles simple array= case.=20 (this change has just been checked into upstream yesterday: https://gcc.gnu.org/viewcvs/gcc?view=3Drevision&revision=3D255654 ) Let me know your suggestions. I will be happy to update the code with your suggestion.=20 >=20 > Furthermore, in the comments you say that you do it only for small string= s, > but in the patch I can't see any upper bound, so you could transform strl= en > that would happen to return say just 1 or 2 with a function call that > possibly reads megabytes of data (memcmp may read all bytes, not just stop > at the first difference). do you mean for very short constant string, we should NOT change it to a. c= all to memcmp? instead we should just=20 inline it with byte comparison sequence? >> + unsigned HOST_WIDE_INT final_length=20 >> + =3D is_ncmp ? length : (const_string_leni + 1); >=20 > Why the ()s? just want to make the =E2=80=9Cconst_string_leni + 1=E2=80=9D together :-). I can delete the () if the coding style prefers. thanks. Qing >=20 > Jakub