From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x442.google.com (mail-wr1-x442.google.com [IPv6:2a00:1450:4864:20::442]) by sourceware.org (Postfix) with ESMTPS id 295F5386EC42 for ; Mon, 29 Jun 2020 10:40:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 295F5386EC42 Received: by mail-wr1-x442.google.com with SMTP id r12so15964377wrj.13 for ; Mon, 29 Jun 2020 03:40:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=1oOljT+brMETZDgdm1YrUkAPbQud6SVTr2V9nKauUWs=; b=i4ssm1oWZG1za1GZXsiyWwfiF0fLNo0g4c8t/uJamx3Gju05EsAa0g+MqoEZtfy7Ra NCLRvZlHtH97uWPcenQx5ePMlm5d2jb+DIF9mCiiLGwyAfMtBHwsgTP6X6bZzsBCVXUW eeHkHxrMKPNRAnsXJZMIp90tTlSoEUk5aquUqtspKGorJobiKG52hPBSBONAIsVOXR/m vCv3+3FRb01la++1H26Y3r+pyeXyPzjhM4j2GTSc+pvqsihz/bJt7joumxQYpfBMv0Au 4J6F/eg4yQdBvNxL6nhhpeOVmoMuaW0S4lhO3jqfvOrrxLMAX2mgym0FTOReTEwTxess pjAA== X-Gm-Message-State: AOAM5308ZJHYM1YvtY817Hwa3kFNc2BY7k7s6h0GOI6gEJjKs2o99n6U eiTpglu31JODbRjRzC2xTnZG91Czw6vRwi+JhIUXg5An X-Google-Smtp-Source: ABdhPJxN/SV6Vgs1/rWx0rDGecsE65RJ1rNoxjqTjuuVoKMcWCzS8KBhMlyDcRiyt9bCzmBGMwzCRmUJY8QGmIVDAxg= X-Received: by 2002:a50:9dc8:: with SMTP id l8mr16004819edk.248.1593415167472; Mon, 29 Jun 2020 00:19:27 -0700 (PDT) MIME-Version: 1.0 References: <29c9b3fa-69a2-dca9-1477-54aac80c8680@gmail.com> <5b4805a5-1949-267c-dc40-6f084349a68b@gmail.com> In-Reply-To: <5b4805a5-1949-267c-dc40-6f084349a68b@gmail.com> From: Richard Biener Date: Mon, 29 Jun 2020 09:19:16 +0200 Message-ID: Subject: Re: [PATCH] handle MEM_REF with void* arguments (PR c++/95768) To: Martin Sebor Cc: Jason Merrill , gcc-patches Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00, DATE_IN_PAST_03_06, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2020 10:40:11 -0000 On Mon, Jun 29, 2020 at 1:08 AM Martin Sebor wrote: > > On 6/23/20 1:12 AM, Richard Biener wrote: > > On Tue, Jun 23, 2020 at 12:22 AM Martin Sebor via Gcc-patches > > wrote: > >> > >> On 6/22/20 12:55 PM, Jason Merrill wrote: > >>> On 6/22/20 1:25 PM, Martin Sebor wrote: > >>>> The attached fix parallels the one for the equivalent C bug 95580 > >>>> where the pretty printers don't correctly handle MEM_REF arguments > >>>> with type void* or other pointers to an incomplete type. > >>>> > >>>> The incorrect handling was exposed by the recent change to > >>>> -Wuninitialized which includes such expressions in diagnostics. > >>> > >>>> + if (tree size =3D TYPE_SIZE_UNIT (TREE_TYPE (argtype))) > >>>> + if (!integer_onep (size)) > >>>> + { > >>>> + pp_cxx_left_paren (pp); > >>>> + dump_type (pp, ptr_type_node, flags); > >>>> + pp_cxx_right_paren (pp); > >>>> + } > >>> > >>> Don't we want to print the cast if the pointer target type is incompl= ete? > >> > >> I suppose, yes, although after some more testing I think what should > >> be output is the type of the access. The target pointer type isn't > >> meaningful (at least not in this case). > >> > >> Here's what the warning looks like in C for the test case in > >> gcc.dg/pr95580.c: > >> > >> warning: =E2=80=98*((void *)(p)+1)=E2=80=99 may be used uninitiali= zed > >> > >> and like this in C++: > >> > >> warning: =E2=80=98*(p +1)=E2=80=99 may be used uninitialized > >> > >> The +1 is a byte offset, which is correct given that incrementing > >> a void* in GCC is the same as adding 1 to the byte address, but > >> dereferencing a void* doesn't correspond to what's going on in > >> the source. > >> > >> Even for a complete type (with size greater than 1), printing > >> the type of the argument plus a byte offset is wrong. It ends > >> up with this for the C++ test case from 95768: > >> > >> warning: =E2=80=98*((int*) +4)=E2=80=99 is used uninitial= ized > >> > >> when the access is actually =E2=80=98*((int*) +1)=E2=80=99 > >> > >> So it seems to me for MEM_REF, to make the output meaningful, > >> it's the type of the access (i.e., the MEM_REF type) that should > >> be printed here, and the offset should either be in elements of > >> the accessed type, i.e., > >> > >> warning: =E2=80=98*((int*) +1)=E2=80=99 is used uninitial= ized > >> > >> or, if the access is misaligned, the argument should first be > >> cast to char*, the offset added, and the result then cast to > >> the access type, like this: > >> > >> warning: =E2=80=98*(T*)((char*) +1)=E2=80=99 is used unin= itialized > >> > >> The attached revised and less than fully tested patch implements > >> this for C++ only for now. If we agree on this approach I'll see > >> about making the corresponding change in C. > > > > Note that there is no C/C++ way of fully expressing MEM_REF > > semantics. __MEM ((T *)p + 1) is not actually > > *(int *)((char *)p + 1) because that does not reflect that the > > effective type of the lvalue when TBAA is concerned is 'T' > > rather than 'int'. > > What form would you say is closest to the C/C++ semantics, or > likely the most useful to users, that GCC could print instead? Hmm, I'd try *() maybe? Because there's no C/C++ that can express what GIMPLE can do here. Of course pattern matching the exact cases we can handle like your patch is an improvement (but as said the TBAA issue is still present). > > Note for MEM_REF the offset is always > > a constant byte offset but it indeed does not have to be a > > multiple of the MEM_REF type size. > > > > I wonder whether printing the MEM_REF in full provides > > any real diagnostic value in the more "obfuscated" cases. > > I'm not sure what obfuscated cases you're thinking of, or what > you mean by printing it in full. I think that printing =E2=80=98*(T*)((char*) +1)=E2=80=99 is likel= y going to confuse users because they cannot match this to a source location. If we have a source location we should have caret diagnostics. > I instrumented the code to > print every MEM_REF in that comes up in warn_uninitialized_vars > and rebuilt GCC. There are 17,456 distinct instances so I didn't > review them all but those I did look at all look reasonable. > Probably the least useful are those that mention by > itself (i.e., or *). Those with an offset > are more informative (e.g., *((access**) +1). In > a few the offset is very large, such as *((unsigned int*)sp > +4611686018427387900), but that doesn't seem like a problem. > I'd be happy to share the result. Here +4611686018427387900 should be printed as -4, MEM_REF offsets are to be interpreted as signed. > > > > I'd also not print but . > > I also don't find helpful, but I don't see > as an improvement. I think printing the SSA variable would be > more informative here since its name is usually related to > the variable it was derived from in the source. But making that > change (or any other like it) feels like too much feature creep > for this fix. I'd be happy to do it in a follow up if we agree > it's a good idea. > > Either way, please let me know if the patch is okay as is or, > if not, what type it should mention. + if (tree size =3D TYPE_SIZE_UNIT (TREE_TYPE (argtype))) + if (!integer_onep (size)) + { this should be if (!TYPE_SIZE_UNIT (...) || !integer_onep (...)). Otherwise the original patch looks OK to me. As said for your second patch printing *(int*)p only if p is offsetted is inconsistent and misleading for TBAA reasons. So I do not view it as general improvement. If the type of the MEM_REF offset and the access type agree doing that would be fine but then MEM_REF(p) and MEM_REF(p+4) should be treated the same. That said, can we fix the segfault first? Also see c-pretty-print.c for another "copy" of this functionality. It feels we should dispatch to c-family/ code here. Richard. > > Martin