From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x12e.google.com (mail-lf1-x12e.google.com [IPv6:2a00:1450:4864:20::12e]) by sourceware.org (Postfix) with ESMTPS id AB12C385841C for ; Tue, 4 Jun 2024 07:43:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AB12C385841C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org AB12C385841C Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::12e ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717487031; cv=none; b=ZV3vGLP1je/zIBIwjQCYoVEwUi4S7aHWaGhfcXP2iKsO5JGuq1e8a07Gzm27GeNeJx70xoIvGzb3Ulgj5ZV8rUJNOKXvsvb//poCLt9Au2KQk5oZ6j6bG3FAv3SAaEuxyVejzh0bmrgP6GOYwnLyHIMSr1D1reasFI1RfNgNYTc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717487031; c=relaxed/simple; bh=to6btfNTa/3fm17kXGruPbtLllLM+3qlHJlZIAwq9J8=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=hxpZwuZolTAOPsyyg1rpS/eCG5xZQ9+CzDJs3PuwJ6rZafFe3fQVzynUnVTmuvyERy32D+ny/fphJfP5AWz/lUTAUVCrHrUQtCULo9HzINY8GxsVAzbmYJkrljcgVE7tUvDVCYG47Prho6NUcGuhHMTzNCqLK+1xWfoM9vJP8a4= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x12e.google.com with SMTP id 2adb3069b0e04-52b8e0e98adso1026128e87.0 for ; Tue, 04 Jun 2024 00:43:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1717487027; x=1718091827; darn=gcc.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=gwRctPk8YA2xtoZZfVBydgMwAPoyp7QmT6ClvB00PPw=; b=ENcsvx7pEL4muqE0emXXb//N0rCi+QfjtHpFsgauTvG/fF8yFhVUrsrDGzomrsmiVo oSdggAWWkOmp1mdgAjPPNum/bTh+im5NfngbWY0a+M+mibqTF3tRPxE8x/zzjD5Yetv9 3fm68dDFhGVW08TiOEqDqeb21we4pBByjKXP830HK+K604KhzlxlsMWS9zJmX5/Aufoq xh+Mq2Uh+GoVvCd3+YB+KFDpza9jQ/zyp3bKwdRogB0hJGf8meCWp8/JgcU08aUkSKu2 swfBFRmaAJsbCZnpFSK3EMvPzS8kkVTqDDtMpX6ebowh91YHrC/mTjP+8Hw4h8edIs1+ o63A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717487027; x=1718091827; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gwRctPk8YA2xtoZZfVBydgMwAPoyp7QmT6ClvB00PPw=; b=fbtLtSpcICrQPe10hxmLSEHyuTvaUzLqqc8tB9mN7qnFIhdMhqjq9zFhTS3j3chART XgIBZhV+Lpftu+N6oLUUm/F4qT7Hv8+oMQAFUpoV0WeGm42eQqnYjyb1AX/fwoRlLhyS hv1b5gA3EOJNN1ImYQfyRyyLnB7TaECg5/4Nsu0Z2adUVWR9DTqA7bOrGT5lPvPfeBIU SAx0cy0AxXc+aKjoCJcKVlk5bn9zWuXZxzdowesBT9UTaAJaC1x9R/HcW207zM3J9ZIo 0B2WBeLZXiLSCeLI+hC/Y1bCzB2Y4Bqtyyp4zd1sK9Razr9WYSXN+HIad3EmOCA3yzxx HViw== X-Forwarded-Encrypted: i=1; AJvYcCX3CqATTx4SrkiAXJnK1tqb5QIP+rMTYEBkScPVpeuohJ2q0U1tk3a3U2wXstq/Onegg0jW513VBhZ+ahxLk6QKhH9bRy3YCA== X-Gm-Message-State: AOJu0Yz/QZkmBo0A9sPGRlpgrhMP+awF9I/t2gQWgL/wqhNXRFXPw8Sv +K47dkTTVoAFaEm/b0c3yjjg4m7M41lnQC0NehLE2E8NKkbXBRTnpFkDh5RVtyzhC+c9vCvoSGw 6/M0aUhLBx0Fjjc8XiVvT2INADbk= X-Google-Smtp-Source: AGHT+IH1EyJs8wGWZJOuEm/S4RJ4mxdbzzg/9/70s56JM5OGvDYq+5y/Vksz89ELCn01KGya3OxA/WNEpAzeV7pnWjE= X-Received: by 2002:a19:384d:0:b0:529:c0c6:faad with SMTP id 2adb3069b0e04-52b89596984mr7866979e87.28.1717487026662; Tue, 04 Jun 2024 00:43:46 -0700 (PDT) MIME-Version: 1.0 References: <20240513194830.1676938-1-qing.zhao@oracle.com> <35799652-55op-79os-ro61-56094939s0o2@fhfr.qr> <8c3c2c188421e4817249a5bc6d51dcf43eb079db.camel@redhat.com> <96A847FA-C41E-4BE3-BECB-83935E0D8415@oracle.com> <552015E4-10FF-4E20-BC2F-8D88A6C55C0B@oracle.com> <6D93ECB0-17CF-4BF8-8644-83EC1955DCEC@oracle.com> <978fa12b94451d07875b9bc524ef0ef26ac641ec.camel@redhat.com> In-Reply-To: <978fa12b94451d07875b9bc524ef0ef26ac641ec.camel@redhat.com> From: Richard Biener Date: Tue, 4 Jun 2024 09:43:35 +0200 Message-ID: Subject: Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading To: David Malcolm Cc: Qing Zhao , GCC Patches , Andrew Pinski , kees Cook Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Jun 3, 2024 at 4:48=E2=80=AFPM David Malcolm = wrote: > > On Mon, 2024-06-03 at 08:29 +0200, Richard Biener wrote: > > On Fri, May 31, 2024 at 11:23=E2=80=AFPM Qing Zhao > > wrote: > > > > > > > > > > > > > On May 23, 2024, at 07:46, Richard Biener > > > > wrote: > > > > > > > > On Wed, May 22, 2024 at 8:53=E2=80=AFPM Qing Zhao > > > > wrote: > > > > > > > > > > > > > > > > > > > > > On May 22, 2024, at 03:38, Richard Biener > > > > > > wrote: > > > > > > > > > > > > On Tue, May 21, 2024 at 11:36=E2=80=AFPM David Malcolm > > > > > > wrote: > > > > > > > > > > > > > > On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote: > > > > > > > > Thanks for the comments and suggestions. > > > > > > > > > > > > > > > > > On May 15, 2024, at 10:00, David Malcolm > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Tue, 2024-05-14 at 15:08 +0200, Richard Biener > > > > > > > > > wrote: > > > > > > > > > > On Mon, 13 May 2024, Qing Zhao wrote: > > > > > > > > > > > > > > > > > > > > > -Warray-bounds is an important option to enable > > > > > > > > > > > linux kernal to > > > > > > > > > > > keep > > > > > > > > > > > the array out-of-bound errors out of the source > > > > > > > > > > > tree. > > > > > > > > > > > > > > > > > > > > > > However, due to the false positive warnings > > > > > > > > > > > reported in > > > > > > > > > > > PR109071 > > > > > > > > > > > (-Warray-bounds false positive warnings due to code > > > > > > > > > > > duplication > > > > > > > > > > > from > > > > > > > > > > > jump threading), -Warray-bounds=3D1 cannot be added > > > > > > > > > > > on by > > > > > > > > > > > default. > > > > > > > > > > > > > > > > > > > > > > Although it's impossible to elinimate all the false > > > > > > > > > > > positive > > > > > > > > > > > warnings > > > > > > > > > > > from -Warray-bounds=3D1 (See PR104355 Misleading - > > > > > > > > > > > Warray-bounds > > > > > > > > > > > documentation says "always out of bounds"), we > > > > > > > > > > > should minimize > > > > > > > > > > > the > > > > > > > > > > > false positive warnings in -Warray-bounds=3D1. > > > > > > > > > > > > > > > > > > > > > > The root reason for the false positive warnings > > > > > > > > > > > reported in > > > > > > > > > > > PR109071 is: > > > > > > > > > > > > > > > > > > > > > > When the thread jump optimization tries to reduce > > > > > > > > > > > the # of > > > > > > > > > > > branches > > > > > > > > > > > inside the routine, sometimes it needs to duplicate > > > > > > > > > > > the code > > > > > > > > > > > and > > > > > > > > > > > split into two conditional pathes. for example: > > > > > > > > > > > > > > > > > > > > > > The original code: > > > > > > > > > > > > > > > > > > > > > > void sparx5_set (int * ptr, struct nums * sg, int > > > > > > > > > > > index) > > > > > > > > > > > { > > > > > > > > > > > if (index >=3D 4) > > > > > > > > > > > warn (); > > > > > > > > > > > *ptr =3D 0; > > > > > > > > > > > *val =3D sg->vals[index]; > > > > > > > > > > > if (index >=3D 4) > > > > > > > > > > > warn (); > > > > > > > > > > > *ptr =3D *val; > > > > > > > > > > > > > > > > > > > > > > return; > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > With the thread jump, the above becomes: > > > > > > > > > > > > > > > > > > > > > > void sparx5_set (int * ptr, struct nums * sg, int > > > > > > > > > > > index) > > > > > > > > > > > { > > > > > > > > > > > if (index >=3D 4) > > > > > > > > > > > { > > > > > > > > > > > warn (); > > > > > > > > > > > *ptr =3D 0; // Code duplications since > > > > > > > > > > > "warn" does > > > > > > > > > > > return; > > > > > > > > > > > *val =3D sg->vals[index]; // same this line. > > > > > > > > > > > // In this path, > > > > > > > > > > > since it's > > > > > > > > > > > under > > > > > > > > > > > the condition > > > > > > > > > > > // "index >=3D 4", the > > > > > > > > > > > compiler > > > > > > > > > > > knows > > > > > > > > > > > the value > > > > > > > > > > > // of "index" is > > > > > > > > > > > larger then 4, > > > > > > > > > > > therefore the > > > > > > > > > > > // out-of-bound > > > > > > > > > > > warning. > > > > > > > > > > > warn (); > > > > > > > > > > > } > > > > > > > > > > > else > > > > > > > > > > > { > > > > > > > > > > > *ptr =3D 0; > > > > > > > > > > > *val =3D sg->vals[index]; > > > > > > > > > > > } > > > > > > > > > > > *ptr =3D *val; > > > > > > > > > > > return; > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > We can see, after the thread jump optimization, the > > > > > > > > > > > # of > > > > > > > > > > > branches > > > > > > > > > > > inside > > > > > > > > > > > the routine "sparx5_set" is reduced from 2 to 1, > > > > > > > > > > > however, due > > > > > > > > > > > to > > > > > > > > > > > the > > > > > > > > > > > code duplication (which is needed for the > > > > > > > > > > > correctness of the > > > > > > > > > > > code), > > > > > > > > > > > we > > > > > > > > > > > got a false positive out-of-bound warning. > > > > > > > > > > > > > > > > > > > > > > In order to eliminate such false positive out-of- > > > > > > > > > > > bound warning, > > > > > > > > > > > > > > > > > > > > > > A. Add one more flag for GIMPLE: is_splitted. > > > > > > > > > > > B. During the thread jump optimization, when the > > > > > > > > > > > basic blocks > > > > > > > > > > > are > > > > > > > > > > > duplicated, mark all the STMTs inside the original > > > > > > > > > > > and > > > > > > > > > > > duplicated > > > > > > > > > > > basic blocks as "is_splitted"; > > > > > > > > > > > C. Inside the array bound checker, add the > > > > > > > > > > > following new > > > > > > > > > > > heuristic: > > > > > > > > > > > > > > > > > > > > > > If > > > > > > > > > > > 1. the stmt is duplicated and splitted into two > > > > > > > > > > > conditional > > > > > > > > > > > paths; > > > > > > > > > > > + 2. the warning level < 2; > > > > > > > > > > > + 3. the current block is not dominating the exit > > > > > > > > > > > block > > > > > > > > > > > Then not report the warning. > > > > > > > > > > > > > > > > > > > > > > The false positive warnings are moved from -Warray- > > > > > > > > > > > bounds=3D1 to > > > > > > > > > > > -Warray-bounds=3D2 now. > > > > > > > > > > > > > > > > > > > > > > Bootstrapped and regression tested on both x86 and > > > > > > > > > > > aarch64. > > > > > > > > > > > adjusted > > > > > > > > > > > -Warray-bounds-61.c due to the false positive > > > > > > > > > > > warnings. > > > > > > > > > > > > > > > > > > > > > > Let me know if you have any comments and > > > > > > > > > > > suggestions. > > > > > > > > > > > > > > > > > > > > At the last Cauldron I talked with David Malcolm > > > > > > > > > > about these kind > > > > > > > > > > of > > > > > > > > > > issues and thought of instead of suppressing > > > > > > > > > > diagnostics to > > > > > > > > > > record > > > > > > > > > > how a block was duplicated. For jump threading my > > > > > > > > > > idea was to > > > > > > > > > > record > > > > > > > > > > the condition that was proved true when entering the > > > > > > > > > > path and do > > > > > > > > > > this > > > > > > > > > > by recording the corresponding locations > > > > > > > > > > > > > > > > Is only recording the location for the TRUE path enough? > > > > > > > > We might need to record the corresponding locations for > > > > > > > > both TRUE and > > > > > > > > FALSE paths since the VRP might be more accurate on both > > > > > > > > paths. > > > > > > > > Is only recording the location is enough? > > > > > > > > Do we need to record the pointer to the original > > > > > > > > condition stmt? > > > > > > > > > > > > > > Just to be clear: I don't plan to work on this myself (I > > > > > > > have far too > > > > > > > much already to work on...); I'm assuming Richard Biener is > > > > > > > hoping/planning to implement this himself. > > > > > > > > > > > > While I think some of this might be an improvement to those > > > > > > vast > > > > > > number of "false positive" diagnostics we have from (too) > > > > > > late diagnostic > > > > > > passes I do not have the cycles to work on this. > > > > > > > > > > I can study a little bit more on how to improve the diagnostics > > > > > for PR 109071 first. > > > > > > > > > > FYI, I just updated PR109071=E2=80=99s description as: Need more > > > > > context for -Warray-bounds warnings due to code duplication > > > > > from jump threading. > > > > > > > > > > As we already agreed, this is NOT a false positive. It caught a > > > > > real bug in linux kernel that need to be patched in the kernel > > > > > source. (See > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D109071#c11) > > > > > > > > > > In order to add more context to the diagnostics to help the end > > > > > user locate the bug accurately in their source code, compiler > > > > > needs: > > > > > > > > > > 1. During jump threading phase, record the following > > > > > information for each duplicated STMT: > > > > > A. A pointer to the COND stmt; > > > > > B. True/false for such COND > > > > > 2. During array out-of-bound checking phase, whenever locate an > > > > > out-of-bound case, > > > > > A. Use a rich_location for the diagnostic; > > > > > B. Create an instance of diagnositic_path, add events to > > > > > this diagnostic_path based on the COND stmt, True/False info > > > > > recorded in the STMT; > > > > > C. Call richloc.set_path() to associate the path with > > > > > the rich_location; > > > > > D. Then issue warning with this rich_location instead of > > > > > the current regular location. > > > > > > > > > > Any comment and suggestion to the above? > > > > > > > > I was originally thinking of using location_adhoc_data to store > > > > 1.A > > > > and 1.B as a common bit to associate to each > > > > copied stmt. IIRC location_adhoc_data->data is the stmts > > > > associated > > > > lexical block so we'd need to stuff another > > > > 'data' field into this struct, like a "copy history" where we > > > > could > > > > then even record copied-of-copy-of-X. > > > > locataion_adhoc_data seems imperfectly packed right now, with a > > > > 32bit > > > > hole before 'data' and 32bit unused > > > > at its end, so we might get away without memory use by re- > > > > ordering > > > > discriminator before 'data' ... > > > > > > Like this? > > > > > > diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h > > > index e6e2b0897572..ee344f91333b 100644 > > > --- a/libcpp/include/line-map.h > > > +++ b/libcpp/include/line-map.h > > > @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro { > > > struct GTY(()) location_adhoc_data { > > > location_t locus; > > > source_range src_range; > > > - void * GTY((skip)) data; > > > unsigned discriminator; > > > + void * GTY((skip)) data; > > > + void * GTY((skip)) copy_history; > > > }; > > > struct htab; > > > > Yes. > > > > > How about the copy_history? Do we need a new data structure (like > > > the following, or other suggestion) for this? Where should I add > > > this new data structure? > > > > As it needs to be managed by libcpp it should be in this very same > > file. > > > > > struct copy_history { > > > location_t condition; > > > Bool is_true_path; > > > } > > > > I think we want a pointer to the previous copy-of state as well in > > case a stmt > > was copied twice. We'll see whether a single (condition) location > > plus edge flag > > is sufficient. I'd say we should plan for an enum to indicate the > > duplication > > reason at least (jump threading, unswitching, unrolling come to my > > mind). For > > jump threading being able to say "when is true/false" is > > probably > > good enough, though it might not always be easy to identify a single > > condition > > here given a threading path starts at an incoming edge to a CFG merge > > and > > will usually end with the outgoing edge of a condition that we can > > statically > > evaluate. The condition controlling the path entry might not be > > determined > > fully by a single condition location. > > > > Possibly building a full "diagnostic path" object at threading time > > might be > > the only way to record all the facts, but that's also going to be > > more > > expensive. > > Note that a diagnostic_path represents a path through some kind of > graph, whereas it sounds like you want to be storing the *nodes* in the > graph, and later generating the diagnostic_path from that graph when we > need it (which is trivial if the graph is actually just a tree: just > follow the parent links backwards, then reverse it). I think we are mixing two things - one is that a single transform like jump threading produces a stmt copy and when we emit a diagnostic on that copied statement we want to tell the user the condition under which the copy is executed. That "condition" can be actually a sequence of conditionals. I wanted to point out that a diagnostic_path instance could be used to describe such complex condition. But then the other thing I wanted to address with the link to a previous copy_history - that's when a statement gets copied twice, for example by two distinct jump threading optimizations. Like when dumping the inlining decisions for diagnostics we could dump the logical "and" of the conditions of the two threadings. Since we have a single location per GIMPLE stmt we'd have to keep a "stack" of copy events associated with it. That's the linked list (I think a linked list should work fine here). I realize things may get a bit "fat", but eventually we are not duplicating statements that much. I do hope we can share for example a big diagnostic_path when we duplicate a basic block during threading and use one instance for all stmts in such block copy (IIRC we never release locations or their ad-hoc data, we just make sure to never use locations with ad-hoc data pointing to BLOCKs that we released, but the linemap data will still have pointers in "dead" location entries, right?) Richard. > > Dave > > > I can think of having a -fdiagnostic-try-to-explain-harder option to > > clarify confusing > > diagnostics people could use on-demand? > > > > Richard. > > > > > Qing > > > > > > > > Richard. > > > > > > > > > Thanks. > > > > > > > > > > Qing > > > > > > > > > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > But feel free to ask me any questions about the > > > > > > > diagnostic_path > > > > > > > machinery within the diagnostics subsystem. > > > > > > > > > > > > > > [...snip...] > > > > > > > > > > > > > > Regards > > > > > > > Dave > > > > > > > > >