From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <amacleod@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.133.124])
 by sourceware.org (Postfix) with ESMTPS id 8FD273858D28
 for <gcc@gcc.gnu.org>; Wed, 15 Dec 2021 19:23:30 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8FD273858D28
Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com
 [209.85.160.198]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-322-4k3Gpb0FPEaFE_o9tn5oGA-1; Wed, 15 Dec 2021 14:23:28 -0500
X-MC-Unique: 4k3Gpb0FPEaFE_o9tn5oGA-1
Received: by mail-qt1-f198.google.com with SMTP id
 v17-20020a05622a131100b002aea167e24aso31011264qtk.5
 for <gcc@gcc.gnu.org>; Wed, 15 Dec 2021 11:23:28 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:message-id:date:mime-version:user-agent:subject
 :content-language:to:references:from:in-reply-to
 :content-transfer-encoding;
 bh=t0VM0p9v5955FYHsOTmlv7vPJfmwO2ko29evDeth/kU=;
 b=jpQ5QpLwIq9M6JKUtjPJzX4XifSfNV7Uw7lW5LFfNQh0r7pfvgEN1lhi0ktuaKsIOo
 IiAEvoMzIFjDOuFC6jDCzjsg/iGW6yXFNsSUATqJs34PDnfpmhFvnpX5SOI+nBkMb5eG
 AgzFDhWiqWDDGXGJb99GNwtUtbWogGR3XRIYaIlNJd4bgUpPaj3m7924XNQCnozL0BqO
 dsMreXiXMI0ouz0EAi+Mf9XFjB/NAWGMLb5zIXV9Y6BN+57u+pys5mJE25+OE+Sn+4Zn
 LVBxdoiVJLVP5eciaq9uQ4WyvmGr/vGBIKdManc/WYcQjrKVqMqTouDHgMb08fMVVVBP
 a9eA==
X-Gm-Message-State: AOAM5338F3sTNjiuvg6Pg6IrTC/h1kWc6CUfKH4q+GZyfrEeqRYW97lw
 XwhKt8244/oryHZSjPUsK2GPyyMfj6clqwJpfHb+mFioybAm5IePdvY6O3Sca6TY40eU326xEQX
 Fe7AAF/8=
X-Received: by 2002:a37:a144:: with SMTP id k65mr9962824qke.52.1639596207597; 
 Wed, 15 Dec 2021 11:23:27 -0800 (PST)
X-Google-Smtp-Source: ABdhPJw0z5hlJB8kXxVd1OF/QDcJLfpHRHAZAsNesRXIby8y52B5WnlUV2TQWYQ3Gytn1eAv/lHKWQ==
X-Received: by 2002:a37:a144:: with SMTP id k65mr9962804qke.52.1639596207204; 
 Wed, 15 Dec 2021 11:23:27 -0800 (PST)
Received: from ?IPV6:2607:fea8:a262:5f00::a0bd? ([2607:fea8:a262:5f00::a0bd])
 by smtp.gmail.com with ESMTPSA id
 g19sm2073868qtg.82.2021.12.15.11.23.26
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Wed, 15 Dec 2021 11:23:26 -0800 (PST)
Message-ID: <25251cd0-494a-ab8e-0aa7-2bacc2d8e25a@redhat.com>
Date: Wed, 15 Dec 2021 14:23:25 -0500
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.3.2
Subject: Re: getting branch conditions using ranger
To: Martin Sebor <msebor@gmail.com>, gcc mailing list <gcc@gcc.gnu.org>
References: <c0f4f588-c219-616c-2fe4-3f0fe10125a9@gmail.com>
From: Andrew MacLeod <amacleod@redhat.com>
In-Reply-To: <c0f4f588-c219-616c-2fe4-3f0fe10125a9@gmail.com>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Language: en-CA
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00, BODY_8BITS,
 DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF,
 NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SCC_5_SHORT_WORD_LINES,
 SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Dec 2021 19:23:32 -0000

On 12/14/21 18:55, Martin Sebor wrote:
> Andrew, to improve the context of the late warnings I'm trying
> to see how to get the execution path(s) leading from function
> entry up to a statement.  For example, for the code below I'd
> like to "collect" and show the three conditionals in the context
> of the warning:
>
> extern char a[9];
>
> void f (int m, int n, void *s)
> {
>   if (m < 3) m = 3;
>   if (n < 4) n = 4;
>   if (s != 0)
>     {
>       char *d = a + 3;
>       __builtin_memcpy (d, s, m + n);
>     }
> }
>
> At a minimum, I'd like to print a note after the warning:
>
>   warning: ‘__builtin_memcpy’ writing between 7 and 2147483647 bytes 
> into a region of size 6 overflows the destination [-Wstringop-overflow=]
>
>   note: when 'm >= 3 && n >= 4 && s != 0'
>
> (The final version would point to each conditional in the source
> like the static analyzer does.)


Finding the exact location of the conditional may not be possible.. the 
condition could be the aggregation of multiple statements.  The ranges 
of m and n are set via 2 statements, the condition and the assignment, 
and then collected by a 3rd, the PHI node.

The "note:" itself could simply be created from the ranges..  the range 
at the memcpy of m,  n and s are:

m_3  : int [3, +INF]
n_4  : int [4, +INF]
s_10(D) :     void * [1B, +INF]

which could be easily translated to 'm >= 3 && n >= 4 && s != 0', 
assuming you can figure out the original source name.. In this case 
happens to be the VAR on the ssa-name, but you can't always count on 
that.  I'd also point out there is no statement which has m >=3 or n>= 
4, but rather a series of statements that produce that result.


>
> For conditions that involve ranges used in the statements (i.e.,
> the first two conditions in the source above), I wonder if rather
> than traversing the CFG in a separate step, I might be able to
> use Ranger to collect the conditions at the time it populates its
> cache (i.e., when I call it to get range info for each statement).
> I imagine I would need to derive a new class from gimple_ranger
> and override some virtual member functions.  (And maybe also do
> the same for ranger_cache?)
>
> Does this sound like something I should be able to do within
> the framework?  If yes, do you have any tips or suggestions for
> where/how to start?
>
That is not really going to work. The cache is agnostic as to why there 
are ranges or what caused them.  It is primarily a propagation engine 
which utilizes the edge ranges provided by GORI and range-ops.  Ranger 
itself is the controller for this property propagation.

GORI does not treat any statement as special.  It is an equation solver 
which uses range-ops to algebraically solve for an unknown value when 
the other values are known.  The "if" condition at the exit of a basic 
block is considered no different than an assignment statement where the 
LHS is either [0,0] or [1,1] depending on the edge you are looking at. 
    So to use your example above for just m:

     <bb 2> :
     if (m_6(D) <= 2)
       goto <bb 3>; [INV]       //  [1,1] = m_6 <= 2
     else
       goto <bb 4>; [INV]      //    [0,0] = m_6 <= 2

2->3  (T) m_6(D) :      int [-INF, 2]
2->4  (F) m_6(D) :      int [3, +INF]

=========== BB 3 ============
     <bb 3> :

=========== BB 4 ============
Imports: n_8(D)
Exports: n_8(D)
n_8(D)  int VARYING
     <bb 4> :
     # m_3 = PHI <m_6(D)(2), 3(3)>    //  m_3 = union (m_6 (2->4), 
[3,3]) = union ([3, +INF],[3,3]) = [3, +INF]

m_3 : int [3, +INF]


We generate 3 ranges for m..  m_6 on the 1)edge 2->3, 2)edge 2->4 and 
3)combine them with the constant 3 to produce m_3 at the PHI.

This all happens at a fairly low level, and you wont be able to overload 
anything to be able to log it in a sensible fashion. And I'm not sure 
how you would go about logging the arbitrarily long series of actions 
that result in the range you are looking at, and then utilize it.  GORI 
could tell you that m_6 and m_3 are used in the range calculation, but 
then the IL tells you that too.


> Thanks
> Martin
>
> PS I'm assuming -O0 for the above test case where the m + n
> expression is a sum of two PHIs.  With -O1 and higher some of
> the integer conditionals end up transformed into MAX_EXPRs so
> it will likely need to be handled differently or I may not be
> able to capture all the conditions reliably.  I don't know how
> much that might compromise the result.
>
Within ranger is is irrelevant whether its from a MAX or a condition.

   <bb 3> [local count: 574129753]:
   _5 = MAX_EXPR <n_6(D), 4>;
   _7 = MAX_EXPR <m_4(D), 3>;
   _1 = _5 + _7;

_7 generates a range of [3, +INF] based on range-ops knowledge of how 
MAX works.

In both cases, for m_3 above and _7, the range of [3, +INF] is generated 
at the definition point... One is a PHI which includes some CFG 
propagation to retrieve the argument values and perform the calculation, 
and  _7 which is much simpler.. its just the global characteristic of 
the statement calculated by range-ops where the LHS is the unknown in 
the equation.  Ultimately, both ranges are created at the definition 
point of the ssa_name  _7 or m_3.

The question is what precisely do you want to do with the IL  and what 
do you want to point at?  Optimization makes mapping things back to the 
original IL problematic to be sure, and as you can see with the _7, even 
the symbolic name can become lost or changed.

GORI can be utilized to tell you whether a range is generated on an edge 
for any specific name, but it depends on what you want to do with it.  
There is currently no facility to ask which particular statement caused 
the range to be created, mostly because there hasn't been a need, and 
also because it is usually the result of a string of operations.   Given:

a = m + 10
if (m > 10 && m < 100)
     foo (a)

we see:

     <bb 2> :
     a_5 = m_4(D) + 10;
     m.0_1 = (unsigned int) m_4(D);
     _2 = m.0_1 + 4294967285;
     if (_2 <= 88)
       goto <bb 3>; [INV]
     else
       goto <bb 4>; [INV]

a_5 : int [-2147483638, +INF]
2->3  (T) m.0_1 :       unsigned int [11, 99]
2->3  (T) m_4(D) :      int [11, 99]
2->3  (T) a_5 :         int [21, 109]
2->4  (F) m.0_1 :       unsigned int [0, 10][100, +INF]
2->4  (F) m_4(D) :      int [-INF, 10][100, +INF]
2->4  (F) a_5 :         int [-2147483638, 20][110, +INF]

=========== BB 3 ============
a_5     int [21, 109]

We produce a range of [21, 109] for a_5,  but which statement would we 
point to as the "defining" statement for that range?  (_2 <= 88) isn't 
very useful.  a_5 is never used in a condition, but based on the reduced 
range of m_4 we can recalculate a_5 on that path to produce that 
value... but there is no good single thing to point to.

The condition can be arbitrarily complex as well:

if we change the condition to:

   if ((m > 10 && m < 100) || m == 200)
     foo (a);

     a_8 = m_7(D) + 10;
     m.0_1 = (unsigned int) m_7(D);
     _2 = m.0_1 + 4294967285;
     _3 = _2 <= 88;
     _4 = m_7(D) == 200;
     _5 = _3 | _4;
     if (_5 != 0)
       goto <bb 3>; [INV]
     else
       goto <bb 4>; [INV]

a_8 : int [-2147483638, +INF]
2->3  (T) m.0_1 :       unsigned int [11, 99][200, 200]
2->3  (T) m_7(D) :      int [11, 99][200, 200]
2->3  (T) a_8 :         int [21, 109][210, 210]
2->4  (F) m.0_1 :       unsigned int [0, 10][100, 199][201, +INF]
2->4  (F) m_7(D) :      int [-INF, 10][100, 199][201, +INF]
2->4  (F) a_8 :         int [-2147483638, 20][110, 209][211, +INF]

=========== BB 3 ============
a_8     int [21, 109][210, 210]

Now the condition that causes this range is a string of expressions that 
all combine to produce the range, again, none of which actually utilize 
a_8.   GORI uses range-ops to solve the equation from the bottom of the 
block upwards until it has a result for m_7,  and then recalculates a_8 
based on that.

We could produce a list of the ssa_names which are evaluated on each 
edge to produce a range.  In the latter case, the range of a_8 is 
produced by evaluating _5, _3, _4, m_7, _2, m.0_1, m_7, and finally a_8.


So I think the long answer to your question is no, ranger probably isn't 
going to provide a quick pointer to the information you are looking for. 
    However, its not clear to me precisely what you are looking for.   
GORI can provide lots of information about whether ranges are created or 
not from each basic block, and there is a fair amount of dependency 
information available about which names go into a calucaltion.  Its also 
good at seeing thru different combinations of expressions. The engine 
could also be repurposed, but to delve into the details of each 
low-level range operation in such a way to produce useful high-level 
information isn't clear to me how that would work.

You could reverse engineer an expression from the resulting range, and 
then point to the source statements from the aggregate of the line 
number info over all ssa_names definitions used in the calculation by 
GORI across the various blocks in the dominators, but I can see how that 
could quickly become verbose. Probably not that useful unless is was 
limited and some mechanism for combining source lines from different 
blocks in a meaningful way.

I'm not familiar with how much useful info the line-info has, nor how 
much "noise" this would produce.  I would expect this to rapidly 
degenerate and become confusing with missing information as the control 
flow or optimizations increase.  Perhaps it could be limited to just 
simple cases, if they could be easily identified.

Andrew