From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 8FD273858D28 for ; Wed, 15 Dec 2021 19:23:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8FD273858D28 Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-322-4k3Gpb0FPEaFE_o9tn5oGA-1; Wed, 15 Dec 2021 14:23:28 -0500 X-MC-Unique: 4k3Gpb0FPEaFE_o9tn5oGA-1 Received: by mail-qt1-f198.google.com with SMTP id v17-20020a05622a131100b002aea167e24aso31011264qtk.5 for ; Wed, 15 Dec 2021 11:23:28 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:references:from:in-reply-to :content-transfer-encoding; bh=t0VM0p9v5955FYHsOTmlv7vPJfmwO2ko29evDeth/kU=; b=jpQ5QpLwIq9M6JKUtjPJzX4XifSfNV7Uw7lW5LFfNQh0r7pfvgEN1lhi0ktuaKsIOo IiAEvoMzIFjDOuFC6jDCzjsg/iGW6yXFNsSUATqJs34PDnfpmhFvnpX5SOI+nBkMb5eG AgzFDhWiqWDDGXGJb99GNwtUtbWogGR3XRIYaIlNJd4bgUpPaj3m7924XNQCnozL0BqO dsMreXiXMI0ouz0EAi+Mf9XFjB/NAWGMLb5zIXV9Y6BN+57u+pys5mJE25+OE+Sn+4Zn LVBxdoiVJLVP5eciaq9uQ4WyvmGr/vGBIKdManc/WYcQjrKVqMqTouDHgMb08fMVVVBP a9eA== X-Gm-Message-State: AOAM5338F3sTNjiuvg6Pg6IrTC/h1kWc6CUfKH4q+GZyfrEeqRYW97lw XwhKt8244/oryHZSjPUsK2GPyyMfj6clqwJpfHb+mFioybAm5IePdvY6O3Sca6TY40eU326xEQX Fe7AAF/8= X-Received: by 2002:a37:a144:: with SMTP id k65mr9962824qke.52.1639596207597; Wed, 15 Dec 2021 11:23:27 -0800 (PST) X-Google-Smtp-Source: ABdhPJw0z5hlJB8kXxVd1OF/QDcJLfpHRHAZAsNesRXIby8y52B5WnlUV2TQWYQ3Gytn1eAv/lHKWQ== X-Received: by 2002:a37:a144:: with SMTP id k65mr9962804qke.52.1639596207204; Wed, 15 Dec 2021 11:23:27 -0800 (PST) Received: from ?IPV6:2607:fea8:a262:5f00::a0bd? ([2607:fea8:a262:5f00::a0bd]) by smtp.gmail.com with ESMTPSA id g19sm2073868qtg.82.2021.12.15.11.23.26 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 15 Dec 2021 11:23:26 -0800 (PST) Message-ID: <25251cd0-494a-ab8e-0aa7-2bacc2d8e25a@redhat.com> Date: Wed, 15 Dec 2021 14:23:25 -0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.2 Subject: Re: getting branch conditions using ranger To: Martin Sebor , gcc mailing list References: From: Andrew MacLeod In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-CA Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00, BODY_8BITS, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Dec 2021 19:23:32 -0000 On 12/14/21 18:55, Martin Sebor wrote: > Andrew, to improve the context of the late warnings I'm trying > to see how to get the execution path(s) leading from function > entry up to a statement.  For example, for the code below I'd > like to "collect" and show the three conditionals in the context > of the warning: > > extern char a[9]; > > void f (int m, int n, void *s) > { >   if (m < 3) m = 3; >   if (n < 4) n = 4; >   if (s != 0) >     { >       char *d = a + 3; >       __builtin_memcpy (d, s, m + n); >     } > } > > At a minimum, I'd like to print a note after the warning: > >   warning: ‘__builtin_memcpy’ writing between 7 and 2147483647 bytes > into a region of size 6 overflows the destination [-Wstringop-overflow=] > >   note: when 'm >= 3 && n >= 4 && s != 0' > > (The final version would point to each conditional in the source > like the static analyzer does.) Finding the exact location of the conditional may not be possible.. the condition could be the aggregation of multiple statements.  The ranges of m and n are set via 2 statements, the condition and the assignment, and then collected by a 3rd, the PHI node. The "note:" itself could simply be created from the ranges..  the range at the memcpy of m,  n and s are: m_3  : int [3, +INF] n_4  : int [4, +INF] s_10(D) :     void * [1B, +INF] which could be easily translated to 'm >= 3 && n >= 4 && s != 0', assuming you can figure out the original source name.. In this case happens to be the VAR on the ssa-name, but you can't always count on that.  I'd also point out there is no statement which has m >=3 or n>= 4, but rather a series of statements that produce that result. > > For conditions that involve ranges used in the statements (i.e., > the first two conditions in the source above), I wonder if rather > than traversing the CFG in a separate step, I might be able to > use Ranger to collect the conditions at the time it populates its > cache (i.e., when I call it to get range info for each statement). > I imagine I would need to derive a new class from gimple_ranger > and override some virtual member functions.  (And maybe also do > the same for ranger_cache?) > > Does this sound like something I should be able to do within > the framework?  If yes, do you have any tips or suggestions for > where/how to start? > That is not really going to work. The cache is agnostic as to why there are ranges or what caused them.  It is primarily a propagation engine which utilizes the edge ranges provided by GORI and range-ops.  Ranger itself is the controller for this property propagation. GORI does not treat any statement as special.  It is an equation solver which uses range-ops to algebraically solve for an unknown value when the other values are known.  The "if" condition at the exit of a basic block is considered no different than an assignment statement where the LHS is either [0,0] or [1,1] depending on the edge you are looking at.    So to use your example above for just m:     :     if (m_6(D) <= 2)       goto ; [INV]       //  [1,1] = m_6 <= 2     else       goto ; [INV]      //    [0,0] = m_6 <= 2 2->3  (T) m_6(D) :      int [-INF, 2] 2->4  (F) m_6(D) :      int [3, +INF] =========== BB 3 ============     : =========== BB 4 ============ Imports: n_8(D) Exports: n_8(D) n_8(D)  int VARYING     :     # m_3 = PHI     //  m_3 = union (m_6 (2->4), [3,3]) = union ([3, +INF],[3,3]) = [3, +INF] m_3 : int [3, +INF] We generate 3 ranges for m..  m_6 on the 1)edge 2->3, 2)edge 2->4 and 3)combine them with the constant 3 to produce m_3 at the PHI. This all happens at a fairly low level, and you wont be able to overload anything to be able to log it in a sensible fashion. And I'm not sure how you would go about logging the arbitrarily long series of actions that result in the range you are looking at, and then utilize it.  GORI could tell you that m_6 and m_3 are used in the range calculation, but then the IL tells you that too. > Thanks > Martin > > PS I'm assuming -O0 for the above test case where the m + n > expression is a sum of two PHIs.  With -O1 and higher some of > the integer conditionals end up transformed into MAX_EXPRs so > it will likely need to be handled differently or I may not be > able to capture all the conditions reliably.  I don't know how > much that might compromise the result. > Within ranger is is irrelevant whether its from a MAX or a condition.   [local count: 574129753]:   _5 = MAX_EXPR ;   _7 = MAX_EXPR ;   _1 = _5 + _7; _7 generates a range of [3, +INF] based on range-ops knowledge of how MAX works. In both cases, for m_3 above and _7, the range of [3, +INF] is generated at the definition point... One is a PHI which includes some CFG propagation to retrieve the argument values and perform the calculation, and  _7 which is much simpler.. its just the global characteristic of the statement calculated by range-ops where the LHS is the unknown in the equation.  Ultimately, both ranges are created at the definition point of the ssa_name  _7 or m_3. The question is what precisely do you want to do with the IL  and what do you want to point at?  Optimization makes mapping things back to the original IL problematic to be sure, and as you can see with the _7, even the symbolic name can become lost or changed. GORI can be utilized to tell you whether a range is generated on an edge for any specific name, but it depends on what you want to do with it.  There is currently no facility to ask which particular statement caused the range to be created, mostly because there hasn't been a need, and also because it is usually the result of a string of operations.   Given: a = m + 10 if (m > 10 && m < 100)     foo (a) we see:     :     a_5 = m_4(D) + 10;     m.0_1 = (unsigned int) m_4(D);     _2 = m.0_1 + 4294967285;     if (_2 <= 88)       goto ; [INV]     else       goto ; [INV] a_5 : int [-2147483638, +INF] 2->3  (T) m.0_1 :       unsigned int [11, 99] 2->3  (T) m_4(D) :      int [11, 99] 2->3  (T) a_5 :         int [21, 109] 2->4  (F) m.0_1 :       unsigned int [0, 10][100, +INF] 2->4  (F) m_4(D) :      int [-INF, 10][100, +INF] 2->4  (F) a_5 :         int [-2147483638, 20][110, +INF] =========== BB 3 ============ a_5     int [21, 109] We produce a range of [21, 109] for a_5,  but which statement would we point to as the "defining" statement for that range?  (_2 <= 88) isn't very useful.  a_5 is never used in a condition, but based on the reduced range of m_4 we can recalculate a_5 on that path to produce that value... but there is no good single thing to point to. The condition can be arbitrarily complex as well: if we change the condition to:   if ((m > 10 && m < 100) || m == 200)     foo (a);     a_8 = m_7(D) + 10;     m.0_1 = (unsigned int) m_7(D);     _2 = m.0_1 + 4294967285;     _3 = _2 <= 88;     _4 = m_7(D) == 200;     _5 = _3 | _4;     if (_5 != 0)       goto ; [INV]     else       goto ; [INV] a_8 : int [-2147483638, +INF] 2->3  (T) m.0_1 :       unsigned int [11, 99][200, 200] 2->3  (T) m_7(D) :      int [11, 99][200, 200] 2->3  (T) a_8 :         int [21, 109][210, 210] 2->4  (F) m.0_1 :       unsigned int [0, 10][100, 199][201, +INF] 2->4  (F) m_7(D) :      int [-INF, 10][100, 199][201, +INF] 2->4  (F) a_8 :         int [-2147483638, 20][110, 209][211, +INF] =========== BB 3 ============ a_8     int [21, 109][210, 210] Now the condition that causes this range is a string of expressions that all combine to produce the range, again, none of which actually utilize a_8.   GORI uses range-ops to solve the equation from the bottom of the block upwards until it has a result for m_7,  and then recalculates a_8 based on that. We could produce a list of the ssa_names which are evaluated on each edge to produce a range.  In the latter case, the range of a_8 is produced by evaluating _5, _3, _4, m_7, _2, m.0_1, m_7, and finally a_8. So I think the long answer to your question is no, ranger probably isn't going to provide a quick pointer to the information you are looking for.    However, its not clear to me precisely what you are looking for.   GORI can provide lots of information about whether ranges are created or not from each basic block, and there is a fair amount of dependency information available about which names go into a calucaltion.  Its also good at seeing thru different combinations of expressions. The engine could also be repurposed, but to delve into the details of each low-level range operation in such a way to produce useful high-level information isn't clear to me how that would work. You could reverse engineer an expression from the resulting range, and then point to the source statements from the aggregate of the line number info over all ssa_names definitions used in the calculation by GORI across the various blocks in the dominators, but I can see how that could quickly become verbose. Probably not that useful unless is was limited and some mechanism for combining source lines from different blocks in a meaningful way. I'm not familiar with how much useful info the line-info has, nor how much "noise" this would produce.  I would expect this to rapidly degenerate and become confusing with missing information as the control flow or optimizations increase.  Perhaps it could be limited to just simple cases, if they could be easily identified. Andrew