From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sewardj42@gmail.com>
MIME-Version: 1.0
References: <20220214155757.861877-1-dmalcolm@redhat.com>
 <71de3204e639eed5052ca9e6416334aba6b2d1c7.camel@klomp.org>
 <ab8b3b762fcabc2827e3b2f82cff6a11c9cd2ee3.camel@redhat.com>
 <3bfbfbf02e2d17d45b4a91e5ea5f855e0a62e5f5.camel@klomp.org>
 <CAFiYyc2f5ZCf3_=ix1YYc5vRjZCikJMPAkComM35hnY3-DQJNA@mail.gmail.com>
 <a0b6fd4233d1cb5f52986a20e38d8aaf8276629b.camel@klomp.org>
In-Reply-To: <a0b6fd4233d1cb5f52986a20e38d8aaf8276629b.camel@klomp.org>
From: Julian Seward <sewardj42@gmail.com>
Date: Tue, 15 Feb 2022 14:00:47 +0100
Message-ID: <CAPQtpYuKWMgYJua7dLrCsph-wMLgLZDYjuoLuD-fVZA5r2Hm=Q@mail.gmail.com>
Subject: Re: Uninit warnings due to optimizing short-circuit conditionals
To: Mark Wielaard <mark@klomp.org>
Cc: Richard Biener <richard.guenther@gmail.com>,
 David Malcolm <dmalcolm@redhat.com>, GCC Development <gcc@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Feb 2022 13:01:01 -0000

Sorry for the delayed response.  I've been paging this all back in.

I first saw this problem when memcheck-ing Firefox as compiled by Clang, some
years back.  Not long after, GCC was also at it.  The transformation in
question is (at the C level):

A && B  ==>  B && A   if it can be proved that A
                      is always false whenever B is undefined
                      and (I assume) that B is provably exception-free

where && means the standard lazy left-first C logical-AND.  I believe this
might have become more prevalent due to ever-more aggressive inlining (at
least for Firefox), which presented the compilers with greater opportunities
to make the required proofs.
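
To make the condition concrete, here is a minimal C sketch modelled on Mark's
example quoted below.  The helper compute_something, its extra input parameter,
the "garbage" parameter that stands in for an uninitialised stack slot, and the
check_* names are all assumptions made for illustration; none of this is code
from GCC, Clang or Firefox.  Because the helper writes *result only when it
returns true, A (ok) is always false whenever B (result == 42) is undefined,
so both orderings give the same answer whatever the garbage happens to be.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: writes *result only on success, so *result is
   "undefined" exactly when the return value is false. */
static bool compute_something(int input, int *result)
{
    if (input < 0)
        return false;          /* *result is left untouched */
    *result = input * 2;
    return true;
}

/* As written in the source: A && B, lazy, left first. */
static bool check_original(int input, int garbage)
{
    int result = garbage;      /* stands in for an uninitialised slot */
    bool ok = compute_something(input, &result);
    return ok && result == 42;
}

/* As the compiler may emit it: B && A. */
static bool check_swapped(int input, int garbage)
{
    int result = garbage;
    bool ok = compute_something(input, &result);
    return result == 42 && ok;
}

/* The two orderings agree for every input and every garbage value tried. */
static void check_equivalence(void)
{
    static const int garbage_values[] = { 0, 42, -1 };
    for (int input = -3; input <= 30; input++)
        for (int g = 0; g < 3; g++)
            assert(check_original(input, garbage_values[g]) ==
                   check_swapped(input, garbage_values[g]));
}
```

The swapped form is harmless for the program itself, but its first branch
depends on result, which is exactly where a tool that complains at every
conditional branch on undefined data reports a false positive.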

After wondering what to do about this vis-a-vis Memcheck, I eventually
realised that, because Memcheck does know how to do exact definedness
propagation through bitwise and/or, I could "fix" it by recovering the
underlying && expression through analysis of local fragments of the control
flow graph.  So it now looks for

    if !A goto X
    if !B goto X
    <then-clause>
    X:

and generates IR for analysis as if it had seen
  "if (A bitwise-& B) <then-clause>".

Note that the bitwise vs logical-AND distinction isn't really correct; what
we're really talking about is non-lazy vs lazy AND.  The number of bits
involved in the representation is irrelevant.
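
As a rough illustration of why the non-lazy form helps, here is a simplified
one-bit model in C.  It is only a sketch: the shadow_bool type and both
function names are made up for this example, and it is not Valgrind's IR or
its actual implementation.  The point it shows is that a non-lazy AND has a
defined result whenever either operand is a defined false, regardless of
operand order, whereas checking each branch separately depends on the order.

```c
#include <stdbool.h>

/* A one-bit value plus a shadow flag saying whether it is defined. */
typedef struct {
    bool value;     /* meaningful only when defined is true */
    bool defined;
} shadow_bool;

/* Naive per-branch checking of "if !first goto X; if !second goto X".
   The first branch is always executed; the second only if first is true. */
static bool branches_report_error(shadow_bool first, shadow_bool second)
{
    if (!first.defined)
        return true;                    /* branch on undefined value */
    return first.value && !second.defined;
}

/* Non-lazy AND with exact definedness propagation: the result is defined
   if both operands are defined, or if either one is a defined false. */
static shadow_bool and_nonlazy(shadow_bool a, shadow_bool b)
{
    bool a_false = a.defined && !a.value;
    bool b_false = b.defined && !b.value;
    shadow_bool r;
    r.defined = (a.defined && b.defined) || a_false || b_false;
    r.value = r.defined ? (a.value && b.value) : false;
    return r;
}
```

With A a defined false and B undefined, branches_report_error(A, B) is false
but branches_report_error(B, A) is true, which is the false positive caused by
the reordering.  and_nonlazy gives a defined false for either order, so a
single check on its result stays quiet.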

A couple of other notes:

* Valgrind only deals with 2-argument &&s; that is all that seemed necessary.
  In principle though the transformation generalises to any number of terms
  &&-ed together, not just 2.

* For reasons I don't really remember now, I didn't need to deal with the
  equivalent OR case.  It's all the same at the machine code level.  (Waves
  hands and mumbles something about De Morgan ..)

I'm sure all the above info is in the slides of the FOSDEM talk that Mark
mentioned.  I don't think the above contributes anything new.

On Tue, Feb 15, 2022 at 1:29 PM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi Richard,
>
> On Tue, 2022-02-15 at 08:25 +0100, Richard Biener wrote:
> > On Mon, Feb 14, 2022 at 6:38 PM Mark Wielaard <mark@klomp.org> wrote:
> > > Yes. valgrind keeps track of uninitialized bits and propagates them
> > > around till "use". Where use is anything that might alter the
> > > observable behavior of the program. Which is control flow
> > > transfers, conditional moves, addresses used in memory accesses,
> > > and data passed to system calls.
> > >
> > > This paper describes some of the memcheck tricks:
> > > https://valgrind.org/docs/memcheck2005.pdf
> >
> > That probably means bugs like
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63311
> > could be resolved as fixed (in valgrind).
>
> I just tried the testcase from that bug and it still replicates with
> gcc 11.2.1 and valgrind 3.18.1. And as far as I can see it really
> cannot be fixed in valgrind since gcc really generates a conditional
> jump based on an uninit variable in this case.
>
> It does look a bit like what Julian described in:
>
>   Memcheck Reloaded
>   dealing with compiler-generated branches on undefined values
> https://archive.fosdem.org/2020/schedule/event/debugging_memcheck_reloaded/
>
> Which should be able to recover/reconstruct the original control flow.
> In cases like:
>
> int result;
> bool ok = compute_something(&result);
> if (ok && result == 42) { ... }
>
> where gcc turns that last line upside down:
>
> if (result == 42 && ok) { ... }
>
> But it doesn't work in this case. Probably because this is a slightly
> more complex case involving 3 distinct variables instead of 2.
>
> Cheers,
>
> Mark