From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <CAOAHwbVrGRa1jDwdtvEpO2Gck-3Un0JOpd5y8cRbFcKpkJy-Zw@mail.gmail.com>
 <CAA1YtmsoEFb2uPK18TA3nbHh8dYKU+UP7iz_adejRnNk=tUNEQ@mail.gmail.com>
 <CAOAHwbU8EfACwygPgGzN5RYvP0ftW8yw5900SVeJPwbRWCC8_A@mail.gmail.com>
In-Reply-To: <CAOAHwbU8EfACwygPgGzN5RYvP0ftW8yw5900SVeJPwbRWCC8_A@mail.gmail.com>
From: Tadeus Prastowo <tadeus.prastowo@unitn.it>
Date: Fri, 2 Apr 2021 18:32:00 +0200
X-Gmail-Original-Message-ID: <CAA1YtmswZdH2jLoP3-xYFh7dLRSec_v3QCd+55ZUcBO=HfbBTw@mail.gmail.com>
Message-ID: <CAA1YtmswZdH2jLoP3-xYFh7dLRSec_v3QCd+55ZUcBO=HfbBTw@mail.gmail.com>
Subject: Re: Build reproducibility of gcc @ NixOS
To: Arthur Gautier <gcc.gnu.org@superbaloo.net>
Cc: gcc@gcc.gnu.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Apr 2021 16:32:19 -0000

Hi Arthur,

On Fri, Apr 2, 2021 at 5:04 PM Arthur Gautier
<gcc.gnu.org@superbaloo.net> wrote:
>
> Hi Tadeus,
>
> On Fri, Apr 2, 2021 at 9:07 AM Tadeus Prastowo <tadeus.prastowo@unitn.it> wrote:

[...]

> > Since an optimized build is likely to be machine-dependent regardless
> > of any intended injection (e.g., different instructions used in GCC
> > binaries depending on /proc/cpuinfo), I don't understand why an
> > optimized build should be reproducible on different machines, unless
> > of course every channel that GCC uses to find out about the machine
> > (e.g., /proc/cpuinfo) is under your total control.  So, do you mean to
> > ask a list of all channels that GCC uses to find out about the
> > machine?
>
> This is where I'm getting confused. According to the manual,
> stagetrain only record branch statistics.

By "the manual", do you mean
https://gcc.gnu.org/install/build.html#Building-with-profile-feedback
?

Quoting the page:
  When ‘make profiledbootstrap’ is run, it will first build a stage1 compiler.
  This compiler is used to build a stageprofile compiler instrumented
  to collect execution counts of instruction and branch probabilities.
  Training run is done by building stagetrain compiler.
  Finally a stagefeedback compiler is built using the information collected.
End quote.

Based on the quote, a reproducible build is to be expected for the
following compilers:
1. The stage1 compiler.
2. The stageprofile compiler, which is built by the stage1 compiler.
3. The stagetrain compiler, which is built by the stageprofile compiler.

Then, a reproducible build is expected for the stagefeedback compiler
on the condition that the same information, which was collected by the
stageprofile compiler when building the stagetrain compiler, is used.

Do you agree with that reasoning?
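
To check that reasoning in practice, one could compare the
corresponding stage directories of two builds file by file and see
which stage diverges first.  Below is a minimal Python sketch of such a
comparison.  The function names are mine, and which directories in the
GCC build tree correspond to each stage is something you would need to
confirm on your side; the sketch only assumes two directory trees that
are supposed to be identical.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each regular file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(tree_a, tree_b):
    """Return the sorted relative paths that are missing from one tree
    or whose contents differ between the two trees."""
    a = tree_digests(tree_a)
    b = tree_digests(tree_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Calling differing_files() on the same stage directory from two
machines gives an empty list when that stage is bit-for-bit identical.
If the first three stages come out empty and only the stagefeedback
one does not, that would point at the collected profile information
rather than at the compilers themselves.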

> And I would expect, given
> the same input provided in the same order, two different architectures
> to take the same branch, and not observe any difference.

In other words, you expect that branch statistics depend only on the
given source code.  Correct?

> I understand
> that with autoprofiled builds, the local architecture behavior is
> injected in the build, but I don't use that.
> I'm not using any -march in the build either (as far as I can
> understand/tell). So I do not expect the build to change its
> instruction set either.
>
> Is that normal that two different architectures would issue two
> different "execution counts of instruction and branch probabilities"?

I guess that it would be the case.

> Or is there something more?

Perhaps you can get the reproducible build that you want by first
isolating the information that the stageprofile compiler collects
when building the stagetrain compiler, and then reusing that same
information when building every other stagefeedback compiler.
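
As a rough illustration of the "isolating" step: GCC's profile
instrumentation writes its counters to *.gcda files, so one could
snapshot those files after the training run and record their digests.
The following is only a sketch under my own assumptions: the function
name and the directory layout are hypothetical, the snapshot directory
is assumed to live outside the build directory, and I have not
verified how the GCC build system would need to be told to reuse such
a snapshot.

```python
import hashlib
import shutil
from pathlib import Path


def snapshot_profile(build_dir, snapshot_dir):
    """Copy every *.gcda file under build_dir into snapshot_dir,
    preserving relative paths.

    Returns a manifest {relative path: SHA-256 hex digest} so that the
    profile data collected on two machines can be compared.
    Assumption: snapshot_dir is not located inside build_dir.
    """
    build_dir = Path(build_dir)
    snapshot_dir = Path(snapshot_dir)
    manifest = {}
    for src in sorted(build_dir.rglob("*.gcda")):
        if not src.is_file():
            continue
        rel = src.relative_to(build_dir)
        dst = snapshot_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        manifest[rel.as_posix()] = hashlib.sha256(dst.read_bytes()).hexdigest()
    return manifest
```

If the manifests from two machines differ, the profile data itself is
machine-dependent, and shipping one fixed snapshot as a build input
would be the thing to try.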

> Thank you for your reply!

My pleasure.

> --
> Arthur

-- 
Best regards,
Tadeus