From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <VE1PR06MB705447E5FF4C3080E5F3E586F1D89@VE1PR06MB7054.eurprd06.prod.outlook.com>
In-Reply-To: <VE1PR06MB705447E5FF4C3080E5F3E586F1D89@VE1PR06MB7054.eurprd06.prod.outlook.com>
From: Jonathan Wakely <jwakely.gcc@gmail.com>
Date: Wed, 8 Feb 2023 13:49:50 +0000
Message-ID: <CAH6eHdQeS_D_6tM9cgD-yNZnHuRYDzoFtpo=+DZweT4Lm1iHKw@mail.gmail.com>
Subject: Re: Why does this unrolled function write to the stack?
To: Gaelan Steele <gbs3@st-andrews.ac.uk>
Cc: "gcc-help@gcc.gnu.org" <gcc-help@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
List-Id: <gcc-help.gcc.gnu.org>

On Wed, 8 Feb 2023 at 13:31, Gaelan Steele via Gcc-help
<gcc-help@gcc.gnu.org> wrote:
>
> Hi all,
>
> In a computer architecture class, we happened across a strange
> compilation choice by GCC that neither I nor my professor can make
> much sense of. The source is as follows:
>
> void foo(int *a, const int *__restrict b, const int *__restrict c)
> {
>   for (int i = 0; i < 16; i++) {
>     a[i] = b[i] + c[i];
>   }
> }
>
> I won't reproduce the full compiled output here, as it's rather long,
> but when compiled with -O3 -mno-avx -mno-sse, GCC 12.2 for x86-64 (via
> Compiler Explorer: https://godbolt.org/z/o9e4o7cj4) produces an
> unrolled loop that appears to write each sum into an array on the
> stack before copying it into the provided pointer a. This seems hugely
> inefficient - it's doing quite a few memory accesses - and I can't see
> why it would be necessary.

I don't think it's *necessary*. If you use -Os or -O1 or -O2 you get a
loop. So it's just an optimization choice at -O3 presumably based on
cost estimates that say that fully unrolling the loop will make the
code faster than looping.
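
One way to look at the difference between levels without juggling
several command lines is GCC's per-function optimize attribute. The
sketch below is only an illustration: the names foo_O2, foo_O3,
OPT_LEVEL and same_result are made up here, the attribute is
GCC-specific, and GCC's documentation says it is meant for debugging,
so the code it produces may not match a real -O2 or -O3 build exactly.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macro: request a per-function optimization level
   on GCC, and expand to nothing on other compilers (both functions are
   then built at the command-line level). */
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_LEVEL(s) __attribute__((optimize(s), noinline))
#else
#define OPT_LEVEL(s)
#endif

/* The loop from the question, requested at -O2. */
OPT_LEVEL("O2")
void foo_O2(int *a, const int *__restrict b, const int *__restrict c)
{
  for (int i = 0; i < 16; i++) {
    a[i] = b[i] + c[i];
  }
}

/* The same loop, requested at -O3. */
OPT_LEVEL("O3")
void foo_O3(int *a, const int *__restrict b, const int *__restrict c)
{
  for (int i = 0; i < 16; i++) {
    a[i] = b[i] + c[i];
  }
}

/* Returns 1 if both versions write identical results for one sample
   input, 0 otherwise. This checks behaviour only, not speed. */
int same_result(void)
{
  int b[16], c[16], a2[16], a3[16];
  for (int i = 0; i < 16; i++) {
    b[i] = i;
    c[i] = 100 - 2 * i;
  }
  foo_O2(a2, b, c);
  foo_O3(a3, b, c);
  return memcmp(a2, a3, sizeof a2) == 0;
}
```

Compiling this with -S and the flags from the question should let you
read the two function bodies side by side in a single assembly file.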

>
> Am I missing some reason why this is more efficient than the naive
> approach (computing each sum into an intermediate register, then
> writing it directly into a)?

Benchmarking the function at different optimization levels I get:

Run on (8 X 4500 MHz CPU s)
CPU Caches:
 L1 Data 32 KiB (x4)
 L1 Instruction 32 KiB (x4)
 L2 Unified 256 KiB (x4)
 L3 Unified 8192 KiB (x1)
Load Average: 0.14, 0.22, 0.39
***WARNING*** CPU scaling is enabled, the benchmark real time
measurements may be noisy and will incur extra overhead.
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
O3               1.60 ns         1.60 ns    432901632
O2               3.56 ns         3.56 ns    197086506
O1               6.87 ns         6.86 ns    101839250
Os               8.23 ns         8.22 ns     85273333


Using quickbench:
https://quick-bench.com/q/sSwVvtrkOCp9q-XyKAevthiaNAw
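
For anyone without Google Benchmark to hand, a rough timing loop in
plain C might look like the sketch below. It is not the harness behind
the numbers above; ns_per_call and add_fn are hypothetical names,
clock() measures CPU time at coarse resolution, and the input is
perturbed on every iteration only so the compiler cannot hoist the work
out of the loop.

```c
#include <assert.h>
#include <time.h>

/* The function from the question. */
void foo(int *a, const int *__restrict b, const int *__restrict c)
{
  for (int i = 0; i < 16; i++) {
    a[i] = b[i] + c[i];
  }
}

/* Hypothetical function-pointer type for the routine being timed. */
typedef void (*add_fn)(int *, const int *, const int *);

/* Returns the average CPU time per call in nanoseconds.
   Assumes 0 < iters < about 2e9 so the int arithmetic cannot overflow. */
double ns_per_call(add_fn fn, long iters)
{
  int a[16], b[16], c[16];
  volatile unsigned sink = 0;  /* keeps the results observable */

  for (int i = 0; i < 16; i++) {
    b[i] = i;
    c[i] = 2 * i;
  }

  clock_t start = clock();
  for (long n = 0; n < iters; n++) {
    b[n & 15] = (int)n;            /* change the input every iteration */
    fn(a, b, c);
    sink += (unsigned)a[n & 15];   /* consume one result */
  }
  clock_t end = clock();

  (void)sink;
  return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Calling ns_per_call(foo, 100000000L) from main and printing the result,
once per optimization level, gives a crude comparison. The loop overhead
and the input update are included in the figure, so treat the absolute
values with suspicion.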