From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <christophm30@gmail.com>
Received: from mail-vs1-f54.google.com (mail-vs1-f54.google.com
 [209.85.217.54])
 by sourceware.org (Postfix) with ESMTPS id B628B3938C24
 for <gcc@gcc.gnu.org>; Tue, 13 Jul 2021 12:18:21 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B628B3938C24
Received: by mail-vs1-f54.google.com with SMTP id a66so1772062vsd.10
 for <gcc@gcc.gnu.org>; Tue, 13 Jul 2021 05:18:21 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=OURPKgkmHkUPP8mThXtysS82IvuLTFGWkz8d15sj0VE=;
 b=joKd9R4sUvjomhn0NPuS374UYLfLNLkpNZJWkSFIo0aGcrbm+vfJxlPI0L0wS8so69
 T+FtpqgzEaxQCqLE0M7rtu5Gg7aCoMeQX89JOuQNH6B0aYfu1Ok2g+wXdd4LRw6Ryfhc
 kAFgjHE1RdGLuP7ZTyXXR3eRkxFFiMoMzYbTCjEKTtffCJ524Uy0W4I7OneVZQk2R99Y
 WASxZwKLAjDDoQRh/2tFZEyGOED126E0KkG21Rkj3u5Wzuqs6trOtwj+7txN8Fq/sQ8u
 b9ZrjfDvu/XqIGzv+JlxuUwqW0VBqLBjSq/tzgHJiCLrCbudH7ZY03jQl0/fkLAr+qDd
 gjMg==
X-Gm-Message-State: AOAM533YW5PQSxtc/D9ZcY6eWtzhIEolvU6lKXXDEIVXPXepi/YQKUl1
 +yE1jEbiXYCD06gpqi2Q5OB2NRujOal8gQ==
X-Google-Smtp-Source: ABdhPJxYHLbzLhcRoBxpg7iq4FTGdFgGCSUlgpN3AJYnmRlIIO9ibGb6CxTenCI5+e790w+p19+IcQ==
X-Received: by 2002:a67:fa16:: with SMTP id i22mr3433894vsq.49.1626178701109; 
 Tue, 13 Jul 2021 05:18:21 -0700 (PDT)
Received: from mail-vs1-f45.google.com (mail-vs1-f45.google.com.
 [209.85.217.45])
 by smtp.gmail.com with ESMTPSA id g21sm259934vkd.53.2021.07.13.05.18.20
 for <gcc@gcc.gnu.org>
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Tue, 13 Jul 2021 05:18:20 -0700 (PDT)
Received: by mail-vs1-f45.google.com with SMTP id j8so12171607vsd.0
 for <gcc@gcc.gnu.org>; Tue, 13 Jul 2021 05:18:20 -0700 (PDT)
X-Received: by 2002:a67:ec8f:: with SMTP id h15mr5296428vsp.54.1626178700477; 
 Tue, 13 Jul 2021 05:18:20 -0700 (PDT)
MIME-Version: 1.0
References: <CAHB2gtTRpG=k9ekyHaMLsaujqa5dH8j6s8t=VfqqRt-jV4h9og@mail.gmail.com>
 <orpmvnul88.fsf@lxoliva.fsfla.org>
In-Reply-To: <orpmvnul88.fsf@lxoliva.fsfla.org>
From: =?UTF-8?Q?Christoph_M=C3=BCllner?= <cmuellner@gcc.gnu.org>
Date: Tue, 13 Jul 2021 14:18:09 +0200
X-Gmail-Original-Message-ID: <CAHB2gtS52DYxofOpy7A2A_4QdezCZ=pDWrBFVtYF_6Zg43H+sw@mail.gmail.com>
Message-ID: <CAHB2gtS52DYxofOpy7A2A_4QdezCZ=pDWrBFVtYF_6Zg43H+sw@mail.gmail.com>
Subject: Re: Priority of builtins expansion strategies
To: Alexandre Oliva <oliva@adacore.com>
Cc: gcc@gcc.gnu.org, Martin Sebor <msebor@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-3.9 required=5.0 tests=BAYES_00,
 FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_DNSWL_NONE,
 RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 13 Jul 2021 12:18:23 -0000

On Tue, Jul 13, 2021 at 2:11 AM Alexandre Oliva <oliva@adacore.com> wrote:
>
> On Jul 12, 2021, Christoph Müllner <cmuellner@gcc.gnu.org> wrote:
>
> > * Why does the generic by-pieces infrastructure have a higher priority
> > than the target-specific expansion via INSNs like setmem?
>
> by-pieces was not affected by the recent change, and IMHO it generally
> makes sense for it to have priority over setmem.  It generates only
> straight-line code for constant-sized blocks.  Even if you can beat that
> with some machine-specific logic, you'll probably end up generating
> equivalent code at least in some cases, and then, you probably want to
> carefully tune the settings that select one or the other, or disable
> by-pieces altogether.
>
>
> by-multiple-pieces, OTOH, is likely to be beaten by machine-specific
> looping constructs, if any are available, so setmem takes precedence.
>
> My testing involved bringing it ahead of the insns, to exercise the code
> more thoroughly even on x86*, but the submitted patch only used
> by-multiple-pieces as a fallback.

Let me give you an example of what by-pieces does on RISC-V (RV64GC).
The following code...

#include <string.h>

void* do_memset0_8 (void *p)
{
    return memset (p, 0, 8);
}

void* do_memset0_15 (void *p)
{
    return memset (p, 0, 15);
}

...becomes (you can verify this with Compiler Explorer):

do_memset0_8(void*):
        sb      zero,0(a0)
        sb      zero,1(a0)
        sb      zero,2(a0)
        sb      zero,3(a0)
        sb      zero,4(a0)
        sb      zero,5(a0)
        sb      zero,6(a0)
        sb      zero,7(a0)
        ret
do_memset0_15(void*):
        sb      zero,0(a0)
        sb      zero,1(a0)
        sb      zero,2(a0)
        sb      zero,3(a0)
        sb      zero,4(a0)
        sb      zero,5(a0)
        sb      zero,6(a0)
        sb      zero,7(a0)
        sb      zero,8(a0)
        sb      zero,9(a0)
        sb      zero,10(a0)
        sb      zero,11(a0)
        sb      zero,12(a0)
        sb      zero,13(a0)
        sb      zero,14(a0)
        ret
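
As far as I understand it, the byte stores come from the unknown
alignment of p: when the target reports unaligned access as slow,
by-pieces limits each piece to the known alignment. A rough model of
the resulting store count (a simplification, not GCC's actual code;
count_stores and max_piece are hypothetical names):

```c
#include <stddef.h>

/* Hypothetical model, not GCC code: number of stores needed to clear
   LEN bytes when every store is at most MAX_PIECE bytes wide, piece
   sizes are powers of two, and stores never overlap.
   MAX_PIECE is assumed to be a power of two.  */
static size_t count_stores (size_t len, size_t max_piece)
{
    size_t n = 0;

    /* Use the widest piece first, then halve for the remainder.  */
    for (size_t piece = max_piece; piece > 0; piece /= 2) {
        n += len / piece;
        len %= piece;
    }
    return n;
}
```

With max_piece = 1 the model gives 8 and 15 stores for the two
functions, matching the output above. With max_piece = 8 it gives
1 and 4 stores.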

Here is what a setmemsi expansion in the backend can do (when
unaligned access is cheap):

000000000000003c <do_memset0_8>:
  3c:   00053023                sd      zero,0(a0)
  40:   8082                    ret

000000000000007e <do_memset0_15>:
  7e:   00053023                sd      zero,0(a0)
  82:   000533a3                sd      zero,7(a0)
  86:   8082                    ret
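
The 15-byte case works because the two 8-byte stores overlap at
byte 7. A minimal C sketch of the same idea (illustration only, not
the backend code; zero_8_to_16 is a hypothetical helper and it assumes
8 <= len <= 16):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not GCC code: zero LEN bytes (8 <= LEN <= 16)
   using two 8-byte stores that may overlap in the middle.  */
static void *zero_8_to_16 (void *p, size_t len)
{
    const uint64_t zero = 0;
    unsigned char *d = p;

    /* A fixed-size 8-byte memcpy is the portable way to express a
       possibly unaligned 64-bit store in C.  */
    memcpy (d, &zero, 8);            /* bytes 0 .. 7 */
    memcpy (d + len - 8, &zero, 8);  /* bytes len-8 .. len-1 */
    return p;
}
```

For len = 15 the second store starts at offset 7, which corresponds to
the "sd zero,7(a0)" above.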

Is there a way to generate similar code with the by-pieces infrastructure?

> > * And if there are no particular reasons, would it be acceptable to
> > change the order?
>
> I suppose moving insns ahead of by-pieces might break careful tuning of
> multiple platforms, so I'd rather we did not make that change.

Only platforms that have "setmemsi" implemented would be affected.
And those platforms (arm, frv, ft32, nds32, pa, rs6000, rx, visium)
have a carefully tuned implementation of the setmem expansion. I can't
imagine that these setmem expansions produce less optimal code than the
by-pieces infrastructure (which has less knowledge about the target).

Thanks,
Christoph