From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <oliva@adacore.com>
Received: from rock.gnat.com (rock.gnat.com [205.232.38.15])
 by sourceware.org (Postfix) with ESMTP id 46B9239960F7
 for <gcc-patches@gcc.gnu.org>; Wed,  3 Feb 2021 15:11:47 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 46B9239960F7
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=adacore.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=oliva@adacore.com
Received: from localhost (localhost.localdomain [127.0.0.1])
 by filtered-rock.gnat.com (Postfix) with ESMTP id 12163117286;
 Wed,  3 Feb 2021 10:11:47 -0500 (EST)
X-Virus-Scanned: Debian amavisd-new at gnat.com
Received: from rock.gnat.com ([127.0.0.1])
 by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024)
 with LMTP id b9hQmaLJaTm0; Wed,  3 Feb 2021 10:11:47 -0500 (EST)
Received: from free.home (tron.gnat.com
 [IPv6:2620:20:4000:0:46a8:42ff:fe0e:e294])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by rock.gnat.com (Postfix) with ESMTPS id C9194117280;
 Wed,  3 Feb 2021 10:11:46 -0500 (EST)
Received: from livre (livre.home [172.31.160.2])
 by free.home (8.15.2/8.15.2) with ESMTPS id 113FBdWE061500
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 3 Feb 2021 12:11:39 -0300
From: Alexandre Oliva <oliva@adacore.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jan Hubicka <hubicka@ucw.cz>, GCC Patches <gcc-patches@gcc.gnu.org>,
 Zdenek Dvorak <ook@ucw.cz>
Subject: Re: [RFC] test builtin ratio for loop distribution
Organization: Free thinker, does not speak for AdaCore
References: <orim7i8sei.fsf@lxoliva.fsfla.org>
 <CAFiYyc3G_h8rU6Zkr=3j_Gx0w9NUNVeoG51RJdpRKBCuuM_+Kw@mail.gmail.com>
 <or1re58wak.fsf@lxoliva.fsfla.org>
 <CAFiYyc0NnVH3dL5ZvUucPYEL3BWz32QPYndqH-k-V2ztSQ7CtQ@mail.gmail.com>
 <orr1lyv1de.fsf@lxoliva.fsfla.org>
 <CAFiYyc02mn1ApSw9fy_SCsHtjm3=ZWUasiX_BKCTgsm1=yEhPQ@mail.gmail.com>
Errors-To: aoliva@lxoliva.fsfla.org
Date: Wed, 03 Feb 2021 12:11:39 -0300
In-Reply-To: <CAFiYyc02mn1ApSw9fy_SCsHtjm3=ZWUasiX_BKCTgsm1=yEhPQ@mail.gmail.com>
 (Richard Biener's message of "Wed, 3 Feb 2021 09:59:20 +0100")
Message-ID: <or4kituqxg.fsf@lxoliva.fsfla.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Scanned-By: MIMEDefang 2.84
X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 03 Feb 2021 15:11:49 -0000

On Feb  3, 2021, Richard Biener <richard.guenther@gmail.com> wrote:

> So I think we should try to match what __builtin_memcpy/memset
> expansion would do here, taking advantage of extra alignment
> and size knowledge.  In particular,

>  a) if __builtin_memcpy/memset would use setmem/cpymem optabs
>      see if we can have variants of memcpy/memset transferring alignment
>      and size knowledge

We could add more optional parameters to them to convey the length's
known ctz.  However, the ctz can't be recovered reliably.  We can't even
recover it after gimplifying the length within ldist!

That said, my other patch already enables ctz calls to recover it, at
least in libgcc risc-v tfmode cases, and it's possible it's readily
available in other cases.  I'd rather leave that for someone dealing
with the machine-specific patterns to figure out whether a separate
argument would be useful.  RISC-V, which is what I'm dealing with,
doesn't have much to offer as far as these patterns are concerned.

>  b) if expansion would use BY_PIECES then expand to an unrolled loop

Why would that be better than keeping the constant-length memset call,
that would be turned into an unrolled loop during expand?

>  c) if expansion would emit a memset/memcpy call but we know
>      alignment and have a low bound on niters emit a loop (like your patch does)

> a) might be difficult but adding the builtin variants may pay off in any case.

*nod*

> The patch itself could benefit from one or two helpers we already
> have, first of all there's create_empty_loop_on_edge (so you don't
> need the loop fixup)

Uhh, thanks, but...  you realize nearly all of the gimple-building code
is one and the same for the loop and for trailing count misalignment?
There doesn't seem to be a lot of benefit to using this primitive, aside
from its dealing with creating the loop data structure which, again, I'd
only want to do in the loop case.

(I guess I should add more comments as to the inline expansion
 strategy.  it's equivalent to:

 i = len, ptr = base, blksz = 1 << alctz;
 while (i >= blksz) { *(ub<blksz>*)ptr = val; i -= blksz; ptr += blksz; }
 blksz >>= 1; if (i >= blksz) { *(ub<blksz>*)ptr = val; i -= blksz; ptr += blksz; }
 blksz >>= 1; if (i >= blksz) { *(ub<blksz>*)ptr = val; i -= blksz; ptr += blksz; }
 ... until blksz gets down to zero or to 1<<szctz
 
> Note that for memmove if we know the dependence direction, we
> can also emit a loop / unrolled code.

*nod*, but the logic would have to be quite different, using bit tests,
and odds are we won't know the direction and have to output a test and
code for both possibilities, which would be quite unlikely to be
beneficial.  Though the original code would quite likely make the
direction visible; perhaps if the size is small enough that we would
expand a memcpy inline, we should refrain from transforming the loop
into a memmove call.

In the case at hand, there's no benefit at all to these transformations:
we start from a loop with the known alignment and a small loop count (0
to 2 words copied), and if everything goes all right with the
transformation, we may be lucky to get back to that.  It's not like the
transformation could even increase the known alignment, so why bother,
and throw away debug info by rewriting the loop into the same code minus
debug?

> I think the builtins with alignment and calloc-style element count
> will be useful on its own.

Oh, I see, you're suggesting actual separate builtin functions.  Uhh...
I'm not sure I want to go there.  I'd much rather recover the ctz of the
length, and use it in existing code.


I'd also prefer if the generic memset (and memcpy and memmove?) builtin
expanders dealt with non-constant lengths in the way I implemented.
That feels like the right spot for it.  That deprives us of gimple loop
optimizations in the inlined loop generated by the current patch, but if
we expand an unrolled loop with compares and offsets with small
constants, loop optimizations might not even be relevant.


FWIW, the patch I posted yesterday is broken, the regstrap test did not
even build libstdc++-v3 successfully.  I'm not sure whether to pursue it
further, or to reimplement it in the expander.  Suggestions are welcome.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist         GNU Toolchain Engineer
        Vim, Vi, Voltei pro Emacs -- GNUlius Caesar