Re: [RFC] test builtin ratio for loop distribution

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Alexandre Oliva <oliva@adacore.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jan Hubicka <hubicka@ucw.cz>,
	GCC Patches <gcc-patches@gcc.gnu.org>, Zdenek Dvorak <ook@ucw.cz>
Subject: Re: [RFC] test builtin ratio for loop distribution
Date: Wed, 03 Feb 2021 12:11:39 -0300	[thread overview]
Message-ID: <or4kituqxg.fsf@lxoliva.fsfla.org> (raw)
In-Reply-To: <CAFiYyc02mn1ApSw9fy_SCsHtjm3=ZWUasiX_BKCTgsm1=yEhPQ@mail.gmail.com> (Richard Biener's message of "Wed, 3 Feb 2021 09:59:20 +0100")

On Feb  3, 2021, Richard Biener <richard.guenther@gmail.com> wrote:

> So I think we should try to match what __builtin_memcpy/memset
> expansion would do here, taking advantage of extra alignment
> and size knowledge.  In particular,

>  a) if __builtin_memcpy/memset would use setmem/cpymem optabs
>      see if we can have variants of memcpy/memset transferring alignment
>      and size knowledge

We could add more optional parameters to them to convey the length's
known ctz.  However, the ctz can't be recovered reliably.  We can't even
recover it after gimplifying the length within ldist!

That said, my other patch already enables ctz calls to recover it, at
least in libgcc risc-v tfmode cases, and it's possible it's readily
available in other cases.  I'd rather leave that for someone dealing
with the machine-specific patterns to figure out whether a separate
argument would be useful.  RISC-V, which is what I'm dealing with,
doesn't have much to offer as far as these patterns are concerned.

>  b) if expansion would use BY_PIECES then expand to an unrolled loop

Why would that be better than keeping the constant-length memset call,
that would be turned into an unrolled loop during expand?

>  c) if expansion would emit a memset/memcpy call but we know
>      alignment and have a low bound on niters emit a loop (like your patch does)

> a) might be difficult but adding the builtin variants may pay off in any case.

*nod*

> The patch itself could benefit from one or two helpers we already
> have, first of all there's create_empty_loop_on_edge (so you don't
> need the loop fixup)

Uhh, thanks, but...  you realize nearly all of the gimple-building code
is one and the same for the loop and for trailing count misalignment?
There doesn't seem to be a lot of benefit to using this primitive, aside
from its dealing with creating the loop data structure which, again, I'd
only want to do in the loop case.

(I guess I should add more comments as to the inline expansion
 strategy.  it's equivalent to:

 i = len, ptr = base, blksz = 1 << alctz;
 while (i >= blksz) { *(ub<blksz>*)ptr = val; i -= blksz; ptr += blksz; }
 blksz >>= 1; if (i >= blksz) { *(ub<blksz>*)ptr = val; i -= blksz; ptr += blksz; }
 blksz >>= 1; if (i >= blksz) { *(ub<blksz>*)ptr = val; i -= blksz; ptr += blksz; }
 ... until blksz gets down to zero or to 1<<szctz

> Note that for memmove if we know the dependence direction, we
> can also emit a loop / unrolled code.

*nod*, but the logic would have to be quite different, using bit tests,
and odds are we won't know the direction and have to output a test and
code for both possibilities, which would be quite unlikely to be
beneficial.  Though the original code would quite likely make the
direction visible; perhaps if the size is small enough that we would
expand a memcpy inline, we should refrain from transforming the loop
into a memmove call.

In the case at hand, there's no benefit at all to these transformations:
we start from a loop with the known alignment and a small loop count (0
to 2 words copied), and if everything goes all right with the
transformation, we may be lucky to get back to that.  It's not like the
transformation could even increase the known alignment, so why bother,
and throw away debug info by rewriting the loop into the same code minus
debug?

> I think the builtins with alignment and calloc-style element count
> will be useful on its own.

Oh, I see, you're suggesting actual separate builtin functions.  Uhh...
I'm not sure I want to go there.  I'd much rather recover the ctz of the
length, and use it in existing code.

I'd also prefer if the generic memset (and memcpy and memmove?) builtin
expanders dealt with non-constant lengths in the way I implemented.
That feels like the right spot for it.  That deprives us of gimple loop
optimizations in the inlined loop generated by the current patch, but if
we expand an unrolled loop with compares and offsets with small
constants, loop optimizations might not even be relevant.

FWIW, the patch I posted yesterday is broken, the regstrap test did not
even build libstdc++-v3 successfully.  I'm not sure whether to pursue it
further, or to reimplement it in the expander.  Suggestions are welcome.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist         GNU Toolchain Engineer
        Vim, Vi, Voltei pro Emacs -- GNUlius Caesar

next prev parent reply	other threads:[~2021-02-03 15:11 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-27 12:40 Alexandre Oliva
2021-01-27 15:12 ` Richard Biener
2021-01-28  5:28   ` Alexandre Oliva
2021-01-28  8:59     ` Richard Biener
2021-02-02 17:13       ` Alexandre Oliva
2021-02-03  8:59         ` Richard Biener
2021-02-03 15:11           ` Alexandre Oliva [this message]
2021-02-04  8:37             ` Richard Biener
2021-02-04 22:17               ` Alexandre Oliva
2021-02-05  8:02                 ` Richard Biener
2021-02-11 10:19                 ` Alexandre Oliva
2021-02-11 12:14                   ` Alexandre Oliva
2021-02-12 11:34                   ` Richard Biener
2021-02-16  4:56                     ` Alexandre Oliva
2021-02-16 10:47                       ` Alexandre Oliva
2021-02-16 12:11                         ` Richard Biener
2021-02-19  8:08                           ` [PR94092] " Alexandre Oliva
2021-02-22  9:53                             ` Richard Biener
2021-04-29  4:26                               ` Alexandre Oliva
2021-04-30 14:42                                 ` Jeff Law
2021-05-03  8:55                                   ` Richard Biener
2021-05-04  1:59                                     ` Alexandre Oliva
2021-05-04  5:49                                       ` Prathamesh Kulkarni
2021-05-04  6:09                                         ` Alexandre Oliva
2021-02-05  0:13 ` Jim Wilson
2021-02-11 10:11   ` Alexandre Oliva

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=or4kituqxg.fsf@lxoliva.fsfla.org \
    --to=oliva@adacore.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hubicka@ucw.cz \
    --cc=ook@ucw.cz \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).