From: guojiufu <guojiufu@linux.ibm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
gcc-patches@gcc.gnu.org, wschmidt@linux.ibm.com,
dje.gcc@gmail.com, jlaw@tachyum.com, amker.cheng@gmail.com
Subject: Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]
Date: Mon, 12 Jul 2021 17:43:56 +0800 [thread overview]
Message-ID: <619972b1f8eb6f567801a8f9e1695b89@imap.linux.ibm.com> (raw)
In-Reply-To: <nycvar.YFH.7.76.2107121053430.10711@zhemvz.fhfr.qr>
On 2021-07-12 16:57, Richard Biener wrote:
> On Mon, 12 Jul 2021, guojiufu wrote:
>
>> On 2021-07-12 14:20, Richard Biener wrote:
>> > On Fri, 9 Jul 2021, Segher Boessenkool wrote:
>> >
>> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
>> >> > I wonder if there's a way to query the target what modes the doloop
>> >> > pattern can handle (not being too familiar with the doloop code).
>> >>
>> >> You can look what modes are allowed for operand 0 of doloop_end,
>> >> perhaps? Although that is a define_expand, not a define_insn, so it is
>> >> hard to introspect.
>> >>
>> >> > Why do you need to do any checks besides the new type being able to
>> >> > represent all IV values? The original doloop IV will never wrap
>> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will become
>> >> > zero ... I suppose the doloop might still do the correct thing here
>> >> > but it also still will with a IV with larger type).
>>
>> The issue comes from U*_MAX (original short MAX), as you said: on
>> which
>> niter + 1 becomes zero. And because the step for doloop is -1; then,
>> on
>> larger type 'zero - 1' will be a very large number on larger type
>> (e.g. 0xff...ff); but on the original short type 'zero - 1' is a small
>> value
>> (e.g. "0xff").
>
> But for the larger type the small type MAX + 1 fits and does not yield
> zero so it should still work exactly as before, no? Of course you
> have to compute the + 1 in the larger type.
>
You are right, if compute the "+ 1" in the larger type it is ok, as
below code:
```
/* Use type in word size may fast. */
if (TYPE_PRECISION (ntype) < BITS_PER_WORD)
{
ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
niter = fold_convert (ntype, niter);
}
tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
build_int_cst (ntype, 1));
add_candidate (data, base, build_int_cst (ntype, -1), true, NULL,
NULL, true);
```
The issue of this is, this code generates more stmt for doloop.xxx:
_12 = (unsigned int) xx(D);
_10 = _12 + 4294967295;
_24 = (long unsigned int) _10;
doloop.6_8 = _24 + 1;
if use previous patch, "+ 1" on original type, then the stmts will looks
like:
_12 = (unsigned int) xx(D);
doloop.6_8 = (long unsigned int) _12;
This is the reason for checking
wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype)))
>> >>
>> >> doloop_valid_p guarantees it is simple and doesn't wrap.
>> >>
>> >> > I'd have expected sth like
>> >> >
>> >> > ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED
>> >> > (ntype));
>> >> >
>> >> > thus the decision made using a mode - which is also why I wonder
>> >> > if there's a way to query the target for this. As you say,
>> >> > it _may_ be fast, so better check (somehow).
>>
>>
>> I was also thinking of using hooks like type_for_size/type_for_mode.
>> /* Use type in word size may fast. */
>> if (TYPE_PRECISION (ntype) < BITS_PER_WORD
>> && Wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE
>> (ntype))))
>> {
>> ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
>> base = fold_convert (ntype, base);
>> }
>>
>> As you pointed out, this does not query the mode from targets.
>> As Segher pointed out "doloop_end" checks unsupported mode, while it
>> seems
>> not easy to use it in tree-ssa-loop-ivopts.c.
>> For implementations of doloop_end, tartgets like rs6000/aarch64/ia64
>> requires
>> Pmode/DImode; while there are other targets that work on other 'mode'
>> (e.g.
>> SI).
>>
>>
>> In doloop_optimize, there is code:
>>
>> ```
>> mode = desc->mode;
>> .....
>> doloop_reg = gen_reg_rtx (mode);
>> rtx_insn *doloop_seq = targetm.gen_doloop_end (doloop_reg,
>> start_label);
>>
>> word_mode_size = GET_MODE_PRECISION (word_mode);
>> word_mode_max = (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) -
>> 1;
>> if (! doloop_seq
>> && mode != word_mode
>> /* Before trying mode different from the one in that # of
>> iterations is
>> computed, we must be sure that the number of iterations
>> fits into
>> the new mode. */
>> && (word_mode_size >= GET_MODE_PRECISION (mode)
>> || wi::leu_p (iterations_max, word_mode_max)))
>> {
>> if (word_mode_size > GET_MODE_PRECISION (mode))
>> count = simplify_gen_unary (ZERO_EXTEND, word_mode, count,
>> mode);
>> else
>> count = lowpart_subreg (word_mode, count, mode);
>> PUT_MODE (doloop_reg, word_mode);
>> doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);
>> }
>> if (! doloop_seq)
>> {
>> if (dump_file)
>> fprintf (dump_file,
>> "Doloop: Target unwilling to use doloop
>> pattern!\n");
>> return false;
>> }
>> ```
>> The above code first tries the mode of niter_desc by call
>> targetm.gen_doloop_end
>> to see if the target can generate doloop insns, if fail, then try to
>> use
>> 'word_mode' against gen_doloop_end.
>>
>>
>> >>
>> >> Almost all targets just use Pmode, but there is no such guarantee I
>> >> think, and esp. some targets that do not have machine insns for this
>> >> (but want to generate different code for this anyway) can do pretty much
>> >> anything.
>> >>
>> >> Maybe using just Pmode here is good enough though?
>> >
>> > I think Pmode is a particularly bad choice and I'd prefer word_mode
>> > if we go for any hardcoded mode. s390x for example seems to handle
>> > both SImode and DImode (but names the helper gen_doloop_si64
>> > for SImode?!). But indeed it looks like somehow querying doloop_end
>> > is going to be difficult since the expander doesn't have any mode,
>> > so we'd have to actually try emit RTL here.
>>
>> Instead of using hardcode mode, maybe we could add a hook for targets
>> to
>> return
>> the preferred mode.
>
> That's a possiblity of course. Like the following which just shows the
> default implementation then (pass in current mode, return a more
> preferred
> mode or the mode itself)
>
> enum machine_mode
> prefered_doloop_mode (enum machine_mode mode)
> {
> return mode;
> }
>
Yes, thanks!
Checking current do_loop_end in targets, in general, when do_loop_end
requires
SI mode, the target is defining Pmode as SImode and word_mode (from
BITS_PER_WORD
which defaults from UNITS_PER_WORD) is also defined to align with SI
mode.
When do_loop_end requires DI mode, the target is defining Pmode as
DImode
and word_mode/UNITS_PER_WORD is also defined to align with DI mode.
So, if aggressively, then by default we may just return word_mode.
BR,
Jiufu Guo.
>>
>> Thanks for those valuable comments!
>>
>> Jiufu Guo
>>
>>
>>
>> >
>> > Richard.
>>
>>
next prev parent reply other threads:[~2021-07-12 9:44 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-09 2:07 Jiufu Guo
2021-07-09 6:43 ` Richard Biener
2021-07-09 17:08 ` Segher Boessenkool
2021-07-12 6:20 ` Richard Biener
2021-07-12 8:21 ` guojiufu
2021-07-12 8:57 ` Richard Biener
2021-07-12 9:43 ` guojiufu [this message]
2021-07-12 10:02 ` Richard Biener
2021-07-12 14:08 ` guojiufu
2021-07-12 14:46 ` Richard Biener
2021-07-12 15:53 ` guojiufu
2021-07-12 15:59 ` guojiufu
2021-07-13 2:09 ` guojiufu
2021-07-13 7:09 ` Richard Biener
2021-07-13 8:16 ` guojiufu
2021-07-13 15:51 ` Segher Boessenkool
2021-07-14 2:38 ` guojiufu
2021-07-13 15:38 ` Segher Boessenkool
2021-07-14 3:10 ` guojiufu
2021-07-14 7:17 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=619972b1f8eb6f567801a8f9e1695b89@imap.linux.ibm.com \
--to=guojiufu@linux.ibm.com \
--cc=amker.cheng@gmail.com \
--cc=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jlaw@tachyum.com \
--cc=rguenther@suse.de \
--cc=segher@kernel.crashing.org \
--cc=wschmidt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).