From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29])
 by sourceware.org (Postfix) with ESMTPS id 76F6F3858439
 for <gcc-patches@gcc.gnu.org>; Mon, 12 Jul 2021 08:57:15 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 76F6F3858439
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
 by smtp-out2.suse.de (Postfix) with ESMTP id 745D91FD55;
 Mon, 12 Jul 2021 08:57:14 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
 t=1626080234; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
 mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=thDpAnF3WGxHHCWksTy4W6Z1popuT5X3M+w7Z7Fi/aU=;
 b=PpIqej/SAZbh34H2Ogsp6BomfB+mVU2SgwWy2PpalXMaYOhMsGzvx1vxcW1qe1L5JiNOhu
 /c0tw8h5bcvKCbYhW640hKzSj6Rja/w791/w/0zUrMYwMyZNs1e6v4lbigPbT59cvPCMbo
 3B1ghqN/qjYYaO6PMxJhXMMI86mH7OI=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_ed25519; t=1626080234;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
 mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=thDpAnF3WGxHHCWksTy4W6Z1popuT5X3M+w7Z7Fi/aU=;
 b=/mg/Q7sYHjg8bKWTEScTKjSCMghUxjtsUWhvn9z/rDb4Zs5gwVB/dcNbwk6A0LMoDDLnPz
 dQaU2gF+oqYI5jBQ==
Received: from murzim.suse.de (murzim.suse.de [10.160.4.192])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by relay2.suse.de (Postfix) with ESMTPS id 45586A3B8D;
 Mon, 12 Jul 2021 08:57:14 +0000 (UTC)
Date: Mon, 12 Jul 2021 10:57:14 +0200 (CEST)
From: Richard Biener <rguenther@suse.de>
To: guojiufu <guojiufu@linux.ibm.com>
cc: Segher Boessenkool <segher@kernel.crashing.org>, gcc-patches@gcc.gnu.org, 
 wschmidt@linux.ibm.com, dje.gcc@gmail.com, jlaw@tachyum.com, 
 amker.cheng@gmail.com
Subject: Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]
In-Reply-To: <e752f0a62db41698722413ac8eb4698b@imap.linux.ibm.com>
Message-ID: <nycvar.YFH.7.76.2107121053430.10711@zhemvz.fhfr.qr>
References: <20210709020718.177302-1-guojiufu@linux.ibm.com>
 <nycvar.YFH.7.76.2107090836090.10711@zhemvz.fhfr.qr>
 <20210709170832.GW1583@gate.crashing.org>
 <nycvar.YFH.7.76.2107120816190.10711@zhemvz.fhfr.qr>
 <e752f0a62db41698722413ac8eb4698b@imap.linux.ibm.com>
User-Agent: Alpine 2.21 (LSU 202 2017-01-01)
MIME-Version: 1.0
X-Spam-Status: No, score=-3.8 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, PDS_TONAME_EQ_TOLOCAL_HDRS_LCASE,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 8BIT
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jul 2021 08:57:17 -0000

On Mon, 12 Jul 2021, guojiufu wrote:

> On 2021-07-12 14:20, Richard Biener wrote:
> > On Fri, 9 Jul 2021, Segher Boessenkool wrote:
> > 
> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
> >> > I wonder if there's a way to query the target what modes the doloop
> >> > pattern can handle (not being too familiar with the doloop code).
> >> 
> >> You can look what modes are allowed for operand 0 of doloop_end,
> >> perhaps?  Although that is a define_expand, not a define_insn, so it is
> >> hard to introspect.
> >> 
> >> > Why do you need to do any checks besides the new type being able to
> >> > represent all IV values?  The original doloop IV will never wrap
> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will become
> >> > zero ... I suppose the doloop might still do the correct thing here
> >> > but it also still will with a IV with larger type).
> 
> The issue comes from U*_MAX (original short MAX), as you said: on which
> niter + 1 becomes zero.  And because the step for doloop is -1; then, on
> larger type 'zero - 1' will be a very large number on larger type
> (e.g. 0xff...ff); but on the original short type 'zero - 1' is a small value
> (e.g. "0xff").

But for the larger type the small type MAX + 1 fits and does not yield
zero so it should still work exactly as before, no?  Of course you
have to compute the + 1 in the larger type.

> >> 
> >> doloop_valid_p guarantees it is simple and doesn't wrap.
> >> 
> >> > I'd have expected sth like
> >> >
> >> >    ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED
> >> > (ntype));
> >> >
> >> > thus the decision made using a mode - which is also why I wonder
> >> > if there's a way to query the target for this.  As you say,
> >> > it _may_ be fast, so better check (somehow).
> 
> 
> I was also thinking of using hooks like type_for_size/type_for_mode.
>     /* Use type in word size may fast.  */
>     if (TYPE_PRECISION (ntype) < BITS_PER_WORD
>         && Wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE
> (ntype))))
>       {
>         ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
>         base = fold_convert (ntype, base);
>       }
> 
> As you pointed out, this does not query the mode from targets.
> As Segher pointed out "doloop_end" checks unsupported mode, while it seems
> not easy to use it in tree-ssa-loop-ivopts.c.
> For implementations of doloop_end, tartgets like rs6000/aarch64/ia64 requires
> Pmode/DImode; while there are other targets that work on other 'mode' (e.g.
> SI).
> 
> 
> In doloop_optimize, there is code:
> 
> ```
>     mode = desc->mode;
> .....
>     doloop_reg = gen_reg_rtx (mode);
>     rtx_insn *doloop_seq = targetm.gen_doloop_end (doloop_reg, 
> start_label);
> 
>     word_mode_size = GET_MODE_PRECISION (word_mode);
>     word_mode_max = (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) - 1;
>     if (! doloop_seq
>         && mode != word_mode
>         /* Before trying mode different from the one in that # of 
> iterations is
>            computed, we must be sure that the number of iterations fits into
>            the new mode.  */
>         && (word_mode_size >= GET_MODE_PRECISION (mode)
>             || wi::leu_p (iterations_max, word_mode_max)))
>       {
>         if (word_mode_size > GET_MODE_PRECISION (mode))
>           count = simplify_gen_unary (ZERO_EXTEND, word_mode, count, mode);
>         else
>           count = lowpart_subreg (word_mode, count, mode);
>         PUT_MODE (doloop_reg, word_mode);
>         doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);
>       }
>     if (! doloop_seq)
>       {
>         if (dump_file)
>           fprintf (dump_file,
>                    "Doloop: Target unwilling to use doloop pattern!\n");
>         return false;
>       }
> ```
> The above code first tries the mode of niter_desc by call
> targetm.gen_doloop_end
> to see if the target can generate doloop insns, if fail, then try to use
> 'word_mode' against gen_doloop_end.
> 
> 
> >> 
> >> Almost all targets just use Pmode, but there is no such guarantee I
> >> think, and esp. some targets that do not have machine insns for this
> >> (but want to generate different code for this anyway) can do pretty much
> >> anything.
> >> 
> >> Maybe using just Pmode here is good enough though?
> > 
> > I think Pmode is a particularly bad choice and I'd prefer word_mode
> > if we go for any hardcoded mode.  s390x for example seems to handle
> > both SImode and DImode (but names the helper gen_doloop_si64
> > for SImode?!).  But indeed it looks like somehow querying doloop_end
> > is going to be difficult since the expander doesn't have any mode,
> > so we'd have to actually try emit RTL here.
> 
> Instead of using hardcode mode, maybe we could add a hook for targets to
> return
> the preferred mode.

That's a possiblity of course.  Like the following which just shows the
default implementation then (pass in current mode, return a more preferred
mode or the mode itself)

enum machine_mode
prefered_doloop_mode (enum machine_mode mode)
{
  return mode;
}

> 
> Thanks for those valuable comments!
> 
> Jiufu Guo
> 
> 
> 
> > 
> > Richard.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)