From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=MvOm=NV=suse.de=rguenther@sourceware.org>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130])
	by sourceware.org (Postfix) with ESMTPS id BF453388450F
	for <gcc-patches@gcc.gnu.org>; Wed, 19 Jun 2024 11:54:31 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org BF453388450F
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org BF453388450F
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=195.135.223.130
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718798074; cv=none;
	b=FZ673tE9nBa21SYMudo4CqEx5V1cHe2O9us3rJ4HrlMxBEwSgkE7m/eseR7odkkR6bxpG3Vxp6fnNPvXBDHg9um1AF1SvTQTYsehOJwuEZmtlNPKZFBhuWtpQtHqRSQmCuUV/Q2iUFGzGMsre4/pwpzN4NVZ8LhkAhUIo3pO4AY=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1718798074; c=relaxed/simple;
	bh=99Iy8rnEj/vpn+CB9Q5V8ikUbGBC+Y/U2LnW3x3HfXA=;
	h=DKIM-Signature:DKIM-Signature:DKIM-Signature:DKIM-Signature:Date:
	 From:To:Subject:Message-ID:MIME-Version; b=DooZhEBzEVvNtK+FBEImLehCTrIfjwXxLsZHuiBsRMQsktZTJjqi1nK2oWkEdD3WlR0M34oSNPQSrCBYYIM/zMfthZXUoOX0hxsCAGTUjePMqcSU2olgadWm+082oG8WaEecaC+CXopSfEZAPQODz8omF8alCPq3c9O6SlRVyd4=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from murzim.nue2.suse.org (unknown [10.168.4.243])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	by smtp-out1.suse.de (Postfix) with ESMTPS id AB6DE21A61;
	Wed, 19 Jun 2024 11:54:30 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1718798070; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=eEN9Sv0L6uUqwNT2DFzXVhXZ2knD5n0pXn12dY4G+n8=;
	b=HsMAZM1/p6+pMtL96wKnh+hPnppXsim1axAtScQ4sIp0MyBnIEEsTwlq8SZ9dexS4xW1Am
	OOVkosPFDydq8tHXCVDO/Q6Ei58DVgqV2OB2lRvOgfqJHXIs3jxyXS4Df4jebxwRad0/Nv
	q+l8tvJZhYNfRUVtu9cugp+xLMEW/jI=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1718798070;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=eEN9Sv0L6uUqwNT2DFzXVhXZ2knD5n0pXn12dY4G+n8=;
	b=zpeWNBZqKHPXAuZ2KKze90wNN/aPggU8qviQta6wqbU1xvt9J24h++KMeiCK1gPj3qTFdH
	p2kQG1US0azxvZCg==
Authentication-Results: smtp-out1.suse.de;
	none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1718798070; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=eEN9Sv0L6uUqwNT2DFzXVhXZ2knD5n0pXn12dY4G+n8=;
	b=HsMAZM1/p6+pMtL96wKnh+hPnppXsim1axAtScQ4sIp0MyBnIEEsTwlq8SZ9dexS4xW1Am
	OOVkosPFDydq8tHXCVDO/Q6Ei58DVgqV2OB2lRvOgfqJHXIs3jxyXS4Df4jebxwRad0/Nv
	q+l8tvJZhYNfRUVtu9cugp+xLMEW/jI=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1718798070;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=eEN9Sv0L6uUqwNT2DFzXVhXZ2knD5n0pXn12dY4G+n8=;
	b=zpeWNBZqKHPXAuZ2KKze90wNN/aPggU8qviQta6wqbU1xvt9J24h++KMeiCK1gPj3qTFdH
	p2kQG1US0azxvZCg==
Date: Wed, 19 Jun 2024 13:54:30 +0200 (CEST)
From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <tamar.christina@arm.com>
cc: gcc-patches@gcc.gnu.org, nd@arm.com, bin.cheng@linux.alibaba.com
Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during
 candidate selection [PR114932]
In-Reply-To: <patch-18566-tamar@arm.com>
Message-ID: <22rq444p-s3o0-o4p2-2778-3467oq6rq92o@fhfr.qr>
References: <patch-18566-tamar@arm.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="-1475677436-2053736573-1718798070=:4147"
X-Spam-Score: -3.30
X-Spam-Level: 
X-Spamd-Result: default: False [-3.30 / 50.00];
	BAYES_HAM(-3.00)[100.00%];
	CTYPE_MIXED_BOGUS(1.00)[];
	NEURAL_HAM_LONG(-1.00)[-1.000];
	NEURAL_HAM_SHORT(-0.20)[-0.994];
	MIME_GOOD(-0.10)[multipart/mixed,text/plain];
	FROM_HAS_DN(0.00)[];
	TO_DN_SOME(0.00)[];
	MISSING_XM_UA(0.00)[];
	MIME_TRACE(0.00)[0:+,1:+];
	ARC_NA(0.00)[];
	RCVD_COUNT_ZERO(0.00)[0];
	DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];
	FROM_EQ_ENVFROM(0.00)[];
	RCPT_COUNT_THREE(0.00)[4];
	TO_MATCH_ENVRCPT_ALL(0.00)[];
	FUZZY_BLOCKED(0.00)[rspamd.com];
	DBL_BLOCKED_OPENRESOLVER(0.00)[suse.de:email]
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIM_INVALID,DKIM_SIGNED,GIT_PATCH_0,KAM_DMARC_STATUS,KAM_LOTSOFHASH,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

---1475677436-2053736573-1718798070=:4147
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8BIT

On Fri, 14 Jun 2024, Tamar Christina wrote:

> Hi All,
> 
> IVOPTS normally uses affine trees to perform comparisons between different IVs,
> but these seem to have been missing in two key spots and instead normal tree
> equivalencies used.
> 
> In some cases where we have a structural equivalence but not a signedness
> equivalencies we end up generating both a signed and unsigned IV for the same
> candidate.
> 
> This happens quite a lot with fortran but can also happen in C because this came
> code is unable to figure out when one expression is a multiple of another.
> 
> As an example in the attached testcase we get:
> 
> Initial set of candidates:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>    group:0 --> iv_cand:6, cost=(0,1)
>    group:1 --> iv_cand:1, cost=(0,0)
>    group:2 --> iv_cand:8, cost=(0,1)
>    group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
> 
> <Invariant Expressions>:
> inv_expr 1:     stride.3_27 * 4
> inv_expr 2:     (unsigned long) stride.3_27 * 4
> 
> These end up being used in the same group:
> 
> Group 1:
> cand  cost    compl.  inv.expr.       inv.vars
> 1     0       0       NIL;    6
> 2     0       0       NIL;    6
> 3     0       0       NIL;    6
> 
> which ends up with IV opts picking the signed and unsigned IVs:
> 
> Improved to:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>    group:0 --> iv_cand:6, cost=(0,1)
>    group:1 --> iv_cand:1, cost=(0,0)
>    group:2 --> iv_cand:8, cost=(0,1)
>    group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
> 
> and so generates the same IV as both signed and unsigned:
> 
> ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot
> ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE)
> ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
>   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
>   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
>   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
>   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> 
> ...
> 
> ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot
> ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909) (FALLTHRU)                                                                                                                                                                                   ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                             ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                               # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>                                                                                                                                    
                                                                                   ivtmp.22_82 = ivtmp.22_41 + 1;                                                                                                                                                                                                                                               ivtmp.26_72 = ivtmp.26_51 + _80;                                                                                                                                                                                                                                             ivtmp.28_98 = ivtmp.28_90 + _39;  
> 
> These two IVs are always used as unsigned, so IV ops generates:
> 
>   _73 = stride.3_27 * 4;
>   _80 = (unsigned long) _73;
>   _54 = (unsigned long) stride.3_27;
>   _39 = _54 * 4;
> 
> Which means that in e.g. exchange2 we generate a lot of duplicate code.
> 
> This is because candidate 6 and 8 are structurally equivalent but have different
> signs.
> 
> This patch changes it so that if you have two IVs that are affine equivalent to
> just pick one over the other.  IV already has code for this, so the patch just
> uses affine trees instead of tree for the check.
> 
> With it we get:
> 
> <Invariant Expressions>:
> inv_expr 1:     stride.3_27 * 4
> 
> <Group-candidate Costs>:
> Group 0:
>   cand  cost    compl.  inv.expr.       inv.vars
>   5     0       2       NIL;    NIL;
>   6     0       3       NIL;    NIL;
> 
> Group 1:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     0       0       NIL;    6
>   2     0       0       NIL;    6
>   3     0       0       NIL;    6
>   4     0       0       NIL;    6
> 
> Initial set of candidates:
>   cost: 16 (complexity 3)
>   reg_cost: 6
>   cand_cost: 10
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6
>    group:0 --> iv_cand:6, cost=(0,3)
>    group:1 --> iv_cand:1, cost=(0,0)
>   invariant variables: 6
>   invariant expressions: 1
> 
> The two patches together results in a 10% performance increase in exchange2 in
> SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
> time. There's also a 5% performance improvement in fotonik3d and similar
> reduction in binary size.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/114932
> 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> 	offsets into account.
> 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> 	(record_group_use): Use it.
> 	(constant_multiple_of): Also check equality under
> 	aff_combination_constant_multiple_p.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/114932
> 	* gfortran.dg/addressing-modes_2.f90: New test.
> 
> ---
> diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> new file mode 100644
> index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> @@ -0,0 +1,20 @@
> +! { dg-do compile { target aarch64-*-* } }
> +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> +
> +module a
> +integer, parameter :: b=3, c=b
> +contains
> +subroutine d(block)
> +integer f, col   , block(:, :, :), e
> +do f = 1, c
> +   do col = 1, c
> +             block(:f,                          :, e()) = do
> +     end do
> +  end do
> +  end
> +  end
> +
> +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
> +
> diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644
> --- a/gcc/tree-affine.cc
> +++ b/gcc/tree-affine.cc
> @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div,
>  				     &mult_set, mult))
>      return false;
>  
> +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> +     mult_set is checked, since that forced the only valid multiple of
> +     val and div to be 0 whereas 1 is also possible.  */
> +  if (known_eq (val->offset, 0)
> +      && known_eq (div->offset, 0))
> +    mult_set = false;
> +

In fact all numbers are possible?  Shouldn't this be better handled
in wide_int_constant_multiple_p by special-casing
known_eq (div, 0) in the known_eq (val, 0) condition by simply
returning 'true' without checking or setting *mult_set?

The docs of wide_int_constant_multiple_p is odd:

/* If VAL != CST * DIV for any constant CST, returns false.

should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
Or s/any/all/?

>    for (i = 0; i < div->n; i++)
>      {
>        class aff_comb_elt *elt
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
>    return exit;
>  }
>  
> +/* Compares the given affine tree LEFT with the tree expression RIGHT and return
> +   whether they are the same under affine equality.  */
> +
> +static bool
> +affine_compare_eq (aff_tree &left, tree right)
> +{
> +  aff_tree aff_right;
> +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> +  aff_combination_scale (&aff_right, -1);
> +  aff_combination_add (&aff_right, &left);
> +  return aff_combination_zero_p (&aff_right);
> +}
> +
>  
>  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
>     of the arguments of each expression is a constant and that the type of the
> @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>    tree addr_base = NULL;
>    struct iv_group *group = NULL;
>    poly_uint64 addr_offset = 0;
> +  aff_tree iv_step, iv_addr_base;
> +
> +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
>  
>    /* Record non address type use in a new group.  */
>    if (address_p (type))
> @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>        tree addr_toffset;
>        split_constant_offset (iv->base, &addr_base, &addr_toffset);
>        addr_offset = int_cst_value (addr_toffset);
> +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base);
>        for (i = 0; i < data->vgroups.length (); i++)
>  	{
>  	  struct iv_use *use;
> @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>  
>  	  /* Check if it has the same stripped base and step.  */
>  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> -	      && operand_equal_p (iv->step, use->iv->step, 0)
> -	      && operand_equal_p (addr_base, use->addr_base, 0))
> +	      && affine_compare_eq (iv_step, use->iv->step)
> +	      && affine_compare_eq (iv_addr_base, use->addr_base))

There's only this use of addr_base so I think the opportunity is to
turn iv_use->addr_base into aff_tree (even though that's a quite big
representation).

For the testcase, what are the two IVs we are comparing?  I wonder
why you need the affine compare for iv->step?

>  	    break;
>  	}
>        if (i == data->vgroups.length ())
> @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul)
>        return true;
>      }
>  
> +  aff_tree aff_top, aff_bot;
> +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> +  poly_widest_int poly_mul;
> +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> +      && poly_mul.is_constant (mul))
> +    return true;
> +

So why does stripping nops not work here?

>    code = TREE_CODE (top);
>    switch (code)
>      {
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
---1475677436-2053736573-1718798070=:4147--