From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=nr9m=63=kernel.crashing.org=segher@sourceware.org>
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
	by sourceware.org (Postfix) with ESMTP id F2B2E3858C83
	for <gcc-patches@gcc.gnu.org>; Fri,  3 Mar 2023 17:20:30 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F2B2E3858C83
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org
Received: from gate.crashing.org (localhost.localdomain [127.0.0.1])
	by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 323HJUHO014096;
	Fri, 3 Mar 2023 11:19:30 -0600
Received: (from segher@localhost)
	by gate.crashing.org (8.14.1/8.14.1/Submit) id 323HJTl6014095;
	Fri, 3 Mar 2023 11:19:29 -0600
X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f
Date: Fri, 3 Mar 2023 11:19:28 -0600
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Surya Kumari Jangala <jskumari@linux.vnet.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
        Peter Bergner <bergner@linux.ibm.com>, meissner@linux.ibm.com
Subject: Re: [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]
Message-ID: <20230303171928.GH25951@gate.crashing.org>
References: <b6e8ab52-3c92-3fa0-c70a-085c5c53e18e@linux.vnet.ibm.com> <20230227162806.GY25951@gate.crashing.org> <8da7e55a-7403-83a3-4ea5-c5e717ec1ae5@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <8da7e55a-7403-83a3-4ea5-c5e717ec1ae5@linux.vnet.ibm.com>
User-Agent: Mutt/1.4.2.3i
X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Hi!

On Fri, Mar 03, 2023 at 04:29:57PM +0530, Surya Kumari Jangala wrote:
> On 27/02/23 9:58 pm, Segher Boessenkool wrote:
> > On Wed, Jan 04, 2023 at 01:58:19PM +0530, Surya Kumari Jangala wrote:
> >> +     register swaps of permuting loads/stores have been removed.  */
> > 
> > If it really means only exactly this, then the name isn't so good.
> 
> There is another bit in this class named "web_not_optimizable". This stands for webs that cannot be optimized. I am reusing this name for the new bit I am adding, and I have named this bit as "web_is_optimized".

"optimizable" and "optimized" are very different things.  Both are not
saying much at all, making this worse :-(

"swaps_can_be_removed" and "swaps_have_been_removed" maybe, or
"swaps_will_be_removed" which is closer to the truth it seems?  The
future tense in that is important.

> >> +     Note that special handling should be done only for those 
> >> +     swappable insns that are present in webs optimized above.  */
> >> +  for (i = 0; i < e; ++i)
> >> +    if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> >> +        !(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> >> +          insn_entry[i].special_handling == SH_NOSWAP_ST))
> >>        {
> >>  	swap_web_entry* root_entry
> >>  	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> >> -	if (!root_entry->web_not_optimizable)
> >> +	if (root_entry->web_is_optimized)
> >>  	  handle_special_swappables (insn_entry, i);
> >>        }
> > 
> > Why this change?
> 
> The swap pass analyzes vector computations and removes unnecessary doubleword swaps (xxswapdi instructions). The swap pass first constructs webs and removes swap instructions if possible. If the web contains operations that are sensitive to element order (ie, insns requiring special handling), such as an extract, then such instructions should be modified. For example, for an extract operation the lane is changed.

[Please limit your line lengths to something reasonable :-)  It is hard
to edit and even read like this.  70 or 72 is a usual max length in
email, allowing a few levels of quoting before wrapping on standard
terminals.]

> However, the swap pass is changing element order of an extract operation that is present in unoptimized webs. Unoptimized webs are those for which register swaps were not removed, one of the reasons being that there are no loads/stores present in this web. For such webs, element order of extract operation should not be changed.
> Hence, we first mark webs that can be optimized, and only for such webs we call the routine handle_special_swappables() to modify operations sensitive to element order.
> One type of insns that are handled by handle_special_swappables() are non-permuting loads/stores. These insns are converted to permuting loads/stores. This conversion is needed because a permuting load/store is converted into a simple load/store during the final assembly code generation. Whereas, non-permuting loads/stores are converted into a load/store and a swap instruction.
> So we first go thru all the webs, and for permuting loads/stores we find the associated register swaps and mark them for removal. And for non-permuting loads/stores, we convert them to permuting loads/stores. All such webs are marked as 'optimized'
> Then we go thru all the webs again, and for those marked 'optimized', we call handle_special_swappables() to handle insns that require special handling.

Hrm.  So something like "optimizable_with_special_handling"?

That is still way more vague than wanted, of course.

Two big issues:

1) The code seems to duplicate a lot of existing code.  Can things be
factored better?  This is very good for maintainability.
2) Related a bit, please create a new function if you need to add a
large amount of code.

Restructuring existing code is fine, that is even needed to keep code in
good shape.  Please do that in a separate patch (typically before the
rest of the series), which makes reviewing it much simpler: one patch
that can even be largish but that really changes nothing, and one
hopefully small one that does the real meat.  The former is easier to
review because there are only two things to consider: is the new
structure good, and does the patch really not change things?  The latter
patch will be much easier to review as well, because it has ultra focus
on the actual new code :-)

Looking forward to what you come up with,


Segher