From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-435353-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 28669 invoked by alias); 6 Sep 2016 15:33:00 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 28653 invoked by uid 89); 6 Sep 2016 15:32:59 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.3 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=comfortable, ended, timode, tkachov
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 06 Sep 2016 15:32:57 +0000
Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24])	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))	(No client certificate requested)	by mx1.redhat.com (Postfix) with ESMTPS id DDFA94E4D6;	Tue,  6 Sep 2016 15:32:55 +0000 (UTC)
Received: from tucnak.zalov.cz (ovpn-204-43.brq.redhat.com [10.40.204.43])	by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u86FWsaD020277	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);	Tue, 6 Sep 2016 11:32:55 -0400
Received: from tucnak.zalov.cz (localhost [127.0.0.1])	by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id u86FWqXr003104;	Tue, 6 Sep 2016 17:32:52 +0200
Received: (from jakub@localhost)	by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id u86FWoLh003103;	Tue, 6 Sep 2016 17:32:50 +0200
Date: Tue, 06 Sep 2016 15:33:00 -0000
From: Jakub Jelinek <jakub@redhat.com>
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, Richard Biener <rguenther@suse.de>
Subject: Re: [PATCH][v3] GIMPLE store merging pass
Message-ID: <20160906153250.GK14857@tucnak.redhat.com>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <57CEDD67.6010801@foss.arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <57CEDD67.6010801@foss.arm.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-IsSubscribed: yes
X-SW-Source: 2016-09/txt/msg00316.txt.bz2

On Tue, Sep 06, 2016 at 04:14:47PM +0100, Kyrill Tkachov wrote:
> The v3 of this patch addresses feedback I received on the version posted at [1].
> The merged store buffer is now represented as a char array that we splat values onto with
> native_encode_expr and native_interpret_expr. This allows us to merge anything that native_encode_expr
> accepts, including floating point values and short vectors. So this version extends the functionality
> of the previous one in that it handles floating point values as well.
> 
> The first phase of the algorithm, which detects the contiguous stores, has also been slightly refactored following
> feedback so that it reads more fluently.
> 
> Richi, I experimented with merging up to MOVE_MAX bytes rather than word size but I got worse results on aarch64.
> MOVE_MAX there is 16 (because it has load/store register pair instructions) but the 128-bit immediates that we ended
> up synthesising were too complex. Perhaps the TImode immediate store RTL expansions could be improved, but for now
> I've left the maximum merge size at BITS_PER_WORD.

At least from playing with this kind of thing in the RTL PR22141 patch,
I remember that storing a 64-bit constant on x86_64, compared to storing two
32-bit constants, usually isn't a win (not just for speed-optimized blocks but
also for -Os).  For a 64-bit store, if the constant isn't a signed 32-bit or
unsigned 32-bit value, you need a movabsq into a temporary register, which if I
remember well has about 3 times worse latency than a normal store, and then the
store itself.  It could perhaps pay off if the constant can be CSEd and the same
constant is used multiple times in adjacent code.
Various other targets have different costs for different constants,
so it would be nice if the pass took that into account (computed the RTX
costs of those constants and used them in some heuristics).
What alias set is used for the merged access if different alias sets
are involved among the merged stores?
Alignment can also matter, even on targets without strict alignment
requirements (with different trade-offs when optimizing for speed vs. -Os).
Do you have any SPEC2k and/or SPEC2k6 numbers, e.g. for
x86_64/i686/arm/aarch64/powerpc64le?
The RTL PR22141 changes weren't committed mainly because they slowed down
SPEC2k* on powerpc.
Also, do you handle only constants, or also the case where there is partial
or complete copying from some other memory, which could be turned into
larger-chunk loads and stores or a __builtin_memcpy?

> I've disabled the pass for PDP-endian targets as the merging code proved to be quite fiddly to get right for different
> endiannesses and I didn't feel comfortable writing logic for BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN targets without serious
> testing capabilities. I hope that's ok (I note the bswap pass also doesn't try to do anything on such targets).

I think that is fine; it isn't the only pass that punts in this case.

	Jakub