From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 27 Aug 2020 15:20:19 +0200
From: Jakub Jelinek
To: Richard Biener
Cc: Hongtao Liu, GCC Patches
Subject: Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512
Message-ID: <20200827132019.GO2961@tucnak>
References: <20200827122452.GN2961@tucnak>

On Thu, Aug 27, 2020 at 03:07:59PM +0200, Richard Biener wrote:
> > Also, isn't the pass also useful for TARGET_AVX and above (but in that case
> > only if it is a simple memory load)?  Or are avx/avx2 broadcast slower than
> > full vector loads?
> >
> > As Jeff wrote, I wonder if when successfully replacing those pool constants
> > the old constant pool entries will be omitted.
> >
> > Another thing I wonder about is whether more analysis shouldn't be used.
> > E.g. if the constant pool entry is already emitted into .rodata anyway
> > (e.g. some earlier function needed it), using the broadcast will mean
> > actually larger .rodata.  If {1to8} and similar is as fast as reading all
> > the same elements from memory (or faster), perhaps in that case it should
> > broadcast from the first element of the existing constant pool full vector
> > rather than creating a new one.
> > And similarly, perhaps the function should look at all constant pool entries
> > in the current function (not yet emitted into .rodata) and if it would
> > succeed for some and not for others, either use broadcast from its first
> > element or not perform it for the others too.
>
> IIRC I once implemented this (re-using vector constant components
> for non-vector pool entries) but it was quite hackish and never merged
> it seems.

If the generic constant pool code could do it, it would of course simplify
this pass.
I'm not sure whether the case where an earlier function has already emitted
some smaller constant and a later function needs a CONST_VECTOR containing it
can be handled at all (probably not).  But if the same function has both
scalar pool entries and CONST_VECTOR entries that contain them, or an already
emitted CONST_VECTOR pool entry contains them, it shouldn't be that hard, at
least for targets with symbol aliases, e.g. by using
  .LC33 = .LC24
or
  .LC34 = .LC24 + 8
where .LC33 or .LC34 would be the scalar pool entry label and .LC24 the
CONST_VECTOR containing those scalars.

It seems constant pool marking is performed by mark_constant_pool, which is
called during final from assemble_start_function or assemble_end_function, so
if the pass replaces the constants before final and the constants are then
unused, they won't be emitted.

	Jakub
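
To make the alias idea above concrete, here is a minimal standalone sketch in
C.  It is not GCC code: struct pool_entry, find_in_vector and format_alias are
hypothetical names invented for illustration, and the sketch assumes a target
whose assembler accepts symbol assignments of the form ".LCn = .LCm + off".
It only shows the lookup: given the bytes of a vector pool entry and of a
scalar pool entry, find an element-aligned offset at which the scalar already
occurs and format the alias instead of emitting a second copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one constant pool entry (not GCC's own data
   structure): a label number N as in ".LCN" plus the constant's raw bytes.  */
struct pool_entry {
    int label;
    const unsigned char *data;
    size_t size;
};

/* Return the byte offset at which SCALAR's bytes occur inside VEC, checking
   only offsets that are multiples of the scalar size, or -1 if the scalar
   is not an element of the vector.  */
static long find_in_vector(const struct pool_entry *vec,
                           const struct pool_entry *scalar)
{
    assert(scalar->size > 0);
    for (size_t off = 0; off + scalar->size <= vec->size; off += scalar->size)
        if (memcmp(vec->data + off, scalar->data, scalar->size) == 0)
            return (long)off;
    return -1;
}

/* If SCALAR is contained in VEC, write an alias such as ".LC34 = .LC24 + 8"
   into BUF and return 1.  Otherwise return 0 and leave BUF untouched, so the
   caller would emit the scalar as a separate pool entry.  */
static int format_alias(char *buf, size_t buflen,
                        const struct pool_entry *vec,
                        const struct pool_entry *scalar)
{
    long off = find_in_vector(vec, scalar);
    if (off < 0)
        return 0;
    if (off == 0)
        snprintf(buf, buflen, ".LC%d = .LC%d", scalar->label, vec->label);
    else
        snprintf(buf, buflen, ".LC%d = .LC%d + %ld",
                 scalar->label, vec->label, off);
    return 1;
}
```

With a 16-byte vector entry labelled 24 holding two 8-byte elements, a scalar
entry equal to the second element and labelled 34 would be formatted as
".LC34 = .LC24 + 8", and a scalar that matches no element would fall back to
a separate entry.  A real implementation would also have to respect
alignment, section placement and the marking done by mark_constant_pool,
which this sketch ignores.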