From: Jakub Jelinek
To: Richard Biener
Cc: GCC Patches
Subject: Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512
Date: Fri, 28 Aug 2020 16:53:25 +0200
Message-ID: <20200828145325.GA18149@tucnak>
References: <20200827122452.GN2961@tucnak> <20200827132019.GO2961@tucnak> <20200828085246.GQ2961@tucnak> <20200828104705.GR2961@tucnak>
List-Id: Gcc-patches mailing list

On Fri, Aug 28, 2020 at 01:06:40PM +0200, Richard Biener via Gcc-patches wrote:
> > I don't see why it would break, it will not optimize { -1LL, -1LL }
> > vs. -1 scalar, sure, but it uses the hash and equality function the
> > rtl constant pool uses, which means it compares both the constants
> > (rtx_equal_p) and mode we have recorded for it.
>
> Oh, I thought my patch for PR54201 was installed but it was not
> (I grepped my patch folder...).  PR54201 complains about something
> similar but then different ;)  Guess post-processing that would also be
> possible but also a bit awkward.  Maybe we should hash the
> full byte representation instead of the components.

Here is an adjusted patch that ought to merge even same-sized vectors of
different modes that share the same byte representation, etc.

It won't really help with avoiding the multiple reads of the constant in
the same function, but as you found, your patch doesn't help with that
either.

Your patch isn't really incompatible with what the patch below does,
though I wonder
a) whether it wouldn't be better to always canonicalize to an integral
   mode with as few elts as possible, even e.g. for floats, and
b) whether asserting that simplify_rtx succeeds is safe, or whether it
   should just canonicalize when the canonicalization works and do what
   it previously did otherwise.
The following patch puts all pool entries which can be natively encoded
into a vector, sorts the vector by decreasing size, determines the
minimum size of a pool entry, and adds hash table entries for each
(aligned) min_size or wider power-of-two-ish portion of the pool
constant, in addition to the whole pool constant byte representation.

2020-08-28  Jakub Jelinek

	PR middle-end/54201
	* varasm.c: Include alloc-pool.h.
	(output_constant_pool_contents): Emit desc->mark < 0 entries as
	aliases.
	(struct constant_descriptor_rtx_data): New type.
	(constant_descriptor_rtx_data_cmp): New function.
	(struct const_rtx_data_hasher): New type.
	(const_rtx_data_hasher::hash, const_rtx_data_hasher::equal): New
	methods.
	(optimize_constant_pool): New function.
	(output_shared_constant_pool): Call it if TARGET_SUPPORTS_ALIASES.

--- gcc/varasm.c.jj	2020-07-28 15:39:10.091755086 +0200
+++ gcc/varasm.c	2020-08-28 15:37:30.605076961 +0200
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.
 #include "asan.h"
 #include "rtl-iter.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "alloc-pool.h"
 #ifdef XCOFF_DEBUGGING_INFO
 #include "xcoffout.h"		/* Needed for external data declarations.  */
@@ -4198,7 +4199,27 @@ output_constant_pool_contents (struct rt
   class constant_descriptor_rtx *desc;
 
   for (desc = pool->first; desc ; desc = desc->next)
-    if (desc->mark)
+    if (desc->mark < 0)
+      {
+#ifdef ASM_OUTPUT_DEF
+	const char *name = targetm.strip_name_encoding (XSTR (desc->sym, 0));
+	char label[256];
+	char buffer[256 + 32];
+	const char *p;
+
+	ASM_GENERATE_INTERNAL_LABEL (label, "LC", ~desc->mark);
+	p = targetm.strip_name_encoding (label);
+	if (desc->offset)
+	  {
+	    sprintf (buffer, "%s+%ld", p, desc->offset);
+	    p = buffer;
+	  }
+	ASM_OUTPUT_DEF (asm_out_file, name, p);
+#else
+	gcc_unreachable ();
+#endif
+      }
+    else if (desc->mark)
       {
	/* If the constant is part of an object_block, make sure that
	   the constant has been positioned within its block, but do not
@@ -4216,6 +4237,159 @@ output_constant_pool_contents (struct rt
     }
 }
 
+struct constant_descriptor_rtx_data {
+  constant_descriptor_rtx *desc;
+  target_unit *bytes;
+  unsigned short size;
+  unsigned short offset;
+  unsigned int hash;
+};
+
+/* qsort callback to sort constant_descriptor_rtx_data * vector by
+   decreasing size.  */
+
+static int
+constant_descriptor_rtx_data_cmp (const void *p1, const void *p2)
+{
+  constant_descriptor_rtx_data *const data1
+    = *(constant_descriptor_rtx_data * const *) p1;
+  constant_descriptor_rtx_data *const data2
+    = *(constant_descriptor_rtx_data * const *) p2;
+  if (data1->size > data2->size)
+    return -1;
+  if (data1->size < data2->size)
+    return 1;
+  if (data1->hash < data2->hash)
+    return -1;
+  gcc_assert (data1->hash > data2->hash);
+  return 1;
+}
+
+struct const_rtx_data_hasher : nofree_ptr_hash <constant_descriptor_rtx_data>
+{
+  static hashval_t hash (constant_descriptor_rtx_data *);
+  static bool equal (constant_descriptor_rtx_data *,
+		     constant_descriptor_rtx_data *);
+};
+
+/* Hash and compare functions for const_rtx_data_htab.  */
+
+hashval_t
+const_rtx_data_hasher::hash (constant_descriptor_rtx_data *data)
+{
+  return data->hash;
+}
+
+bool
+const_rtx_data_hasher::equal (constant_descriptor_rtx_data *x,
+			      constant_descriptor_rtx_data *y)
+{
+  if (x->hash != y->hash || x->size != y->size)
+    return 0;
+  unsigned int align1 = x->desc->align;
+  unsigned int align2 = y->desc->align;
+  unsigned int offset1 = (x->offset * BITS_PER_UNIT) & (align1 - 1);
+  unsigned int offset2 = (y->offset * BITS_PER_UNIT) & (align2 - 1);
+  if (offset1)
+    align1 = least_bit_hwi (offset1);
+  if (offset2)
+    align2 = least_bit_hwi (offset2);
+  if (align2 > align1)
+    return 0;
+  if (memcmp (x->bytes, y->bytes, x->size * sizeof (target_unit)) != 0)
+    return 0;
+  return 1;
+}
+
+/* Attempt to optimize constant pool POOL.  If it contains both CONST_VECTOR
+   constants and scalar constants with the values of CONST_VECTOR elements,
+   try to alias the scalar constants with the CONST_VECTOR elements.  */
+
+static void
+optimize_constant_pool (struct rtx_constant_pool *pool)
+{
+  auto_vec<target_unit> buffer;
+  auto_vec<constant_descriptor_rtx_data *> vec;
+  object_allocator<constant_descriptor_rtx_data>
+    data_pool ("constant_descriptor_rtx_data_pool");
+  int idx = 0;
+  size_t size = 0;
+  for (constant_descriptor_rtx *desc = pool->first; desc; desc = desc->next)
+    if (desc->mark > 0
+	&& ! (SYMBOL_REF_HAS_BLOCK_INFO_P (desc->sym)
+	      && SYMBOL_REF_BLOCK (desc->sym)))
+      {
+	buffer.truncate (0);
+	if (native_encode_rtx (desc->mode, desc->constant, buffer, 0,
+			       GET_MODE_SIZE (desc->mode)))
+	  {
+	    constant_descriptor_rtx_data *data = data_pool.allocate ();
+	    data->desc = desc;
+	    data->bytes = NULL;
+	    data->size = GET_MODE_SIZE (desc->mode);
+	    data->offset = 0;
+	    data->hash = idx++;
+	    size += data->size;
+	    vec.safe_push (data);
+	  }
+      }
+  if (idx)
+    {
+      vec.qsort (constant_descriptor_rtx_data_cmp);
+      unsigned min_size = vec.last ()->size;
+      target_unit *bytes = XNEWVEC (target_unit, size);
+      unsigned int i;
+      constant_descriptor_rtx_data *data;
+      hash_table<const_rtx_data_hasher> *htab
+	= new hash_table<const_rtx_data_hasher> (31);
+      size = 0;
+      FOR_EACH_VEC_ELT (vec, i, data)
+	{
+	  buffer.truncate (0);
+	  native_encode_rtx (data->desc->mode, data->desc->constant,
			     buffer, 0, data->size);
+	  memcpy (bytes + size, buffer.address (), data->size);
+	  data->bytes = bytes + size;
+	  data->hash = iterative_hash (data->bytes,
+				       data->size * sizeof (target_unit), 0);
+	  size += data->size;
+	  constant_descriptor_rtx_data **slot
+	    = htab->find_slot_with_hash (data, data->hash, INSERT);
+	  if (*slot)
+	    {
+	      data->desc->mark = ~(*slot)->desc->labelno;
+	      data->desc->offset = (*slot)->offset;
+	    }
+	  else
+	    {
+	      unsigned int sz = 1 << floor_log2 (data->size);
+
+	      *slot = data;
+	      for (sz >>= 1; sz >= min_size; sz >>= 1)
+		for (unsigned off = 0; off + sz <= data->size; off += sz)
+		  {
+		    constant_descriptor_rtx_data tmp;
+		    tmp.desc = data->desc;
+		    tmp.bytes = data->bytes + off;
+		    tmp.size = sz;
+		    tmp.offset = off;
+		    tmp.hash = iterative_hash (tmp.bytes,
+					       sz * sizeof (target_unit), 0);
+		    slot = htab->find_slot_with_hash (&tmp, tmp.hash, INSERT);
+		    if (*slot == NULL)
+		      {
+			*slot = data_pool.allocate ();
+			**slot = tmp;
+		      }
+		  }
+	    }
+	}
+      delete htab;
+      XDELETE (bytes);
+    }
+  data_pool.release ();
+}
+
 /* Mark all constants that are used in the current function,
    then write out the function's private constant pool.  */
@@ -4251,6 +4425,10 @@ output_constant_pool (const char *fnname
 void
 output_shared_constant_pool (void)
 {
+  if (optimize
+      && TARGET_SUPPORTS_ALIASES)
+    optimize_constant_pool (shared_constant_pool);
+
   output_constant_pool_contents (shared_constant_pool);
 }

	Jakub