From: Richard Sandiford
To: Jakub Jelinek via Gcc-patches
Cc: Richard Biener, Jakub Jelinek
Subject: Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512
Date: Fri, 28 Aug 2020 17:07:11 +0100
In-Reply-To: <20200828145325.GA18149@tucnak> (Jakub Jelinek via Gcc-patches's message of "Fri, 28 Aug 2020 16:53:25 +0200")
References: <20200827122452.GN2961@tucnak> <20200827132019.GO2961@tucnak> <20200828085246.GQ2961@tucnak> <20200828104705.GR2961@tucnak> <20200828145325.GA18149@tucnak>
List-Id: Gcc-patches mailing list

Thanks for doing this.  I don't feel qualified to review the full
patch, but one thing:

Jakub Jelinek via Gcc-patches writes:
> +  auto_vec buffer;
> +  auto_vec vec;
> +  object_allocator
> +    data_pool ("constant_descriptor_rtx_data_pool");
> +  int idx = 0;
> +  size_t size = 0;
> +  for (constant_descriptor_rtx *desc = pool->first; desc; desc = desc->next)
> +    if (desc->mark > 0
> +        && ! (SYMBOL_REF_HAS_BLOCK_INFO_P (desc->sym)
> +              && SYMBOL_REF_BLOCK (desc->sym)))
> +      {
> +        buffer.truncate (0);

128 isn't big enough for all targets (e.g. aarch64 with
-msve-vector-bits=2048), so I think we still need a reserve
call here.

Thanks,
Richard

> +        if (native_encode_rtx (desc->mode, desc->constant, buffer, 0,
> +                               GET_MODE_SIZE (desc->mode)))
> +          {
> +            constant_descriptor_rtx_data *data = data_pool.allocate ();
> +            data->desc = desc;
> +            data->bytes = NULL;
> +            data->size = GET_MODE_SIZE (desc->mode);
> +            data->offset = 0;
> +            data->hash = idx++;
> +            size += data->size;
> +            vec.safe_push (data);
> +          }
> +      }
> +  if (idx)
> +    {
> +      vec.qsort (constant_descriptor_rtx_data_cmp);
> +      unsigned min_size = vec.last ()->size;
> +      target_unit *bytes = XNEWVEC (target_unit, size);
> +      unsigned int i;
> +      constant_descriptor_rtx_data *data;
> +      hash_table * htab
> +        = new hash_table (31);
> +      size = 0;
> +      FOR_EACH_VEC_ELT (vec, i, data)
> +        {
> +          buffer.truncate (0);
> +          native_encode_rtx (data->desc->mode, data->desc->constant,
> +                             buffer, 0, data->size);
> +          memcpy (bytes + size, buffer.address (), data->size);
> +          data->bytes = bytes + size;
> +          data->hash = iterative_hash (data->bytes,
> +                                       data->size * sizeof (target_unit), 0);
> +          size += data->size;
> +          constant_descriptor_rtx_data **slot
> +            = htab->find_slot_with_hash (data, data->hash, INSERT);
> +          if (*slot)
> +            {
> +              data->desc->mark = ~(*slot)->desc->labelno;
> +              data->desc->offset = (*slot)->offset;
> +            }
> +          else
> +            {
> +              unsigned int sz = 1 << floor_log2 (data->size);
> +
> +              *slot = data;
> +              for (sz >>= 1; sz >= min_size; sz >>= 1)
> +                for (unsigned off = 0; off + sz <= data->size; off += sz)
> +                  {
> +                    constant_descriptor_rtx_data tmp;
> +                    tmp.desc = data->desc;
> +                    tmp.bytes = data->bytes + off;
> +                    tmp.size = sz;
> +                    tmp.offset = off;
> +                    tmp.hash = iterative_hash (tmp.bytes,
> +                                               sz * sizeof (target_unit), 0);
> +                    slot = htab->find_slot_with_hash (&tmp, tmp.hash, INSERT);
> +                    if (*slot == NULL)
> +                      {
> +                        *slot = data_pool.allocate ();
> +                        **slot = tmp;
> +                      }
> +                  }
> +            }
> +        }
> +      delete htab;
> +      XDELETE (bytes);
> +    }
> +  data_pool.release ();
> +}
> +
>  /* Mark all constants that are used in the current function, then write
>     out the function's private constant pool.  */
>
> @@ -4251,6 +4425,10 @@ output_constant_pool (const char *fnname
>  void
>  output_shared_constant_pool (void)
>  {
> +  if (optimize
> +      && TARGET_SUPPORTS_ALIASES)
> +    optimize_constant_pool (shared_constant_pool);
> +
>    output_constant_pool_contents (shared_constant_pool);
>  }
>
>
>
> Jakub
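
As a footnote to the reserve comment above, here is a minimal standalone
C++ sketch of the concern.  It does not use GCC's vec or auto_vec:
ByteBuffer, encode_constant and vector_bytes are hypothetical stand-ins
invented for illustration.  The sketch assumes that the 128 refers to the
buffer's inline capacity (the template arguments are not visible in the
quoted text) and that the encoder appends bytes without growing the
buffer.  Under those assumptions, a 2048-bit vector constant needs
2048 / 8 = 256 bytes, so it only fits after an explicit reserve.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a byte buffer with a fixed initial capacity.
// This is NOT GCC's auto_vec; it only models the rule "pushes must fit
// in space that was reserved beforehand".
class ByteBuffer {
public:
  explicit ByteBuffer(std::size_t inline_capacity)
      : capacity_(inline_capacity) {
    data_.reserve(inline_capacity);
  }

  // Shrink the logical length; the capacity is left unchanged.
  void truncate(std::size_t n) {
    assert(n <= data_.size());
    data_.resize(n);
  }

  // Ensure that at least `extra` more bytes can be pushed.
  void reserve(std::size_t extra) {
    if (data_.size() + extra > capacity_) {
      capacity_ = data_.size() + extra;
      data_.reserve(capacity_);
    }
  }

  // True if `extra` more bytes fit without growing the buffer.
  bool space(std::size_t extra) const {
    return data_.size() + extra <= capacity_;
  }

  // Append without growing; the caller must have reserved space.
  void quick_push(unsigned char byte) {
    assert(space(1));
    data_.push_back(byte);
  }

  std::size_t length() const { return data_.size(); }

private:
  std::vector<unsigned char> data_;
  std::size_t capacity_;
};

// Size in bytes of a vector constant that is `bits` wide.
constexpr std::size_t vector_bytes(std::size_t bits) { return bits / 8; }

// Hypothetical encoder: appends `nbytes` bytes using quick_push only.
// It reports failure instead of asserting when the buffer is too small,
// so the effect of a missing reserve is easy to observe.
bool encode_constant(ByteBuffer &buf, std::size_t nbytes) {
  if (!buf.space(nbytes))
    return false;
  for (std::size_t i = 0; i < nbytes; ++i)
    buf.quick_push(static_cast<unsigned char>(i & 0xff));
  return true;
}
```

With a 128-byte ByteBuffer, encode_constant (buf, vector_bytes (2048))
fails until buf.reserve (vector_bytes (2048)) has been called.  That is
the shape of the fix suggested above: a reserve after each truncate (0),
presumably sized by GET_MODE_SIZE (desc->mode).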