From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 013753858D3C for ; Thu, 15 Jun 2023 09:52:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 013753858D3C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id CD6CE223E8; Thu, 15 Jun 2023 09:52:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1686822733; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=ubQ+stY7f9VL9+/mFOyr4mXrQdDU6+Hhems0kqRL8FQ=; b=t1BDqO0kOSwMAnVaAn4h3pv+/TvnSxuC8LUVTl3h+Yz4lZ3zt1NKmfAXHoVYuXrJX4uGmH XfJEMRsocYSfBgjDnq4eV9bvTMwGzy+CJr4re/ocI+Z7wx9spOwUcOd4p7q0rlyoLe2cGq NlC1Zf/RCbZOw6mNpUf8e18XRUzvjt4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1686822733; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=ubQ+stY7f9VL9+/mFOyr4mXrQdDU6+Hhems0kqRL8FQ=; b=QhOmL24vls0nSTjVuuqilIfc1aalVUFkCmG/7XHKEbxuI2emI3hMIkQUqMM+b7tTPpL/qi g9DE+TSweCgJ41Bw== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 803822C141; Thu, 15 Jun 2023 09:52:13 +0000 (UTC) Date: Thu, 15 Jun 2023 09:52:13 +0000 (UTC) From: Richard Biener To: Robin Dapp cc: "juzhe.zhong@rivai.ai" , gcc-patches , "richard.sandiford" , krebbel , uweigand , linkw@linux.ibm.com Subject: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control In-Reply-To: <93123844-a2a8-90ab-d47a-6f859fb3a8d6@gmail.com> Message-ID: References: <20230612041438.272885-1-juzhe.zhong@rivai.ai> <1A27F1FE56A91998+2023061516473052116877@rivai.ai> <7e31f226-476a-75d1-6a0f-13fa9134e311@gmail.com> <93123844-a2a8-90ab-d47a-6f859fb3a8d6@gmail.com> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, 15 Jun 2023, Robin Dapp wrote: > > Meh, PoP is now behind a paywall, trying to get through ... I wonder > > if there's a nice online html documenting the s390 len_load/store > > instructions to better understand the need for the bias. > > https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf > > Look for vector load with length (store). The length operand specifies > the highest bytes to load instead of the actual length. Hmm. It indeed cannot represent len == 0, so you are making sure that never happens? Because when it is actually zero you are going to get -1 here? At least I don't see the bias operand used at all: ; Implement len_load/len_store optabs with vll/vstl. (define_expand "len_load_v16qi" [(match_operand:V16QI 0 "register_operand") (match_operand:V16QI 1 "memory_operand") (match_operand:QI 2 "register_operand") (match_operand:QI 3 "vll_bias_operand") ] "TARGET_VX && TARGET_64BIT" { rtx mem = adjust_address (operands[1], BLKmode, 0); rtx len = gen_reg_rtx (SImode); emit_move_insn (len, gen_rtx_ZERO_EXTEND (SImode, operands[2])); emit_insn (gen_vllv16qi (operands[0], len, mem)); DONE; }) the docs of len_load say " @cindex @code{len_load_@var{m}} instruction pattern @item @samp{len_load_@var{m}} Load (operand 2 - operand 3) elements from vector memory operand 1 into vector register operand 0, setting the other elements of operand 0 to undefined values. Operands 0 and 1 have mode @var{m}, which must be a vector mode. Operand 2 has whichever integer mode the target prefers. Operand 3 conceptually has mode @code{QI}. Operand 2 can be a variable or a constant amount. Operand 3 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 3 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1. If (operand 2 - operand 3) exceeds the number of elements in mode @var{m}, the behavior is undefined. If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of @code{QI} elements." the minus in 'operand 2 - operand 3' should be a plus if the bias is really zero or -1. I suppose 'If (operand 2 - operand 3) exceeds the number of elements in mode @var{m}, the behavior is undefined.' means that the vectorizer has to make sure the biased element count never underflows? That is, for a loop like void foo (double *x, float *y, int n) { for (int i = 0; i < n; ++i) y[i] = x[i]; } you should get x1 = len_load (...); x2 = len_load (...); y = VEC_PACK_TRUNC_EXPR len_store (..., y); but then the x2 load can end up with a len of zero and thus trap (since you will load either a full vector or the first byte of it). I see you do /* If the backend requires a bias of -1 for LEN_LOAD, we must not emit len_loads with a length of zero. In order to avoid that we prohibit more than one loop length here. */ if (partial_load_bias == -1 && LOOP_VINFO_LENS (loop_vinfo).length () > 1) return false; that's quite conservative. I think you can do better when the loads are aligned, reading an extra byte when ignoring the bias is OK and you at least know the very first element is used. For stores you would need to emit compare&jump for all but the first store of a group though ... That said, I'm still not seeing where you actually apply the bias. Richard.