From: Richard Sandiford
To: "juzhe.zhong@rivai.ai"
Cc: rguenther, gcc-patches, jeffreyalaw, rdapp, linkw
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
Date: Wed, 12 Apr 2023 13:24:38 +0100
In-Reply-To: juzhe's message of "Wed, 12 Apr 2023 19:37:19 +0800"

"juzhe.zhong@rivai.ai" writes:
>>> I think that already works for them (could be misremembering).
>>> However, IIUC, they have no special instruction to calculate the
>>> length (unlike for RVV), and so it's open-coded using vect_get_len.
>
> Yeah, the current flow using min, sub, and then min in vect_get_len
> works for IBM.  But I wonder whether switching their length-based
> loop control over to the WHILE_LEN pattern added by this patch could
> improve their performance.
>
>>> (1) How easy would it be to express WHILE_LEN in normal gimple?
>>>     I haven't thought about this at all, so the answer might be
>>>     "very hard".  But it reminds me a little of UQDEC on AArch64,
>>>     which we open-code using MAX_EXPR and MINUS_EXPR (see
>>>     vect_set_loop_controls_directly).
>>>
>>>     I'm not saying WHILE_LEN is the same operation, just that it seems
>>>     like it might be open-codeable in a similar way.
>>>
>>>     Even if we can open-code it, we'd still need some way for the
>>>     target to select the "RVV way" from the "s390/PowerPC way".
>
> The WHILE_LEN I define in the doc is:
>
>   operand0 = MIN (operand1, operand2)
>
> where operand1 is the residual number of scalar elements that need to
> be updated, and operand2 is the vectorization factor (vf) for a single
> rgroup.  With multiple rgroups, operand2 = vf * nitems_per_ctrl.
>
> Do you mean that such a pattern is not well expressed, so we need to
> replace it with normal tree codes (MIN or MAX) and let the RISC-V
> backend optimize them into vsetvl?  Sorry, maybe I am not on the same
> page.

It's not so much that we need to do that.  But normally it's only worth
adding internal functions if they do something that is too complicated
to express in simple gimple arithmetic.  The UQDEC case I mentioned:

   z = MAX (x, y) - y

fell into the "simple arithmetic" category for me.  We could have added
an ifn for unsigned saturating decrement, but it didn't seem complicated
enough to merit its own ifn.

>>> (2) What effect does using a variable IV step (the result of
>>>     the WHILE_LEN) have on ivopts?
>>>     I remember experimenting with something similar once (can't
>>>     remember the context) and not having a constant step prevented
>>>     ivopts from making good addressing-mode choices.
>
> Thank you so much for pointing this out.  Currently, a variable IV step
> with n decreasing down to 0 works fine for RISC-V downstream GCC, and
> we didn't find any issues related to addressing-mode selection.

OK, that's good.  Sounds like it isn't a problem then.

> I think I must have missed something.  Would you mind giving me some
> hints so that I can study ivopts and find out which cases may generate
> inferior code for a variable IV step?

I think AArch64 was sensitive to this because (a) the vectoriser creates
separate IVs for each base address and (b) for SVE, we instead want
invariant base addresses that are indexed by the loop control IV.
Like Richard says, if the loop control IV isn't a SCEV, ivopts isn't
able to use it and so (b) fails.

Thanks,
Richard