From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from rock.gnat.com (rock.gnat.com [205.232.38.15]) by sourceware.org (Postfix) with ESMTP id 46B9239960F7 for ; Wed, 3 Feb 2021 15:11:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 46B9239960F7 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=adacore.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=oliva@adacore.com Received: from localhost (localhost.localdomain [127.0.0.1]) by filtered-rock.gnat.com (Postfix) with ESMTP id 12163117286; Wed, 3 Feb 2021 10:11:47 -0500 (EST) X-Virus-Scanned: Debian amavisd-new at gnat.com Received: from rock.gnat.com ([127.0.0.1]) by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id b9hQmaLJaTm0; Wed, 3 Feb 2021 10:11:47 -0500 (EST) Received: from free.home (tron.gnat.com [IPv6:2620:20:4000:0:46a8:42ff:fe0e:e294]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by rock.gnat.com (Postfix) with ESMTPS id C9194117280; Wed, 3 Feb 2021 10:11:46 -0500 (EST) Received: from livre (livre.home [172.31.160.2]) by free.home (8.15.2/8.15.2) with ESMTPS id 113FBdWE061500 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 3 Feb 2021 12:11:39 -0300 From: Alexandre Oliva To: Richard Biener Cc: Jan Hubicka , GCC Patches , Zdenek Dvorak Subject: Re: [RFC] test builtin ratio for loop distribution Organization: Free thinker, does not speak for AdaCore References: Errors-To: aoliva@lxoliva.fsfla.org Date: Wed, 03 Feb 2021 12:11:39 -0300 In-Reply-To: (Richard Biener's message of "Wed, 3 Feb 2021 09:59:20 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Feb 2021 15:11:49 -0000 On Feb 3, 2021, Richard Biener wrote: > So I think we should try to match what __builtin_memcpy/memset > expansion would do here, taking advantage of extra alignment > and size knowledge. In particular, > a) if __builtin_memcpy/memset would use setmem/cpymem optabs > see if we can have variants of memcpy/memset transferring alignment > and size knowledge We could add more optional parameters to them to convey the length's known ctz. However, the ctz can't be recovered reliably. We can't even recover it after gimplifying the length within ldist! That said, my other patch already enables ctz calls to recover it, at least in libgcc risc-v tfmode cases, and it's possible it's readily available in other cases. I'd rather leave that for someone dealing with the machine-specific patterns to figure out whether a separate argument would be useful. RISC-V, which is what I'm dealing with, doesn't have much to offer as far as these patterns are concerned. > b) if expansion would use BY_PIECES then expand to an unrolled loop Why would that be better than keeping the constant-length memset call, that would be turned into an unrolled loop during expand? > c) if expansion would emit a memset/memcpy call but we know > alignment and have a low bound on niters emit a loop (like your patch does) > a) might be difficult but adding the builtin variants may pay off in any case. *nod* > The patch itself could benefit from one or two helpers we already > have, first of all there's create_empty_loop_on_edge (so you don't > need the loop fixup) Uhh, thanks, but... you realize nearly all of the gimple-building code is one and the same for the loop and for trailing count misalignment? There doesn't seem to be a lot of benefit to using this primitive, aside from its dealing with creating the loop data structure which, again, I'd only want to do in the loop case. (I guess I should add more comments as to the inline expansion strategy. it's equivalent to: i = len, ptr = base, blksz = 1 << alctz; while (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += blksz; } blksz >>= 1; if (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += blksz; } blksz >>= 1; if (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += blksz; } ... until blksz gets down to zero or to 1< Note that for memmove if we know the dependence direction, we > can also emit a loop / unrolled code. *nod*, but the logic would have to be quite different, using bit tests, and odds are we won't know the direction and have to output a test and code for both possibilities, which would be quite unlikely to be beneficial. Though the original code would quite likely make the direction visible; perhaps if the size is small enough that we would expand a memcpy inline, we should refrain from transforming the loop into a memmove call. In the case at hand, there's no benefit at all to these transformations: we start from a loop with the known alignment and a small loop count (0 to 2 words copied), and if everything goes all right with the transformation, we may be lucky to get back to that. It's not like the transformation could even increase the known alignment, so why bother, and throw away debug info by rewriting the loop into the same code minus debug? > I think the builtins with alignment and calloc-style element count > will be useful on its own. Oh, I see, you're suggesting actual separate builtin functions. Uhh... I'm not sure I want to go there. I'd much rather recover the ctz of the length, and use it in existing code. I'd also prefer if the generic memset (and memcpy and memmove?) builtin expanders dealt with non-constant lengths in the way I implemented. That feels like the right spot for it. That deprives us of gimple loop optimizations in the inlined loop generated by the current patch, but if we expand an unrolled loop with compares and offsets with small constants, loop optimizations might not even be relevant. FWIW, the patch I posted yesterday is broken, the regstrap test did not even build libstdc++-v3 successfully. I'm not sure whether to pursue it further, or to reimplement it in the expander. Suggestions are welcome. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar