From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mailout04.t-online.de (mailout04.t-online.de [194.25.134.18]) by sourceware.org (Postfix) with ESMTPS id 297733858D20 for ; Tue, 15 Mar 2022 02:17:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 297733858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=t-online.de Authentication-Results: sourceware.org; spf=none smtp.mailfrom=t-online.de Received: from fwd80.dcpf.telekom.de (fwd80.aul.t-online.de [10.223.144.106]) by mailout04.t-online.de (Postfix) with SMTP id 92385504A; Tue, 15 Mar 2022 03:17:10 +0100 (CET) Received: from localhost.localdomain ([115.165.108.210]) by fwd80.t-online.de with (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384 encrypted) esmtp id 1nTwkF-1l5jyS0; Tue, 15 Mar 2022 03:17:09 +0100 Message-ID: <585c1ed8179b8de0765454a46cc0b2eea26b37d0.camel@t-online.de> Subject: Re: RFA: crc builtin functions & optimizations From: Oleg Endo To: Andrew Pinski , Joern Rennecke Cc: GCC Patches Date: Tue, 15 Mar 2022 11:17:05 +0900 In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.40.4 (3.40.4-3.fc34) MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-TOI-EXPURGATEID: 150726::1647310629-00001C4D-E0CC77D0/0/0 CLEAN NORMAL X-TOI-MSGID: c40a3451-8872-48c1-8ea2-ad136bbc9bc8 X-Spam-Status: No, score=-5.5 required=5.0 tests=BAYES_00, FREEMAIL_FROM, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2022 02:17:13 -0000 On Mon, 2022-03-14 at 18:04 -0700, Andrew Pinski via Gcc-patches wrote: > On Mon, Mar 14, 2022 at 5:33 PM Joern Rennecke > wrote: > > > > Most microprocessors have efficient ways to perform CRC operations, be > > that with lookup tables, rotates, or even special instructions. > > However, because we lack a representation for CRC in the compiler, we > > can't do proper instruction selection. With this patch I seek out to > > rectify this, > > I've avoided using a mode name for the built-in functions because that > > would tie the semantics to the size of the addressable unit. We > > generally use abbreviations like s/l/ll for type names, which is all > > right when the type can be widened without changing semantics. For > > the data input, however, we also have to consider the shift count that > > is tied to it. That is why I used a number to designate the width of > > the data input and shift. > > > > For machine support, I made a start with 8 and 16 bit little-endian > > CRC for RISCV using a > > lookup table. I am sure once we have the basic infrastructure in the > > tree, we'll get more > > contributions of suitable named patterns for various ports. > > > A few points. > There are at least 9 different polynomials for the CRC-8 in common use today. > For CRC-32 there are 5 different polynomials used. > You don't have a patch to invoke.texi adding the descriptions of the builtins. > How is your polynom 3rd argument described? Is it similar to how it is > done on the wiki for the CRC? > Does it make sense to have to list the most common polynomials in the > documentation? > > Also I am sorry but micro-optimizing coremarks is just wrong. Maybe it > is better to pick the CRC32 that is inside zip instead for a testcase > and benchmarking against? > Or even the CRC32C for iSCSI/ext4. > > I see you also don't optimize the case where you have three other > variants of polynomials that are reversed, reciprocal and reversed > reciocal. In my own CRC library I've got ~30 'commonly used' CRC types, based on the following generic definition: template < // number of crc result bits (polynomial order in bits) unsigned int BitCount, // normal polynomial without the leading 1 bit. typename crc_impl_detail::select_int::type TruncPoly, // initial remainder typename crc_impl_detail::select_int::type InitRem = 0, // final xor value typename crc_impl_detail::select_int::type FinalXor = 0, // input data byte reflected before processing (LSB / MSB first) bool ReflectInput = false, // output CRC reflected before the xor bool ReflectRemainder = false > class crc { ... }; and then it goes like ... // CRC-1 (most hardware; also known as parity bit) // x + 1 typedef crc < 1, 0x01 > crc_1; // CRC-3 typedef crc < 3, 0x03, 0x07, 0x00, true, true> crc_3; ... // CRC-32 (ISO 3309, ANSI X3.66, FIPS PUB 71, FED-STD-1003, ITU-T V.42, Ethernet, SATA, MPEG-2, Gzip, PKZIP, POSIX cksum, PNG, ZMODEM) // x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1 typedef crc < 32, 0x04C11DB7, 0xFFFFFFFF, 0xFFFFFFFF, true, true > crc_32; typedef crc < 32, 0x04C11DB7, 0x7FFFFFFF, 0x7FFFFFFF, false, false > crc_32_mpeg2; typedef crc < 32, 0x04C11db7, 0x00000000, 0xFFFFFFFF, false, false > crc_32_posix; ... It then generates the lookup tables at compile time into .rodata for the types that are used in the program, which is great for MCUs with more flash/ROM than RAM. Specific CRC types can be overridden if the system has a better way to calculate the CRC, e.g. as hardware peripheral. This being a library makes it relatively easy to tune and customize for various systems. How would that work together with your proposal? Cheers, Oleg