Date: Tue, 16 Aug 2022 13:08:47 +0200
From: Corinna Vinschen
To: Seija Kijin
Cc: newlib@sourceware.org
Subject: Re: [PATCH] newlib: libc: Optimize the string functions

Hi Seija,

Thanks for the patch.

First of all, your patch is broken in terms of whitespace and line
breaks, so it can't be applied.  I suspect your MUA is doing that
automatically.  You'd better provide the git patch as an attachment, or
change your MUA settings to leave whitespace and line breaks alone.

On Aug 12 17:11, Seija Kijin wrote:
> --- a/newlib/libc/string/memccpy.c
> +++ b/newlib/libc/string/memccpy.c
> @@ -34,16 +34,17 @@ PORTABILITY
>  /* Nonzero if either X or Y is not aligned on a "long" boundary.  */
>  #define UNALIGNED(X, Y) \
> -  (((long)X & (sizeof (long) - 1)) | ((long)Y & (sizeof (long) - 1)))
> +  (((unsigned long)X | (unsigned long)Y) & (sizeof (unsigned long) - 1))
>
>  /* How many bytes are copied each iteration of the word copy loop.  */
> -#define LITTLEBLOCKSIZE (sizeof (long))
> +#define LITTLEBLOCKSIZE (sizeof (unsigned long))
>
>  /* Threshhold for punting to the byte copier.  */
>  #define TOO_SMALL(LEN)  ((LEN) < LITTLEBLOCKSIZE)
>
>  /* Macros for detecting endchar */
>  #if LONG_MAX == 2147483647L
> +#define MAGIC

Why?  There's no MAGIC in this code.

>  #define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
>  #else
>  #if LONG_MAX == 9223372036854775807L

> @@ -64,11 +65,11 @@ memccpy (void *__restrict dst0,
>  #if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__)
>    void *ptr = NULL;
> -  char *dst = (char *) dst0;
> -  char *src = (char *) src0;
> -  char endchar = endchar0 & 0xff;
> +  unsigned char *dst = (unsigned char *) dst0;
> +  const unsigned char *src = (const unsigned char *) src0;
> +  const unsigned char endchar = (unsigned char) endchar0;
>
> -  while (len0--)
> +  for (; len0; len0--)

What exactly are you winning here?  How do you know that splitting a
compare-postdecrement into separate compare and decrement is faster
than the original operation?

- Not decrementing immediately after the compare only "optimizes" the
  first iteration of the loop.  Any further iteration needs to
  decrement and compare anyway.  So the supposed advantage only affects
  the degenerate case of len0 == 0 and is insignificant in any other
  case.

- The compiler can optimize this on its own, with different
  optimizations leading to one of the expressions being faster than the
  other depending on compiler version and/or target CPU.

- One CPU could have optimized microcode for compare-postdecrement,
  which you may break up into less optimized code if the compiler's
  optimizer isn't capable of combining the ops again.

- Alternatively, the target CPU has an optimized predecrement-compare
  op and no optimized compare-postdecrement op.

But this is generic code.  You don't know whether you're speeding
things up, slowing them down, or not affecting them at all.  Given
that, I'd leave these loops alone.  It's the job of the compiler to
optimize them.  The changes only raise the size of the patch
unnecessarily.

Same goes for changes like this:

   while ([...])
     {
-      if (!length--)
+      if (!length)
       [...]
+      length--;
     }
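To illustrate the point, here's a minimal standalone sketch (not taken
from the patch; the function names are mine) showing the two spellings
side by side.  Both copy exactly len0 bytes, and with optimization
enabled a compiler is free to emit identical code for either form:

  #include <stddef.h>

  /* Compare and post-decrement fused in one expression.  */
  static void
  copy_a (unsigned char *dst, const unsigned char *src, size_t len0)
  {
    while (len0--)
      *dst++ = *src++;
  }

  /* Compare first, decrement at the end of each iteration.  */
  static void
  copy_b (unsigned char *dst, const unsigned char *src, size_t len0)
  {
    for (; len0; len0--)
      *dst++ = *src++;
  }

Which one ends up faster, if either, depends entirely on the compiler
and the target CPU, which is exactly why generic code shouldn't try to
second-guess it.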
> void *
> memmem (const void *haystack, size_t hs_len, const void *needle, size_t ne_len)
> {
> -  const char *hs = haystack;
> -  const char *ne = needle;
> +  const unsigned char *hs = haystack;
> +  const unsigned char *ne = needle;
>
>    if (ne_len == 0)
>      return (void *)hs;
>
> -  int i;
> -  int c = ne[0];
> -  const char *end = hs + hs_len - ne_len;
> +  size_t i;
       ^^^^^^^^^

This is actually a bugfix.  Thanks for catching it, but it very much
deserves a patch of its own, together with the other changes from int
to size_t in memmem and strstr (see the sketch below for why the type
matters).

> @@ -143,7 +143,7 @@ memmem (const void *haystack, size_t hs_len, const void *needle, size_t ne_len)
>    size_t tmp, shift1;
>    size_t m1 = ne_len - 1;
>    size_t offset = 0;
> -  int i;
> +  size_t i;
       ^^^^^^^^^

...as does this one.
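For completeness, here's a standalone illustration of why a signed int
index is a real problem here (my reading of the bug; this is not newlib
code and the function names are made up): with a size_t length larger
than INT_MAX, an int index overflows before it reaches the end of the
buffer, which is undefined behavior, while a size_t index covers the
whole range.

  #include <stddef.h>

  /* Buggy pattern: int index against a size_t bound.  For hs_len >
     INT_MAX the increment eventually overflows i, which is undefined
     behavior.  */
  static const unsigned char *
  scan_bad (const unsigned char *hs, size_t hs_len, unsigned char c)
  {
    for (int i = 0; (size_t) i < hs_len; i++)
      if (hs[i] == c)
        return hs + i;
    return NULL;
  }

  /* Fixed pattern: size_t index, matching the size_t length.  */
  static const unsigned char *
  scan_good (const unsigned char *hs, size_t hs_len, unsigned char c)
  {
    for (size_t i = 0; i < hs_len; i++)
      if (hs[i] == c)
        return hs + i;
    return NULL;
  }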
Sorry, but I've given up after about half of the patch.

I don't see why this has to be one single huge patch.  If there's any
problem, multiple smaller patches are easier to understand, and easier
to revert or bisect.  Please resend this patch as a patchset,
preferably one file per patch, including a matching commit message per
patch.  The aforementioned bugfix should be a separate patch in the
patchset.


Thanks,
Corinna