Date: Tue, 16 Aug 2022 13:08:47 +0200
From: Corinna Vinschen
To: Seija Kijin
Cc: newlib@sourceware.org
Subject: Re: [PATCH] newlib: libc: Optimize the string functions

Hi Seija,

Thanks for the patch.

First of all, your patch is broken in terms of whitespace and line
breaks, so it can't be applied.  I suspect your MUA is doing that
automatically.  You'd better provide the git patch as an attachment, or
change your MUA settings to leave whitespace and line breaks alone.

On Aug 12 17:11, Seija Kijin wrote:
> --- a/newlib/libc/string/memccpy.c
> +++ b/newlib/libc/string/memccpy.c
> @@ -34,16 +34,17 @@ PORTABILITY
>  /* Nonzero if either X or Y is not aligned on a "long" boundary.  */
>  #define UNALIGNED(X, Y) \
> -  (((long)X & (sizeof (long) - 1)) | ((long)Y & (sizeof (long) - 1)))
> +  (((unsigned long)X | (unsigned long)Y) & (sizeof (unsigned long) - 1))
>
>  /* How many bytes are copied each iteration of the word copy loop.  */
> -#define LITTLEBLOCKSIZE (sizeof (long))
> +#define LITTLEBLOCKSIZE (sizeof (unsigned long))
>
>  /* Threshhold for punting to the byte copier.  */
>  #define TOO_SMALL(LEN)  ((LEN) < LITTLEBLOCKSIZE)
>
>  /* Macros for detecting endchar */
>  #if LONG_MAX == 2147483647L
> +#define MAGIC

Why?  There's no MAGIC in this code.

>  #define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
>  #else
>  #if LONG_MAX == 9223372036854775807L

> @@ -64,11 +65,11 @@ memccpy (void *__restrict dst0,
>  #if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__)
>    void *ptr = NULL;
> -  char *dst = (char *) dst0;
> -  char *src = (char *) src0;
> -  char endchar = endchar0 & 0xff;
> +  unsigned char *dst = (unsigned char *) dst0;
> +  const unsigned char *src = (const unsigned char *) src0;
> +  const unsigned char endchar = (unsigned char) endchar0;
>
> -  while (len0--)
> +  for (; len0; len0--)

What exactly are you winning here?  How do you know that splitting a
compare-postdecrement into separate compare and decrement is faster
than the original operation?

- Not decrementing immediately after the compare only "optimizes" the
  first iteration of the loop.  Any further iteration needs to
  decrement and compare anyway.  So the supposed advantage only affects
  the degenerate case of len0 == 0 and is insignificant in any other
  case.

- The compiler can optimize this on its own, with different
  optimizations leading to one of the expressions being faster than the
  other depending on compiler version and/or target CPU.

- One CPU could have optimized microcode for compare-postdecrement,
  which you may break up into less optimized code if the compiler's
  optimizer isn't capable of combining the ops again.

- Alternatively, the target CPU has an optimized predecrement-compare
  op and no optimized compare-postdecrement op.

But this is generic code.  You don't know whether you're speeding
things up, slowing them down, or not affecting them at all.  Given
that, I'd leave these loops alone.  It's the job of the compiler to
optimize them.  The changes only raise the size of the patch
unnecessarily.

Same goes for changes like this:

   while ([...])
     {
-      if (!length--)
+      if (!length)
       [...]
+      length--;
     }
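To illustrate the point, here's a minimal standalone sketch (not taken
from the patch; the function names are mine) showing the two spellings
side by side.  Both copy exactly len0 bytes, and with optimization
enabled a compiler is free to emit identical code for either form:

  #include <stddef.h>

  /* Compare and post-decrement fused in one expression.  */
  static void
  copy_a (unsigned char *dst, const unsigned char *src, size_t len0)
  {
    while (len0--)
      *dst++ = *src++;
  }

  /* Compare first, decrement at the end of each iteration.  */
  static void
  copy_b (unsigned char *dst, const unsigned char *src, size_t len0)
  {
    for (; len0; len0--)
      *dst++ = *src++;
  }

Which one ends up faster, if either, depends entirely on the compiler
and the target CPU, which is exactly why generic code shouldn't try to
second-guess it.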
> void *
> memmem (const void *haystack, size_t hs_len, const void *needle, size_t ne_len)
> {
> -  const char *hs = haystack;
> -  const char *ne = needle;
> +  const unsigned char *hs = haystack;
> +  const unsigned char *ne = needle;
>
>    if (ne_len == 0)
>      return (void *)hs;
>
> -  int i;
> -  int c = ne[0];
> -  const char *end = hs + hs_len - ne_len;
> +  size_t i;
       ^^^^^^^^^

This is actually a bugfix.  Thanks for catching it, but it very much
deserves a patch of its own, together with the other changes from int
to size_t in memmem and strstr (see the sketch below for why the type
matters).

> @@ -143,7 +143,7 @@ memmem (const void *haystack, size_t hs_len, const void *needle, size_t ne_len)
>    size_t tmp, shift1;
>    size_t m1 = ne_len - 1;
>    size_t offset = 0;
> -  int i;
> +  size_t i;
       ^^^^^^^^^

...as does this one.
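For completeness, here's a standalone illustration of why a signed int
index is a real problem here (my reading of the bug; this is not newlib
code and the function names are made up): with a size_t length larger
than INT_MAX, an int index overflows before it reaches the end of the
buffer, which is undefined behavior, while a size_t index covers the
whole range.

  #include <stddef.h>

  /* Buggy pattern: int index against a size_t bound.  For hs_len >
     INT_MAX the increment eventually overflows i, which is undefined
     behavior.  */
  static const unsigned char *
  scan_bad (const unsigned char *hs, size_t hs_len, unsigned char c)
  {
    for (int i = 0; (size_t) i < hs_len; i++)
      if (hs[i] == c)
        return hs + i;
    return NULL;
  }

  /* Fixed pattern: size_t index, matching the size_t length.  */
  static const unsigned char *
  scan_good (const unsigned char *hs, size_t hs_len, unsigned char c)
  {
    for (size_t i = 0; i < hs_len; i++)
      if (hs[i] == c)
        return hs + i;
    return NULL;
  }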
Sorry, but I've given up after about half of the patch.

I don't see why this has to be one single huge patch.  If there's any
problem, multiple smaller patches are easier to understand, and easier
to revert or bisect.  Please resend this patch as a patchset,
preferably one file per patch, including a matching commit message per
patch.  The aforementioned bugfix should be a separate patch in the
patchset.


Thanks,
Corinna