From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x33.google.com (mail-oa1-x33.google.com [IPv6:2001:4860:4864:20::33]) by sourceware.org (Postfix) with ESMTPS id C5C07385840D for ; Mon, 30 Oct 2023 19:03:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C5C07385840D Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C5C07385840D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:4860:4864:20::33 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698692596; cv=none; b=SgW/CAt2XEBfC25olac5jVFlejNYMZzlF5r977dKUpIp7VR72T0spyjZH0KKQJ/amIj68KhPOX2B1xY5HXJMs6QJ/L6LzxI2Qcn9xrr25LR04qsJ+xAg5cHRE/d9yEEJtrT2sobbO0gtrKKYWbkF6X/QH6imyJrklfit5abwR/Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698692596; c=relaxed/simple; bh=JHYfiOCWC0hqJL/4H3+omzTd6gjil32ol9QV7HClwjc=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=xIEL4U4Jc5PpU2xitRF+7TQI+p03Tkz9pJdh+fNgrW9G9jK7lULGGNIlKR9NJS+Q8NbdVQCH6ybjjoX7rHDrbM3TQztevyVtKsvsKK4cwZoFzkoY36lo/pKRzvBGP51qPMiZp86oeyC3PLbz8tdkFyYP7nOLCDdqi3JWfyeOUeA= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-oa1-x33.google.com with SMTP id 586e51a60fabf-1e98e97c824so2979188fac.1 for ; Mon, 30 Oct 2023 12:03:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1698692589; x=1699297389; darn=sourceware.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=d9Dry7KbZpDqsqEaPorgR6PMGQl/LBFWs1bLF6zrvsg=; b=XGUH7GniLL8ABMa42l/sQi9/sSzjv61E/8lDQ0ahurlEdjyGsMAQXRp52GwEAfKvHJ ZgmwkOIA54Gj9tG0pYeh7gIsA6sxu/B3GvmiXMTyvolLnGV9aeAs1qBfsUi47HvbyKEC OP/XQ8bU59gMYf2DTxStW49SS9/XSotc+3wV9aRHryRWlOcSJWgC/dxysRT5G8+OVzGK mDhSHfpDgfPAMJuoMgRUVC87dcylJ7JWRiPq4Akr34oIRGIjibb17Rz5zzL1eCge72yX Bm+Y6IPDIOU8SX+cvwnT/Yj2bS+MJjbIVoqgxw0+qO9xW+ni8oDx/Pk81QNpOWgogVZi W8nw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698692589; x=1699297389; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=d9Dry7KbZpDqsqEaPorgR6PMGQl/LBFWs1bLF6zrvsg=; b=xA1ix5/BP3HV72q/i+XrRMb9R6kEbORjyS6c8cybEEp/01PvGHqBfVWA+k1NopZ23r 5xRWLtY6PHWgeWdeXFcK54Wz/vsbS+Fj0Azh3f8TcZs0+I1Mv051D5pG+SJgjjnfyE4W uMlwwbZZilhw1imJC3/LcOYsoso7DnWYeCJsoGSa27weKlkVW4appun6Cg3vcgamnuaj TFxgn6u1UY31Ie3byvR/5KehNB+UCM3ClwYFZ/GkpRX+D/NzikDytN3M4S3Z0kq2spd4 uOylOiOZnQWjMkUoxs3xjhBIGost+zCvqZd4l07NuZ0kJJGCnFmBrDuNkKTWZINbjd3i TbMw== X-Gm-Message-State: AOJu0YyuqtUuyWpVQfYMVg20HSGEPyrCEBK1YE/Lh5qKJS+zFwcchSxM 9Z6JaLCbKeJnkCIsw+mKI1/M73LeXYAeoHxUyKc= X-Google-Smtp-Source: AGHT+IHAd9mnyTbAMz2g7v+Ke7X1iXrKC+99WkAlllmKLNJHkVBknl2/rNuQtCKPQ6JSOHTjrOii66110oBYTaLe5rU= X-Received: by 2002:a05:6870:114f:b0:1e9:9c24:4498 with SMTP id 15-20020a056870114f00b001e99c244498mr237687oag.9.1698692588948; Mon, 30 Oct 2023 12:03:08 -0700 (PDT) MIME-Version: 1.0 References: <20231003122251.3325435-1-adhemerval.zanella@linaro.org> <20231003122251.3325435-3-adhemerval.zanella@linaro.org> In-Reply-To: <20231003122251.3325435-3-adhemerval.zanella@linaro.org> From: Noah Goldstein Date: Mon, 30 Oct 2023 14:02:57 -0500 Message-ID: Subject: Re: [PATCH v8 2/7] stdlib: Optimization qsort{_r} swap implementation To: Adhemerval Zanella Cc: libc-alpha@sourceware.org, Paul Eggert , Florian Weimer Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Oct 3, 2023 at 7:23=E2=80=AFAM Adhemerval Zanella wrote: > > The optimization takes in consideration both the most common elements > are either 32 or 64 bit in size and inputs are aligned to the word > boundary. This is similar to what msort does. > > For large buffer the swap operation uses memcpy/mempcpy with a > small fixed size buffer (so compiler might inline the operations). > > Checked on x86_64-linux-gnu. > --- > stdlib/qsort.c | 95 ++++++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 77 insertions(+), 18 deletions(-) > > diff --git a/stdlib/qsort.c b/stdlib/qsort.c > index 728a0ed370..072ccdfb95 100644 > --- a/stdlib/qsort.c > +++ b/stdlib/qsort.c > @@ -21,22 +21,73 @@ > > #include > #include > +#include > #include > #include > +#include > > -/* Byte-wise swap two items of size SIZE. */ > -#define SWAP(a, b, size) = \ > - do = \ > - { = \ > - size_t __size =3D (size); = \ > - char *__a =3D (a), *__b =3D (b); = \ > - do = \ > - { = \ > - char __tmp =3D *__a; = \ > - *__a++ =3D *__b; = \ > - *__b++ =3D __tmp; = \ > - } while (--__size > 0); = \ > - } while (0) > +/* Swap SIZE bytes between addresses A and B. These helpers are provide= d > + along the generic one as an optimization. */ > + > +enum swap_type_t > + { > + SWAP_WORDS_64, > + SWAP_WORDS_32, > + SWAP_BYTES > + }; > + > +/* If this function returns true, elements can be safely copied using wo= rd > + loads and stores. Otherwise, it might not be safe. BASE (as an inte= ger) > + must be a multiple of the word alignment. SIZE must be a multiple of > + WORDSIZE. Since WORDSIZE must be a multiple of the word alignment, a= nd > + WORDSIZE is a power of two on all supported platforms, this function = for > + speed merely checks that BASE and SIZE are both multiples of the word > + size. */ > +static inline bool > +is_aligned (const void *base, size_t size, size_t wordsize) > +{ > + return (((uintptr_t) base | size) & (wordsize - 1)) =3D=3D 0; > +} > + > +static inline void > +swap_words_64 (void * restrict a, void * restrict b, size_t n) > +{ > + typedef uint64_t __attribute__ ((__may_alias__)) u64_alias_t; > + do > + { > + n -=3D 8; > + u64_alias_t t =3D *(u64_alias_t *)(a + n); > + *(u64_alias_t *)(a + n) =3D *(u64_alias_t *)(b + n); > + *(u64_alias_t *)(b + n) =3D t; > + } while (n); > +} > + > +static inline void > +swap_words_32 (void * restrict a, void * restrict b, size_t n) > +{ > + typedef uint32_t __attribute__ ((__may_alias__)) u32_alias_t; > + do > + { > + n -=3D 4; > + u32_alias_t t =3D *(u32_alias_t *)(a + n); > + *(u32_alias_t *)(a + n) =3D *(u32_alias_t *)(b + n); > + *(u32_alias_t *)(b + n) =3D t; > + } while (n); > +} > + > +/* Replace the indirect call with a serie of if statements. It should h= elp > + the branch predictor. */ > +static void > +do_swap (void * restrict a, void * restrict b, size_t size, > + enum swap_type_t swap_type) > +{ > + if (swap_type =3D=3D SWAP_WORDS_64) > + swap_words_64 (a, b, size); > + else if (swap_type =3D=3D SWAP_WORDS_32) > + swap_words_32 (a, b, size); > + else > + __memswap (a, b, size); > +} > > /* Discontinue quicksort algorithm when partition gets below this size. > This particular magic number was chosen to work best on a Sun 4/260. = */ > @@ -96,6 +147,14 @@ _quicksort (void *const pbase, size_t total_elems, si= ze_t size, > /* Avoid lossage with unsigned arithmetic below. */ > return; > > + enum swap_type_t swap_type; > + if (is_aligned (pbase, size, 8)) > + swap_type =3D SWAP_WORDS_64; > + else if (is_aligned (pbase, size, 4)) > + swap_type =3D SWAP_WORDS_32; > + else > + swap_type =3D SWAP_BYTES; > + > if (total_elems > MAX_THRESH) > { > char *lo =3D base_ptr; > @@ -119,13 +178,13 @@ _quicksort (void *const pbase, size_t total_elems, = size_t size, > char *mid =3D lo + size * ((hi - lo) / size >> 1); > > if ((*cmp) ((void *) mid, (void *) lo, arg) < 0) > - SWAP (mid, lo, size); > + do_swap (mid, lo, size, swap_type); > if ((*cmp) ((void *) hi, (void *) mid, arg) < 0) > - SWAP (mid, hi, size); > + do_swap (mid, hi, size, swap_type); > else > goto jump_over; > if ((*cmp) ((void *) mid, (void *) lo, arg) < 0) > - SWAP (mid, lo, size); > + do_swap (mid, lo, size, swap_type); > jump_over:; > > left_ptr =3D lo + size; > @@ -144,7 +203,7 @@ _quicksort (void *const pbase, size_t total_elems, si= ze_t size, > > if (left_ptr < right_ptr) > { > - SWAP (left_ptr, right_ptr, size); > + do_swap (left_ptr, right_ptr, size, swap_type); > if (mid =3D=3D left_ptr) > mid =3D right_ptr; > else if (mid =3D=3D right_ptr) > @@ -216,7 +275,7 @@ _quicksort (void *const pbase, size_t total_elems, si= ze_t size, > tmp_ptr =3D run_ptr; > > if (tmp_ptr !=3D base_ptr) > - SWAP (tmp_ptr, base_ptr, size); > + do_swap (tmp_ptr, base_ptr, size, swap_type); > > /* Insertion sort, running from left-hand-side up to right-hand-side= . */ > > -- > 2.34.1 > LGTM Reviewed-by: Noah Goldstein