Message-ID: <610c24cb-bdf6-d31c-fb77-e7d6b2403e02@linaro.org>
Date: Wed, 29 Mar 2023 16:41:36 -0300
Subject: Re: [PATCH v5 1/1] memalign: Support scanning for aligned chunks.
To: DJ Delorie
Cc: libc-alpha@sourceware.org
From: Adhemerval Zanella Netto
Organization: Linaro

On 29/03/23 01:20, DJ Delorie wrote:
> From e32abda27e5c0aa82f4b736fdca35d56bf665cce Mon Sep 17 00:00:00 2001
> From: DJ Delorie via Libc-alpha
> Date: Wed, 29 Mar 2023 00:18:40 -0400
> Subject: memalign: Support scanning for aligned chunks.
>
> This patch adds a chunk scanning algorithm to the _int_memalign code
> path that reduces heap fragmentation by reusing already aligned chunks
> instead of always looking for chunks of larger sizes and splitting
> them.  The tcache macros are extended to allow removing a chunk from
> the middle of the list.
>
> The goal is to fix the pathological use cases where heaps grow
> continuously in workloads that are heavy users of memalign.
>
> Note that tst-memalign-2 checks for tcache operation, which
> malloc-check bypasses.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella

>
> diff --git a/malloc/Makefile b/malloc/Makefile
> index dfb51d344c..79178c4905 100644
> --- a/malloc/Makefile
> +++ b/malloc/Makefile
> @@ -43,6 +43,7 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
>    tst-tcfree1 tst-tcfree2 tst-tcfree3 \
>    tst-safe-linking \
>    tst-mallocalign1 \
> +  tst-memalign-2
>
>  tests-static := \
>    tst-interpose-static-nothread \
> @@ -72,7 +73,7 @@ test-srcs = tst-mtrace
>  # with MALLOC_CHECK_=3 because they expect a specific failure.
>  tests-exclude-malloc-check = tst-malloc-check tst-malloc-usable \
>    tst-mxfast tst-safe-linking \
> -  tst-compathooks-off tst-compathooks-on
> +  tst-compathooks-off tst-compathooks-on tst-memalign-2
>
>  # Run all tests with MALLOC_CHECK_=3
>  tests-malloc-check = $(filter-out $(tests-exclude-malloc-check) \
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 76c50e3f58..8ebc4372bc 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -3162,19 +3162,44 @@ tcache_put (mchunkptr chunk, size_t tc_idx)
>  }
>
>  /* Caller must ensure that we know tc_idx is valid and there's
> -   available chunks to remove.  */
> +   available chunks to remove.  Removes chunk from the middle of the
> +   list.  */
>  static __always_inline void *
> -tcache_get (size_t tc_idx)
> +tcache_get_n (size_t tc_idx, tcache_entry **ep)
>  {
> -  tcache_entry *e = tcache->entries[tc_idx];
> +  tcache_entry *e;
> +  if (ep == &(tcache->entries[tc_idx]))
> +    e = *ep;
> +  else
> +    e = REVEAL_PTR (*ep);
> +
>    if (__glibc_unlikely (!aligned_OK (e)))
>      malloc_printerr ("malloc(): unaligned tcache chunk detected");
> -  tcache->entries[tc_idx] = REVEAL_PTR (e->next);
> +
> +  if (ep == &(tcache->entries[tc_idx]))
> +    *ep = REVEAL_PTR (e->next);
> +  else
> +    *ep = PROTECT_PTR (ep, REVEAL_PTR (e->next));
> +
>    --(tcache->counts[tc_idx]);
>    e->key = 0;
>    return (void *) e;
>  }
>
> +/* Like the above, but removes from the head of the list.  */
> +static __always_inline void *
> +tcache_get (size_t tc_idx)
> +{
> +  return tcache_get_n (tc_idx, & tcache->entries[tc_idx]);
> +}
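
One note for readers following the tcache changes: tcache_get_n is the
usual pointer-to-pointer idiom for unlinking from a singly linked list;
the extra branches only exist because glibc mangles the stored next
pointers with PROTECT_PTR/REVEAL_PTR.  Below is a minimal standalone
sketch of the same idea, with a plain (unmangled) node type and
hypothetical names, purely for illustration:

  #include <stdio.h>
  #include <stdlib.h>

  struct node { struct node *next; long value; };

  /* EP points at the head pointer or at some node's 'next' field, so
     splicing out *EP needs no separate "previous node" bookkeeping.  */
  static struct node *
  list_remove (struct node **ep)
  {
    struct node *e = *ep;
    *ep = e->next;
    return e;
  }

  /* Walk the list and pull out the first node whose value is a
     multiple of ALIGN -- the same shape as the aligned-entry scan the
     patch adds to _mid_memalign, minus the pointer mangling.  */
  static struct node *
  remove_first_multiple (struct node **head, long align)
  {
    struct node **ep = head;
    while (*ep != NULL && (*ep)->value % align != 0)
      ep = &(*ep)->next;
    return *ep != NULL ? list_remove (ep) : NULL;
  }

  int
  main (void)
  {
    struct node *head = NULL;
    for (long i = 5; i > 0; i--)        /* build the list 1..5 */
      {
        struct node *n = malloc (sizeof *n);
        n->value = i;
        n->next = head;
        head = n;
      }
    struct node *hit = remove_first_multiple (&head, 4);
    printf ("removed %ld\n", hit ? hit->value : -1L);   /* removed 4 */
    return 0;
  }
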
> +
> +/* Iterates through the tcache linked list.  */
> +static __always_inline tcache_entry *
> +tcache_next (tcache_entry *e)
> +{
> +  return (tcache_entry *) REVEAL_PTR (e->next);
> +}
> +
>  static void
>  tcache_thread_shutdown (void)
>  {
> @@ -3283,7 +3308,7 @@ __libc_malloc (size_t bytes)
>
>        DIAG_PUSH_NEEDS_COMMENT;
>        if (tc_idx < mp_.tcache_bins
> -          && tcache
> +          && tcache != NULL
>            && tcache->counts[tc_idx] > 0)
>          {
>            victim = tcache_get (tc_idx);
> @@ -3542,6 +3567,38 @@ _mid_memalign (size_t alignment, size_t bytes, void *address)
>        alignment = a;
>      }
>
> +#if USE_TCACHE
> +  {
> +    size_t tbytes;
> +    tbytes = checked_request2size (bytes);
> +    if (tbytes == 0)
> +      {
> +        __set_errno (ENOMEM);
> +        return NULL;
> +      }
> +    size_t tc_idx = csize2tidx (tbytes);
> +
> +    if (tc_idx < mp_.tcache_bins
> +        && tcache != NULL
> +        && tcache->counts[tc_idx] > 0)
> +      {
> +        /* The tcache itself isn't encoded, but the chain is.  */
> +        tcache_entry **tep = & tcache->entries[tc_idx];
> +        tcache_entry *te = *tep;
> +        while (te != NULL && !PTR_IS_ALIGNED (te, alignment))
> +          {
> +            tep = & (te->next);
> +            te = tcache_next (te);
> +          }
> +        if (te != NULL)
> +          {
> +            void *victim = tcache_get_n (tc_idx, tep);
> +            return tag_new_usable (victim);
> +          }
> +      }
> +  }
> +#endif
> +
>    if (SINGLE_THREAD_P)
>      {
>        p = _int_memalign (&main_arena, alignment, bytes);
> @@ -3847,7 +3904,7 @@ _int_malloc (mstate av, size_t bytes)
>            /* While we're here, if we see other chunks of the same size,
>               stash them in the tcache.  */
>            size_t tc_idx = csize2tidx (nb);
> -          if (tcache && tc_idx < mp_.tcache_bins)
> +          if (tcache != NULL && tc_idx < mp_.tcache_bins)
>              {
>                mchunkptr tc_victim;
>
> @@ -3905,7 +3962,7 @@ _int_malloc (mstate av, size_t bytes)
>            /* While we're here, if we see other chunks of the same size,
>               stash them in the tcache.  */
>            size_t tc_idx = csize2tidx (nb);
> -          if (tcache && tc_idx < mp_.tcache_bins)
> +          if (tcache != NULL && tc_idx < mp_.tcache_bins)
>              {
>                mchunkptr tc_victim;
>
> @@ -3967,7 +4024,7 @@ _int_malloc (mstate av, size_t bytes)
>  #if USE_TCACHE
>    INTERNAL_SIZE_T tcache_nb = 0;
>    size_t tc_idx = csize2tidx (nb);
> -  if (tcache && tc_idx < mp_.tcache_bins)
> +  if (tcache != NULL && tc_idx < mp_.tcache_bins)
>      tcache_nb = nb;
>    int return_cached = 0;
>
> @@ -4047,7 +4104,7 @@ _int_malloc (mstate av, size_t bytes)
>  #if USE_TCACHE
>                /* Fill cache first, return to user only if cache fills.
>                   We may return one of these chunks later.  */
> -              if (tcache_nb
> +              if (tcache_nb > 0
>                    && tcache->counts[tc_idx] < mp_.tcache_count)
>                  {
>                    tcache_put (victim, tc_idx);
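
The new _mid_memalign fast path above walks the tcache bin looking for
an entry that already satisfies the requested alignment, so a
free/allocate pair of the same size and alignment is now very likely to
hand back the same chunk.  That is easy to observe from user code --
expected behaviour with the patch applied, not something the public API
guarantees:

  #include <stdio.h>
  #include <stdlib.h>

  int
  main (void)
  {
    /* Small enough to land in tcache.  */
    void *a = aligned_alloc (64, 128);
    if (a == NULL)
      return 1;
    free (a);

    /* With the tcache scan in _mid_memalign this will usually return
       the chunk that was just freed; previously it tended to come from
       a fresh split of a larger chunk.  */
    void *b = aligned_alloc (64, 128);
    if (b == NULL)
      return 1;

    printf ("a=%p b=%p%s\n", a, b, a == b ? " (reused)" : "");
    free (b);
    return 0;
  }

This is essentially what the tcache part of tst-memalign-2 further down
asserts.
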
> @@ -4921,6 +4978,43 @@ _int_realloc (mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
>     ------------------------------ memalign ------------------------------
>  */
>
> +/* Returns 0 if the chunk is not and does not contain the requested
> +   aligned sub-chunk, else returns the amount of "waste" from
> +   trimming.  BYTES is the *user* byte size, not the chunk byte
> +   size.  */
> +static size_t
> +chunk_ok_for_memalign (mchunkptr p, size_t alignment, size_t bytes)
> +{
> +  void *m = chunk2mem (p);
> +  INTERNAL_SIZE_T size = memsize (p);
> +  void *aligned_m = m;
> +
> +  if (__glibc_unlikely (misaligned_chunk (p)))
> +    malloc_printerr ("_int_memalign(): unaligned chunk detected");
> +
> +  aligned_m = PTR_ALIGN_UP (m, alignment);
> +
> +  INTERNAL_SIZE_T front_extra = (intptr_t) aligned_m - (intptr_t) m;
> +
> +  /* We can't trim off the front as it's too small.  */
> +  if (front_extra > 0 && front_extra < MINSIZE)
> +    return 0;
> +
> +  /* If it's a perfect fit, it's an exception to the return value rule
> +     (we would return zero waste, which looks like "not usable"), so
> +     handle it here by returning a small non-zero value instead.  */
> +  if (size == bytes && front_extra == 0)
> +    return 1;
> +
> +  /* If the block we need fits in the chunk, calculate total waste.  */
> +  if (size > bytes + front_extra)
> +    return size - bytes;
> +
> +  /* Can't use this chunk.  */
> +  return 0;
> +}
> +
> +/* BYTES is user requested bytes, not requested chunksize bytes.  */
>  static void *
>  _int_memalign (mstate av, size_t alignment, size_t bytes)
>  {
> @@ -4934,8 +5028,7 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
>    mchunkptr remainder;            /* spare room at end to split off */
>    unsigned long remainder_size;   /* its size */
>    INTERNAL_SIZE_T size;
> -
> -
> +  mchunkptr victim;
>
>    nb = checked_request2size (bytes);
>    if (nb == 0)
>      {
> @@ -4944,29 +5037,142 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
>        return NULL;
>      }
>
> -  /*
> -     Strategy: find a spot within that chunk that meets the alignment
> +  /* We can't check tcache here because we hold the arena lock, which
> +     tcache doesn't expect.  We expect it has been checked
> +     earlier.  */
> +
> +  /* Strategy: search the bins looking for an existing block that
> +     meets our needs.  We scan a range of bins from "exact size" to
> +     "just under 2x", spanning the small/large barrier if needed.  If
> +     we don't find anything in those bins, the common malloc code will
> +     scan starting at 2x.  */
> +
> +  /* This will be set if we found a candidate chunk.  */
> +  victim = NULL;
> +
> +  /* Fast bins are singly-linked, hard to remove a chunk from the middle
> +     and unlikely to meet our alignment requirements.  We have not done
> +     any experimentation with searching for aligned fastbins.  */
> +
> +  int first_bin_index;
> +  int first_largebin_index;
> +  int last_bin_index;
> +
> +  if (in_smallbin_range (nb))
> +    first_bin_index = smallbin_index (nb);
> +  else
> +    first_bin_index = largebin_index (nb);
> +
> +  if (in_smallbin_range (nb * 2))
> +    last_bin_index = smallbin_index (nb * 2);
> +  else
> +    last_bin_index = largebin_index (nb * 2);
> +
> +  first_largebin_index = largebin_index (MIN_LARGE_SIZE);
> +
> +  int victim_index;                 /* its bin index */
> +
> +  for (victim_index = first_bin_index;
> +       victim_index < last_bin_index;
> +       victim_index ++)
> +    {
> +      victim = NULL;
> +
> +      if (victim_index < first_largebin_index)
> +        {
> +          /* Check small bins.  Small bin chunks are doubly-linked despite
> +             being the same size.  */
> +
> +          mchunkptr fwd;                    /* misc temp for linking */
> +          mchunkptr bck;                    /* misc temp for linking */
> +
> +          bck = bin_at (av, victim_index);
> +          fwd = bck->fd;
> +          while (fwd != bck)
> +            {
> +              if (chunk_ok_for_memalign (fwd, alignment, bytes) > 0)
> +                {
> +                  victim = fwd;
> +
> +                  /* Unlink it */
> +                  victim->fd->bk = victim->bk;
> +                  victim->bk->fd = victim->fd;
> +                  break;
> +                }
> +
> +              fwd = fwd->fd;
> +            }
> +        }
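
To make chunk_ok_for_memalign's return convention concrete before the
large-bin branch below: 0 means "cannot use this chunk", 1 is a
sentinel for a perfect, already-aligned fit, and any other value is the
number of bytes that would be trimmed away.  The small-bin loop above
takes the first acceptable chunk; the large-bin loop keeps the
candidate with the least waste.  A standalone mirror of the arithmetic,
using an illustrative MINSIZE of 32 (the real value depends on the
target's chunk layout):

  #include <stdint.h>
  #include <stdio.h>

  #define EX_MINSIZE 32   /* illustration only */

  /* Same decisions as chunk_ok_for_memalign, on plain numbers:
     MEM is the chunk's user pointer, SIZE its usable size.  */
  static size_t
  waste_for (uintptr_t mem, size_t size, size_t alignment, size_t bytes)
  {
    uintptr_t aligned_mem = (mem + alignment - 1) & ~(uintptr_t) (alignment - 1);
    size_t front_extra = aligned_mem - mem;

    if (front_extra > 0 && front_extra < EX_MINSIZE)
      return 0;                     /* leading gap too small to free */
    if (size == bytes && front_extra == 0)
      return 1;                     /* perfect fit sentinel */
    if (size > bytes + front_extra)
      return size - bytes;          /* usable; this much gets trimmed */
    return 0;                       /* does not fit */
  }

  int
  main (void)
  {
    /* 48 bytes below a 64-byte boundary: front_extra = 48 >= 32.  */
    printf ("%zu\n", waste_for (0x1010, 256, 64, 128));  /* 128 */
    /* 16 bytes below the boundary: front_extra = 16 < 32, rejected.  */
    printf ("%zu\n", waste_for (0x1030, 256, 64, 128));  /* 0 */
    /* Already aligned and exactly the right size: sentinel 1.  */
    printf ("%zu\n", waste_for (0x1040, 128, 64, 128));  /* 1 */
    return 0;
  }
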
> +      else
> +        {
> +          /* Check large bins.  */
> +          mchunkptr fwd;                    /* misc temp for linking */
> +          mchunkptr bck;                    /* misc temp for linking */
> +          mchunkptr best = NULL;
> +          size_t best_size = 0;
> +
> +          bck = bin_at (av, victim_index);
> +          fwd = bck->fd;
> +
> +          while (fwd != bck)
> +            {
> +              int extra;
> +
> +              if (chunksize (fwd) < nb)
> +                break;
> +              extra = chunk_ok_for_memalign (fwd, alignment, bytes);
> +              if (extra > 0
> +                  && (extra <= best_size || best == NULL))
> +                {
> +                  best = fwd;
> +                  best_size = extra;
> +                }
> +
> +              fwd = fwd->fd;
> +            }
> +          victim = best;
> +
> +          if (victim != NULL)
> +            {
> +              unlink_chunk (av, victim);
> +              break;
> +            }
> +        }
> +
> +      if (victim != NULL)
> +        break;
> +    }
> +
> +  /* Strategy: find a spot within that chunk that meets the alignment
>       request, and then possibly free the leading and trailing space.
> -   */
> +     This strategy is incredibly costly and can lead to external
> +     fragmentation if header and footer chunks are unused.  */
>
> -  /* Call malloc with worst case padding to hit alignment. */
> +  if (victim != NULL)
> +    {
> +      p = victim;
> +      m = chunk2mem (p);
> +      set_inuse (p);
> +    }
> +  else
> +    {
> +      /* Call malloc with worst case padding to hit alignment.  */
>
> -  m = (char *) (_int_malloc (av, nb + alignment + MINSIZE));
> +      m = (char *) (_int_malloc (av, nb + alignment + MINSIZE));
>
> -  if (m == 0)
> -    return 0;           /* propagate failure */
> +      if (m == 0)
> +        return 0;           /* propagate failure */
>
> -  p = mem2chunk (m);
> +      p = mem2chunk (m);
> +    }
>
>    if ((((unsigned long) (m)) % alignment) != 0)   /* misaligned */
> -
> -    { /*
> -     Find an aligned spot inside chunk.  Since we need to give back
> -     leading space in a chunk of at least MINSIZE, if the first
> -     calculation places us at a spot with less than MINSIZE leader,
> -     we can move to the next aligned spot -- we've allocated enough
> -     total room so that this is always possible.
> -     */
> +    {
> +      /* Find an aligned spot inside chunk.  Since we need to give back
> +         leading space in a chunk of at least MINSIZE, if the first
> +         calculation places us at a spot with less than MINSIZE leader,
> +         we can move to the next aligned spot -- we've allocated enough
> +         total room so that this is always possible.  */
>        brk = (char *) mem2chunk (((unsigned long) (m + alignment - 1)) &
>                                  - ((signed long) alignment));
>        if ((unsigned long) (brk - (char *) (p)) < MINSIZE)
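
On the fallback path the pre-existing trick is unchanged: ask
_int_malloc for nb + alignment + MINSIZE, which guarantees an aligned
spot whose leading gap is either zero or at least MINSIZE, with the
request still fitting behind it.  A brute-force check of that
invariant, with stand-in numbers for MINSIZE and the padded request
(illustrative only -- the real computation is in terms of chunk sizes):

  #include <assert.h>
  #include <stdio.h>

  int
  main (void)
  {
    const unsigned long minsize = 32, alignment = 64, nb = 200;
    const unsigned long alloc = nb + alignment + minsize;

    /* Whatever misalignment the returned block has, there is an
       aligned offset that is either 0 or >= MINSIZE, and NB bytes
       still fit after it.  */
    for (unsigned long mis = 0; mis < alignment; mis++)
      {
        unsigned long off = (mis == 0) ? 0 : alignment - mis;
        if (off > 0 && off < minsize)
          off += alignment;          /* move to the next aligned spot */
        assert (off == 0 || off >= minsize);
        assert (off + nb <= alloc);
      }
    puts ("invariant holds for every possible misalignment");
    return 0;
  }
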
> diff --git a/malloc/tst-memalign-2.c b/malloc/tst-memalign-2.c
> new file mode 100644
> index 0000000000..4996578e9f
> --- /dev/null
> +++ b/malloc/tst-memalign-2.c
> @@ -0,0 +1,155 @@
> +/* Test for memalign chunk reuse.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <errno.h>
> +#include <malloc.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <array_length.h>
> +#include <libc-pointer-arith.h>
> +#include <support/check.h>
> +
> +typedef struct TestCase {
> +  size_t size;
> +  size_t alignment;
> +  void *ptr1;
> +  void *ptr2;
> +} TestCase;
> +
> +static TestCase tcache_allocs[] = {
> +  { 24, 8, NULL, NULL },
> +  { 24, 16, NULL, NULL },
> +  { 128, 32, NULL, NULL }
> +};
> +#define TN array_length (tcache_allocs)
> +
> +static TestCase large_allocs[] = {
> +  { 23450, 64, NULL, NULL },
> +  { 23450, 64, NULL, NULL },
> +  { 23550, 64, NULL, NULL },
> +  { 23550, 64, NULL, NULL },
> +  { 23650, 64, NULL, NULL },
> +  { 23650, 64, NULL, NULL },
> +  { 33650, 64, NULL, NULL },
> +  { 33650, 64, NULL, NULL }
> +};
> +#define LN array_length (large_allocs)
> +
> +void *p;
> +
> +/* Sanity checks, ancillary to the actual test.  */
> +#define CHECK(p,a) \
> +  if (p == NULL || !PTR_IS_ALIGNED (p, a)) \
> +    FAIL_EXIT1 ("NULL or misaligned memory detected.\n");
> +
> +static int
> +do_test (void)
> +{
> +  int i, j;
> +  int count;
> +  void *ptr[10];
> +  void *p;
> +
> +  /* TCache test.  */
> +
> +  for (i = 0; i < TN; ++ i)
> +    {
> +      tcache_allocs[i].ptr1 = memalign (tcache_allocs[i].alignment, tcache_allocs[i].size);
> +      CHECK (tcache_allocs[i].ptr1, tcache_allocs[i].alignment);
> +      free (tcache_allocs[i].ptr1);
> +      /* This should return the same chunk as was just free'd.  */
> +      tcache_allocs[i].ptr2 = memalign (tcache_allocs[i].alignment, tcache_allocs[i].size);
> +      CHECK (tcache_allocs[i].ptr2, tcache_allocs[i].alignment);
> +      free (tcache_allocs[i].ptr2);
> +
> +      TEST_VERIFY (tcache_allocs[i].ptr1 == tcache_allocs[i].ptr2);
> +    }
> +
> +  /* Test for non-head tcache hits.  */
> +  for (i = 0; i < array_length (ptr); ++ i)
> +    {
> +      if (i == 4)
> +        {
> +          ptr[i] = memalign (64, 256);
> +          CHECK (ptr[i], 64);
> +        }
> +      else
> +        {
> +          ptr[i] = malloc (256);
> +          CHECK (ptr[i], 4);
> +        }
> +    }
> +  for (i = 0; i < array_length (ptr); ++ i)
> +    free (ptr[i]);
> +
> +  p = memalign (64, 256);
> +  CHECK (p, 64);
> +
> +  count = 0;
> +  for (i = 0; i < 10; ++ i)
> +    if (ptr[i] == p)
> +      ++ count;
> +  free (p);
> +  TEST_VERIFY (count > 0);
> +
> +  /* Large bins test.  */
> +
> +  for (i = 0; i < LN; ++ i)
> +    {
> +      large_allocs[i].ptr1 = memalign (large_allocs[i].alignment, large_allocs[i].size);
> +      CHECK (large_allocs[i].ptr1, large_allocs[i].alignment);
> +      /* Keep chunks from combining by fragmenting the heap.  */
> +      p = malloc (512);
> +      CHECK (p, 4);
> +    }
> +
> +  for (i = 0; i < LN; ++ i)
> +    free (large_allocs[i].ptr1);
> +
> +  /* Force the unsorted bins to be scanned and moved to small/large
> +     bins.  */
> +  p = malloc (60000);
> +
> +  for (i = 0; i < LN; ++ i)
> +    {
> +      large_allocs[i].ptr2 = memalign (large_allocs[i].alignment, large_allocs[i].size);
> +      CHECK (large_allocs[i].ptr2, large_allocs[i].alignment);
> +    }
> +
> +  count = 0;
> +  for (i = 0; i < LN; ++ i)
> +    {
> +      int ok = 0;
> +      for (j = 0; j < LN; ++ j)
> +        if (large_allocs[i].ptr1 == large_allocs[j].ptr2)
> +          ok = 1;
> +      if (ok == 1)
> +        count ++;
> +    }
> +
> +  /* The allocation algorithm is complicated outside of the memalign
> +     logic, so just make sure it's working for most of the
> +     allocations.  This avoids possible boundary conditions with
> +     empty/full heaps.  */
> +  TEST_VERIFY (count > LN / 2);
> +
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>