From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenth@sourceware.org>
Received: by sourceware.org (Postfix, from userid 1666)
	id 3BDA03861814; Thu, 15 Feb 2024 08:14:54 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3BDA03861814
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1707984894;
	bh=0rfd8UHsQ9XGQQvanJ7FVNtiU4bDjuZIizdzjr4jVI4=;
	h=From:To:Subject:Date:From;
	b=IROk2QvJBdm05oCjzoqYgJs4/XLuWX4RvUUP3rqX9+vEhUXtfv3hJnKK7nsJpGLPB
	 f9+sSp3aAzd+pCwr+FV7dcjaijU3uuYU6ffb1qxVbcex3TXYFW/X6sgq4Jqi96V/sd
	 cGWlddJ45REGOcMPp7fdKyGWUhxYX4Y8NL2h5IwE=
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: Richard Biener <rguenth@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r14-8994] [libiberty] remove TBAA violation in iterative_hash,
 improve code-gen
X-Act-Checkin: gcc
X-Git-Author: Richard Biener <rguenther@suse.de>
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 5266f930bed06c99a9845bbde7d90cb285037733
X-Git-Newrev: 52ac4c6be8664e9dab5b90e7c64df03985791893
Message-Id: <20240215081454.3BDA03861814@sourceware.org>
Date: Thu, 15 Feb 2024 08:14:54 +0000 (GMT)
List-Id: <gcc-cvs.sourceware.org>

https://gcc.gnu.org/g:52ac4c6be8664e9dab5b90e7c64df03985791893

commit r14-8994-g52ac4c6be8664e9dab5b90e7c64df03985791893
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Feb 14 14:00:23 2024 +0100

    [libiberty] remove TBAA violation in iterative_hash, improve code-gen
    
    The following removes the TBAA violation present in iterative_hash.
    Since this code eventually gets built with LTO, it is important to
    fix.  This also improves code generation for the >= 12 bytes loop by
    using | to compose the 4-byte words: GCC 7 and later recognize that
    pattern and perform a single 4-byte load, while the variant using +
    is not recognized (not on trunk either).  I think we have an
    enhancement bug for this somewhere.
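
A minimal sketch of the two composition variants discussed above. The helper
names load_le32_or and load_le32_add are hypothetical, and hashval_t is assumed
to be a 32-bit unsigned type here. Both helpers read the key through
unsigned char, which the aliasing rules permit for any object, unlike the
removed *(hashval_t *) dereference. The shifted bytes occupy disjoint bit
ranges, so | and + yield the same value.

```c
#include <stdint.h>

/* Assumption for this sketch: hashval_t is 32 bits wide.  */
typedef uint32_t hashval_t;

/* Compose four bytes into a little-endian word using |.  This is the
   form used in the patch.  */
static hashval_t
load_le32_or (const unsigned char *k)
{
  return (k[0] | ((hashval_t) k[1] << 8) | ((hashval_t) k[2] << 16)
          | ((hashval_t) k[3] << 24));
}

/* The same composition using +.  It computes the same value because
   no two shifted bytes share a bit position, so no carries occur.  */
static hashval_t
load_le32_add (const unsigned char *k)
{
  return (k[0] + ((hashval_t) k[1] << 8) + ((hashval_t) k[2] << 16)
          + ((hashval_t) k[3] << 24));
}
```

Only the generated code is expected to differ between the two helpers, not
the result.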
    
    Given that we now reliably merge the byte loads, and that the bogus
    "optimized" path might only be relevant for archs that cannot do
    misaligned loads efficiently, I've chosen to keep a specialization
    for aligned accesses.
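
The shape of that specialization can be sketched as follows. This is an
illustration only: sum_words and load_le32 are hypothetical names, the loop
sums 4-byte words instead of calling mix(), and hashval_t is assumed to be
32 bits. Both branches contain the same source. The alignment test only gives
the compiler a branch in which the pointer is known to be 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: hashval_t is 32 bits wide.  */
typedef uint32_t hashval_t;

/* Byte-wise little-endian load, valid for any alignment.  */
static hashval_t
load_le32 (const unsigned char *k)
{
  return (k[0] | ((hashval_t) k[1] << 8) | ((hashval_t) k[2] << 16)
          | ((hashval_t) k[3] << 24));
}

/* Hypothetical stand-in for the main loop: adds up the complete
   4-byte words of the key and ignores any trailing bytes.  */
static hashval_t
sum_words (const void *k_in, size_t len)
{
  const unsigned char *k = (const unsigned char *) k_in;
  hashval_t acc = 0;

  if ((((uintptr_t) k) & 3) == 0)
    while (len >= 4)   /* aligned */
      {
        acc += load_le32 (k);
        k += 4; len -= 4;
      }
  else
    while (len >= 4)   /* unaligned */
      {
        acc += load_le32 (k);
        k += 4; len -= 4;
      }
  return acc;
}
```

Because the two loops are textually identical, the result cannot depend on
the alignment of the key.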
    
    libiberty/
            * hashtab.c (iterative_hash): Remove TBAA violating handling
            of aligned little-endian case in favor of just keeping the
            aligned case special-cased.  Use | for composing a larger word.

Diff:
---
 libiberty/hashtab.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/libiberty/hashtab.c b/libiberty/hashtab.c
index 48f28078114e..e3a07256a300 100644
--- a/libiberty/hashtab.c
+++ b/libiberty/hashtab.c
@@ -940,26 +940,23 @@ iterative_hash (const void *k_in /* the key */,
   c = initval;           /* the previous hash value */
 
   /*---------------------------------------- handle most of the key */
-#ifndef WORDS_BIGENDIAN
-  /* On a little-endian machine, if the data is 4-byte aligned we can hash
-     by word for better speed.  This gives nondeterministic results on
-     big-endian machines.  */
-  if (sizeof (hashval_t) == 4 && (((size_t)k)&3) == 0)
-    while (len >= 12)    /* aligned */
+  /* Provide specialization for the aligned case for targets that cannot
+     efficiently perform misaligned loads of a merged access.  */
+  if ((((size_t)k)&3) == 0)
+    while (len >= 12)
       {
-	a += *(hashval_t *)(k+0);
-	b += *(hashval_t *)(k+4);
-	c += *(hashval_t *)(k+8);
+	a += (k[0] | ((hashval_t)k[1]<<8) | ((hashval_t)k[2]<<16) | ((hashval_t)k[3]<<24));
+	b += (k[4] | ((hashval_t)k[5]<<8) | ((hashval_t)k[6]<<16) | ((hashval_t)k[7]<<24));
+	c += (k[8] | ((hashval_t)k[9]<<8) | ((hashval_t)k[10]<<16)| ((hashval_t)k[11]<<24));
 	mix(a,b,c);
 	k += 12; len -= 12;
       }
   else /* unaligned */
-#endif
     while (len >= 12)
       {
-	a += (k[0] +((hashval_t)k[1]<<8) +((hashval_t)k[2]<<16) +((hashval_t)k[3]<<24));
-	b += (k[4] +((hashval_t)k[5]<<8) +((hashval_t)k[6]<<16) +((hashval_t)k[7]<<24));
-	c += (k[8] +((hashval_t)k[9]<<8) +((hashval_t)k[10]<<16)+((hashval_t)k[11]<<24));
+	a += (k[0] | ((hashval_t)k[1]<<8) | ((hashval_t)k[2]<<16) | ((hashval_t)k[3]<<24));
+	b += (k[4] | ((hashval_t)k[5]<<8) | ((hashval_t)k[6]<<16) | ((hashval_t)k[7]<<24));
+	c += (k[8] | ((hashval_t)k[9]<<8) | ((hashval_t)k[10]<<16)| ((hashval_t)k[11]<<24));
 	mix(a,b,c);
 	k += 12; len -= 12;
       }