From: Kevin Atkinson
To: Roger Sayle
Cc: gcc@gcc.gnu.org
Date: Fri, 04 Apr 2003 14:51:00 -0000
Subject: Re: Slow memcmp for aligned strings on Pentium 3

On Thu, 3 Apr 2003, Roger Sayle wrote:

> Hi Kevin,
>
> > I did some tests and discovered that using cmps was rather slow,
> > compared to a simple loop and then a bswap and subtract at the end.
>
> I'm sure that GCC's memcmp implementations could be improved, but
> from reading the code examples in your patch it looks like you
> are always assuming that either the length is a multiple of four,
> or that the bytes following the memory sections to be compared
> contain identical values (i.e. you're hoping they're all zero).
>
> e.g., if p and q are suitably 4-byte aligned
>
> memcpy(p,"abcd",4);
> memcpy(q,"abef",4);
> memcmp(p,q,2)
>
> should compare equal, but don't when using bswaps and subtractions.
>
> Similarly, when two words mismatch, the return value (<0 or >0)
> should depend upon the first byte that differs, not the
> values of the bytes that come after it.

You're right. It can be fixed by testing whether the length is a
multiple of 4 and, if not, doing a byte-wise comparison at the end.

> I suspect it should be possible to fix your code to handle these
> termination conditions correctly, and a comparison of your
> routine's performance with these fixes vs. __builtin_memcmp
> would be of interest.

I will see what I can do.

-- 
http://kevin.atkinson.dhs.org
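A minimal sketch of the fix described above, assuming 4-byte words. The names `word_memcmp` and `load_be32` are hypothetical and do not come from the original patch. Whole words are compared first. On a mismatch both words are read in big-endian order, which is what a load followed by bswap gives on little-endian x86; shifts are used here instead so the sketch is portable and has no alignment requirement. This makes the first differing byte the most significant one, which addresses Roger's second point. The remaining n % 4 bytes are then compared one at a time, so nothing past the requested length is examined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 4 bytes as a big-endian word. On x86 this
   matches a 32-bit load followed by bswap; shifts keep it portable. */
static uint32_t load_be32(const unsigned char *s)
{
    return ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16) |
           ((uint32_t)s[2] << 8) | (uint32_t)s[3];
}

/* Hypothetical word-at-a-time memcmp with a byte-wise tail. */
int word_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    size_t i = 0;

    /* Compare only complete 4-byte words that lie within n. */
    for (; i + 4 <= n; i += 4) {
        uint32_t x = load_be32(p + i), y = load_be32(q + i);
        if (x != y)
            /* In big-endian order the first differing byte is the most
               significant differing byte, so an unsigned comparison
               orders by that byte. Compare instead of subtracting:
               x - y can exceed INT_MAX and flip sign when cast to int. */
            return x < y ? -1 : 1;
    }

    /* Tail: fewer than 4 bytes remain when n is not a multiple of 4. */
    for (; i < n; i++)
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;

    return 0;
}
```

With Roger's example, `word_memcmp(p, q, 2)` returns 0: no complete word fits in two bytes, so only the tail loop runs and it sees "ab" in both buffers. The sketch says nothing about speed; a performance comparison against `__builtin_memcmp` would still need the aligned-load and bswap version run on the target hardware.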