From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 04 Sep 2013 11:43:00 -0000
From: Siddhesh Poyarekar
To: Ondřej Bílka
Cc: "Carlos O'Donell", Will Newton, "libc-ports@sourceware.org", Patch Tracking
Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
Message-ID: <20130904114529.GC4306@spoyarek.pnq.redhat.com>
In-Reply-To: <20130904110333.GA6216@domone.kolej.mff.cuni.cz>
References: <5220F1F0.80501@redhat.com> <52260BD0.6090805@redhat.com> <20130903173710.GA2028@domone.kolej.mff.cuni.cz> <522621E2.6020903@redhat.com> <20130903185721.GA3876@domone.kolej.mff.cuni.cz> <5226354D.8000006@redhat.com> <20130904073008.GA4306@spoyarek.pnq.redhat.com> <20130904110333.GA6216@domone.kolej.mff.cuni.cz>
X-SW-Source: 2013-09/txt/msg00035.txt.bz2

On Wed, Sep 04, 2013 at 01:03:33PM +0200, Ondřej Bílka wrote:
> > 1. Assume aligned input. Nothing should take (any noticeable)
> > performance away from aligned copies/moves
>
> Not very useful, as this is extremely dependent on the function
> measured. For functions like strcmp and strlen, alignments are mostly
> random, so the aligned case does not say much. At the opposite end of
> the spectrum is memset, which is almost always 8-byte aligned, so
> unaligned performance does not make a lot of sense.

Agreed. So for functions like memset/memcpy/memmove we heavily favour
aligned inputs, while for strlen/strchr/memchr we strive for acceptable
average-case performance, i.e. less variance in performance.

> > 2. Scale with size
>
> Not very important, for several reasons. One is that big sizes are
> cold (just look at oprofile output: the loops run less often than the
> function headers).
>
> The second reason is that if we look at the caller, large sizes are
> unlikely to be the bottleneck.

I did not imply that we should optimize for larger sizes - I meant
that, as a general principle, the algorithm should scale reasonably to
larger sizes. A quadratic algorithm is bad even if it gives acceptable
performance for smaller sizes. I would consider that a pretty important
trait to monitor in the benchmark, even if we won't really get such
implementations in practice.

> > 3.
Provide acceptable performance for unaligned sizes without
> > penalizing the aligned case
>
> This is quite an important case, and it should be measured correctly:
> what matters is that the alignment varies. A measurement taken at one
> fixed alignment can look better than reality, where alignment varies.

I agree that we need to measure unaligned cases correctly.

> > 4. Measure the effect of dcache pressure on function performance
> > 5. Measure the effect of icache pressure on function performance.
>
> Here you really need to base the weights on function usage patterns.
> A bigger code size is acceptable for functions that are called more
> often, and you need to see the distribution of how calls are clustered
> to get the full picture. strcmp is the least sensitive to icache
> concerns, because when it is called it is mostly called 100 times over
> in a tight loop, so size is not a big issue. If the same number of
> calls were spread uniformly through the program, we would need
> stricter criteria.

That's not necessarily true. It may be true for specific applications,
but I don't think strcmp is always called in a tight loop. Do you have
a qualitative argument to prove that statement, or is it just based on
dry runs?

Siddhesh