From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-47003-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 13810 invoked by alias); 25 Feb 2002 00:21:36 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 13729 invoked from network); 25 Feb 2002 00:21:32 -0000
Received: from unknown (HELO Nicole.fhm.edu) (213.7.87.14)
  by sources.redhat.com with SMTP; 25 Feb 2002 00:21:32 -0000
Received: from localhost.localdomain (unknown [10.23.201.7])
	by Nicole.fhm.edu (Postfix on SuSE Linux 7.2 (i386)) with ESMTP
	id AB1E9FA47; Mon, 25 Feb 2002 01:20:38 +0100 (CET)
Subject: Re: Altivec strangeness?
From: Daniel Egger <degger@fhm.edu>
To: Aldy Hernandez <aldyh@redhat.com>
Cc: GCC Developer Mailinglist <gcc@gcc.gnu.org>
In-Reply-To: <E65BBDE0-297D-11D6-97EE-000393750C1E@redhat.com>
References: <E65BBDE0-297D-11D6-97EE-000393750C1E@redhat.com>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Evolution/1.0.2 
Date: Sun, 24 Feb 2002 16:41:00 -0000
Message-Id: <1014596591.3287.19.camel@sonja>
Mime-Version: 1.0
X-SW-Source: 2002-02/txt/msg01443.txt.bz2

On Mon, 2002-02-25 at 00:26, Aldy Hernandez wrote:

> > a) nasty because it requires a lot of typing.
> declare a macro:
> 	#define VSHORT_1S ((vector short int){1,1,1,1,1,1,1,1})

That's not much shorter than
const vector short shortones = (vector short int){1,1,1,1,1,1,1,1};
defined globally.
 
> as i have mentioned before, the vector initializers generate pretty
> bad code, but that will be remedied when, in 3.2, i rewrite them
> to use the vector constant infrastructure.  right now, they just
> get initialized as arrays, which is less than optimal.

Indeed.
 
> in the code's defense, how many times do you initialize a given
> vector in a function?  once!  it's not like it's going to drag
> performance down.

No, not in my case. I have small functions which have a generic
implementation but can be replaced by vectorised code. A profile
of a short run of the application looks like this:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 11.83      0.68     0.68    99864     6.81     9.81  synth_filter
 11.48      1.34     0.66  2105726     0.31     0.31  put_pixels_altivec
 11.48      2.00     0.66  1436416     0.46     0.46  j_rev_dct_altivec

For a function that is called a few million times per second of runtime,
it makes a big difference whether a constant vector is loaded from
memory, which requires extra code to set up the base address for the
vector load, or is simply splatted into a vector register, which uses
less memory and fewer opcodes and is likely to take the same number of
CPU cycles.
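
To make the two alternatives concrete, here is a minimal sketch using the
generic GCC/Clang vector extension rather than AltiVec itself, so that it
compiles on any target. The names v8hi, splat_s16, add_ones_from_memory and
add_ones_splat are made up for illustration. On AltiVec the splat variant
would correspond to something like vec_splat_s16(1); whether a given
compiler really emits a splat-immediate for the generic code is not
guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Generic GCC/Clang vector type: eight 16-bit lanes in 128 bits,
   the same shape as AltiVec's "vector short". Hypothetical name. */
typedef int16_t v8hi __attribute__((vector_size(16)));

/* Variant 1: the constant lives in memory. Using it needs its address
   in a register first, then a vector load. */
static const v8hi shortones = {1, 1, 1, 1, 1, 1, 1, 1};

void add_ones_from_memory(int16_t out[8], const int16_t in[8])
{
    v8hi x;
    memcpy(&x, in, sizeof x);   /* unaligned-safe load of 8 shorts */
    x += shortones;             /* lane-wise add of the memory constant */
    memcpy(out, &x, sizeof x);
}

/* Variant 2: build the constant by replicating one scalar into every
   lane. A compiler that knows about splat instructions can do this
   without touching memory (assumption: depends on target and compiler). */
static inline v8hi splat_s16(int16_t value)
{
    return (v8hi){value, value, value, value, value, value, value, value};
}

void add_ones_splat(int16_t out[8], const int16_t in[8])
{
    v8hi x;
    memcpy(&x, in, sizeof x);
    x += splat_s16(1);          /* same result, constant made in-register */
    memcpy(out, &x, sizeof x);
}
```

Both functions compute the same result; the only difference is where the
constant comes from, which is exactly the part that matters in a function
called millions of times.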

This is an example of assembly output produced by gcc 3.1:
	.align 2
        .globl put_pixels_clamped_altivec
        .type   put_pixels_clamped_altivec,@function
put_pixels_clamped_altivec:
        lis %r0,0x108
        lis %r9,zeros@ha
        ori %r0,%r0,16
        la %r9,zeros@l(%r9)
        dst %r3,%r0,0
        lvx %v13,0,%r9
        li %r0,8
        li %r11,0
        mtctr %r0
        li %r9,4
.L53:
        lvx %v0,0,%r3
        addi %r3,%r3,16
        vpkshus %v0,%v0,%v13
        vspltw %v1,%v0,1
        vspltw %v0,%v0,0
        stvewx %v0,%r11,%r4
        stvewx %v1,%r9,%r4
        add %r4,%r4,%r5
        bdnz .L53
        blr


As you can see, it takes an additional lis/la pair to get the address
for the vector load. The inner loop is executed 8 times, BTW.
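
For readers who don't speak AltiVec assembly, the following is a scalar
sketch of what I read the loop above as doing: each of the 8 iterations
takes 8 signed 16-bit values, saturates them to unsigned bytes (the
vpkshus), stores those 8 bytes and advances the destination by the line
size. The function name and parameter names are assumptions based on the
register usage (%r3, %r4, %r5), not taken from the real source.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the loop above (hypothetical signature):
   block     - 64 signed 16-bit coefficients, 8 per row  (%r3)
   pixels    - destination image, 8 bytes written per row (%r4)
   line_size - distance in bytes between destination rows (%r5) */
void put_pixels_clamped_scalar(const int16_t *block, uint8_t *pixels,
                               ptrdiff_t line_size)
{
    for (int row = 0; row < 8; row++) {        /* mtctr 8 / bdnz */
        for (int col = 0; col < 8; col++) {
            int16_t v = block[col];
            /* vpkshus: pack signed halfword to unsigned byte, saturating */
            pixels[col] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
        block += 8;                            /* addi %r3,%r3,16 */
        pixels += line_size;                   /* add %r4,%r4,%r5 */
    }
}
```

With so little work per call, the two instructions spent on the address of
the zero vector are a visible share of the function.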

> and if you have it in a loop, it's probably invariant, so move it out of it.

You can bet on it. :)

> let's concentrate on getting the bugs ironed out of the current
> implementation, and then we can tackle code quality issues.

I hope you don't mind if I fool around a bit with code generation
now. :)
 
-- 
Servus,
       Daniel