* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
@ 2012-01-24 15:31 ` rguenth at gcc dot gnu.org
2012-01-27 14:50 ` eric.batut at allegorithmic dot com
` (12 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-01-24 15:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-01-24
Ever Confirmed|0 |1
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-24 14:57:43 UTC ---
It looks like the neon builtins are not properly marked as pure/const, that
certainly is a road-block for optimizations. The heavy use of UNSPECs is
another.
Confirmed.
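To make the pure/const point concrete, here is a minimal sketch in plain C. The helper abs_diff_u8 is a hypothetical stand-in for a NEON builtin, not anything from arm_neon.h. The attribute is the GCC "const" function attribute, which Clang also accepts. For a function whose body is visible, as here, the compiler can usually work this out by itself; the attribute matters for builtins and external declarations, where it cannot.

```c
#include <stdint.h>

/* Hypothetical stand-in for a builtin: the result depends only on the
   arguments and the function has no side effects.  The "const"
   attribute states that promise to the optimizer. */
__attribute__((const))
static uint32_t abs_diff_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint32_t)(a - b) : (uint32_t)(b - a);
}

/* Since abs_diff_u8 is declared const, the compiler is allowed to
   evaluate the call once and reuse the value for both operands. */
static uint32_t squared_diff(uint8_t a, uint8_t b)
{
    return abs_diff_u8(a, b) * abs_diff_u8(a, b);
}
```

Without such a marking the optimizer has to assume each call may read or write memory, which blocks common-subexpression elimination and dead-code removal around the call.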
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
From: eric.batut at allegorithmic dot com @ 2012-01-27 14:50 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
Eric Batut <eric.batut at allegorithmic dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ramana at gcc dot gnu.org,
| |rsandifo at gcc dot gnu.org
--- Comment #2 from Eric Batut <eric.batut at allegorithmic dot com> 2012-01-27 14:13:08 UTC ---
Adding the usual suspects for ARM-related bugs.
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
From: ramana at gcc dot gnu.org @ 2012-01-27 15:51 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Target|arm-linux-androideabi |arm-linux-androideabi,
| |arm*-*-*eabi
--- Comment #3 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-01-27 15:25:29 UTC ---
(In reply to comment #1)
> It looks like the neon builtins are not properly marked as pure/const, that
> certainly is a road-block for optimizations.
> The heavy use of UNSPECs is
> another.
Yes. One other problem is that a lot of the neon intrinsics don't expand into
equivalent RTL. You still need the unspecs for the polynomial types, but in
general a large number of the intrinsics that are currently implemented as
unspecs could use the underlying vec_ expanders that are also present.
>
> Confirmed.
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
From: ramana at gcc dot gnu.org @ 2012-03-30 8:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #4 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-03-30 07:58:49 UTC ---
Your testcase is broken - it doesn't honour reinterpret_casts properly. This
is a better testcase.
#include <arm_neon.h>
uint32x4_t sqrlen4D_16u8( const uint8x16_t A, const uint8x16_t B )
{
  const uint8x16_t absAB = vabdq_u8( A, B );
  const uint16x8_t square_l = vmull_u8( vget_low_u8( absAB ), vget_low_u8( absAB ) );
  const uint16x8_t square_h = vmull_u8( vget_high_u8( absAB ), vget_high_u8( absAB ) );
  const uint32x4x2_t rgrgrgrg_babababa = vuzpq_u32( vreinterpretq_u32_u16 (square_l), vreinterpretq_u32_u16 (square_h) );
  const uint16x8_t rgrgrgrg = vreinterpretq_u16_u32 (rgrgrgrg_babababa.val[0]);
  const uint16x8_t babababa = vreinterpretq_u16_u32 (rgrgrgrg_babababa.val[1]);
  const uint32x4_t rpg_rpg_rpg_rpg = vpaddlq_u16( rgrgrgrg );
  const uint32x4_t dp = vpadalq_u16( rpg_rpg_rpg_rpg, babababa );
  return ( dp );
}
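For checking results on a host without NEON, the following is a portable scalar sketch of what the testcase appears to compute. It assumes little-endian lane numbering for the reinterprets. The name sqrlen4D_16u8_ref and the array-based signature are made up for illustration. Each output lane is the sum of squared byte differences over one group of four consecutive bytes.

```c
#include <stdint.h>

/* Scalar model (assumption: little-endian lane order).
   out[j] = sum over k = 0..3 of (A[4*j + k] - B[4*j + k])^2
   The largest possible lane value is 4 * 255 * 255 = 260100, so it
   fits comfortably in 32 bits. */
static void sqrlen4D_16u8_ref(const uint8_t A[16], const uint8_t B[16],
                              uint32_t out[4])
{
    for (int j = 0; j < 4; ++j) {
        uint32_t sum = 0;
        for (int k = 0; k < 4; ++k) {
            int d = (int)A[4 * j + k] - (int)B[4 * j + k];
            sum += (uint32_t)(d * d);
        }
        out[j] = sum;
    }
}
```

In the NEON version the vuzpq_u32 splits the sixteen 16-bit squares into the first two and the last two of each group of four, and vpaddlq_u16 followed by vpadalq_u16 adds those pairs into the same 32-bit lane.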
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
From: ramana at gcc dot gnu.org @ 2012-03-30 8:40 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #5 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-03-30 08:17:21 UTC ---
Experimenting with the patch for PR48941 and the patch for lower-subreg here:
http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01886.html
I now see the following. We still have too many moves for my liking, but the
gratuitous spilling is now gone.
.cpu cortex-a9
.eabi_attribute 27, 3
.fpu neon
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "t2.c"
.text
.align 2
.global sqrlen4D_16u8
.type sqrlen4D_16u8, %function
sqrlen4D_16u8:
@ args = 16, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vmov d16, r0, r1 @ v16qi
vmov d17, r2, r3
vldmia sp, {d18-d19}
vabd.u8 q10, q8, q9
vmull.u8 q11, d20, d20
vmull.u8 q10, d21, d21
vmov q8, q11 @ v4si -- unnecessary ?
vmov q9, q10 @ v4si -- unnecessary ?
vuzp.32 q8, q9
vpaddl.u16 q10, q8
vmov q11, q10 @ v4si -- unnecessary
vpadal.u16 q11, q9
vmov r0, r1, d22 @ v4si
vmov r2, r3, d23
bx lr
.size sqrlen4D_16u8, .-sqrlen4D_16u8
.ident "GCC: (GNU) 4.8.0 20120330 (experimental)"
.section .note.GNU-stack,"",%progbits
This probably makes it a dup of PR48941, but it's starting to look more
promising now.
Eric, could you try the two patches and see what you get? This isn't something
to be gratuitously backported, as we still have to see the effects elsewhere,
but it would be worth seeing if this helps on your intrinsics testcases.
Ramana
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
From: ramana at gcc dot gnu.org @ 2012-07-05 16:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #6 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-07-05 16:45:32 UTC ---
Author: ramana
Date: Thu Jul 5 16:45:18 2012
New Revision: 189294
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189294
Log:
2012-07-05 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
PR target/49891
PR target/51980
* gcc/testsuite/gcc.target/arm/neon/vtrnf32.c: Update.
* gcc/testsuite/gcc.target/arm/neon/vtrns32.c: Update.
* gcc/testsuite/gcc.target/arm/neon/vtrnu32.c: Update.
* gcc/testsuite/gcc.target/arm/neon/vzipf32.c: Update.
* gcc/testsuite/gcc.target/arm/neon/vzips32.c: Update.
* gcc/testsuite/gcc.target/arm/neon/vzipu32.c: Update.
2012-07-05 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
Julian Brown <julian@codesourcery.com>
PR target/49891
PR target/51980
* config/arm/neon-gen.ml (return_by_ptr): Delete.
(print_function): Handle empty strings.
(return): Delete use of return_by_ptr.
(mask_shape_for_shuffle): New function.
(mask_elems): Likewise.
(shuffle_fn): Likewise.
(params): Simplify and remove use of return_by_ptr.
(get_shuffle): New function.
(print_variant): Update.
* config/arm/neon.ml (rev_elems): New function.
(permute_range): Likewise.
(zip_range): Likewise.
(uzip_range): Likewise.
(trn_range): Likewise.
(zip_elems): Likewise.
(uzip_elems): Likewise.
(trn_elems): Likewise.
(features): New enumeration Use_shuffle. Delete ReturnPtr.
(pf_su_8_16): New.
(suf_32): New.
(ops): Update entries for Vrev64, Vrev32, Vrev16, Vtr, Vzip, Vuzp.
* config/arm/arm_neon.h: Regenerate.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm_neon.h
trunk/gcc/config/arm/neon-gen.ml
trunk/gcc/config/arm/neon.ml
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/arm/neon/vtrnf32.c
trunk/gcc/testsuite/gcc.target/arm/neon/vtrns32.c
trunk/gcc/testsuite/gcc.target/arm/neon/vtrnu32.c
trunk/gcc/testsuite/gcc.target/arm/neon/vzipf32.c
trunk/gcc/testsuite/gcc.target/arm/neon/vzips32.c
trunk/gcc/testsuite/gcc.target/arm/neon/vzipu32.c
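The log above moves the zip/unzip/transpose intrinsics over to shuffles (Use_shuffle). As a rough illustration of the idea only, here is a portable scalar sketch of a two-input shuffle on four 32-bit lanes, with index masks for the three permutes. The function names are hypothetical and this is not the code in arm_neon.h. The masks follow the architectural lane order of VUZP, VZIP and VTRN on little-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Two-input permute: lane i of the result is element mask[i] of the
   8-element concatenation a[0..3], b[0..3].  out must not alias a or b. */
static void shuffle2_u32x4(const uint32_t a[4], const uint32_t b[4],
                           const uint8_t mask[4], uint32_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        assert(mask[i] < 8);
        out[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4];
    }
}

/* Unzip: even-numbered elements to r0, odd-numbered elements to r1. */
static void uzp_u32x4(const uint32_t a[4], const uint32_t b[4],
                      uint32_t r0[4], uint32_t r1[4])
{
    static const uint8_t even[4] = {0, 2, 4, 6};
    static const uint8_t odd[4]  = {1, 3, 5, 7};
    shuffle2_u32x4(a, b, even, r0);
    shuffle2_u32x4(a, b, odd, r1);
}

/* Zip: interleave the low halves into r0 and the high halves into r1. */
static void zip_u32x4(const uint32_t a[4], const uint32_t b[4],
                      uint32_t r0[4], uint32_t r1[4])
{
    static const uint8_t lo[4] = {0, 4, 1, 5};
    static const uint8_t hi[4] = {2, 6, 3, 7};
    shuffle2_u32x4(a, b, lo, r0);
    shuffle2_u32x4(a, b, hi, r1);
}

/* Transpose: treat each pair of lanes as a 2x2 matrix and transpose it. */
static void trn_u32x4(const uint32_t a[4], const uint32_t b[4],
                      uint32_t r0[4], uint32_t r1[4])
{
    static const uint8_t even[4] = {0, 4, 2, 6};
    static const uint8_t odd[4]  = {1, 5, 3, 7};
    shuffle2_u32x4(a, b, even, r0);
    shuffle2_u32x4(a, b, odd, r1);
}
```

Expressing the permutes as value-returning shuffles with constant masks, rather than as opaque calls that write results through a pointer (the ReturnPtr feature deleted above), gives the middle end something it can keep in registers.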
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
From: mgretton at gcc dot gnu.org @ 2013-05-28 19:30 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
mgretton at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mgretton at gcc dot gnu.org
--- Comment #7 from mgretton at gcc dot gnu.org ---
Testing the testcase in #4 with a recent trunk (gcc version 4.9.0 20130528
(experimental) (GCC)) gives the following results:
arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=softfp -O2 -mthumb:
sqrlen4D_16u8:
vmov d18, r0, r1 @ v16qi
vmov d19, r2, r3
vld1.64 {d16-d17}, [sp:64]
vabd.u8 q8, q9, q8
vmull.u8 q9, d16, d16
vmull.u8 q8, d17, d17
vuzp.32 q9, q8
vpaddl.u16 q9, q9
vmov q10, q9 @ v4si
vpadal.u16 q10, q8
vmov r0, r1, d20 @ v4si
vmov r2, r3, d21
bx lr
arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=hard -O2 -mthumb:
sqrlen4D_16u8:
vabd.u8 q1, q0, q1
vmull.u8 q0, d2, d2
vmull.u8 q8, d3, d3
vuzp.32 q0, q8
vpaddl.u16 q0, q0
vpadal.u16 q0, q8
bx lr
So code generation seems to be OK for the hard-float ABI, but the softfp version
still has an extra vmov between the vpaddl and the vpadal.
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
From: ktkachov at gcc dot gnu.org @ 2014-01-22 12:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
ktkachov at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ktkachov at gcc dot gnu.org
--- Comment #8 from ktkachov at gcc dot gnu.org ---
> arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=softfp -O2 -mthumb:
> sqrlen4D_16u8:
> vmov d18, r0, r1 @ v16qi
> vmov d19, r2, r3
> vld1.64 {d16-d17}, [sp:64]
> vabd.u8 q8, q9, q8
> vmull.u8 q9, d16, d16
> vmull.u8 q8, d17, d17
> vuzp.32 q9, q8
> vpaddl.u16 q9, q9
> vmov q10, q9 @ v4si
> vpadal.u16 q10, q8
> vmov r0, r1, d20 @ v4si
> vmov r2, r3, d21
> bx lr
With current trunk I'm getting for the softfp case:
push {lr} @ 40 *push_multi [length = 2]
vmov d16, r0, r1 @ v16qi @ 37 *neon_movv16qi/6 [length = 8]
vmov d17, r2, r3
add lr, sp, #4 @ 36 *arm_addsi3/5 [length = 4]
vldr d18, [sp, #4] @ 3 *neon_movv16qi/4 [length = 8]
vldr d19, [sp, #12]
vabd.u8 q9, q8, q9 @ 7 neon_vabdv16qi [length = 4]
vmull.u8 q8, d18, d18 @ 14 neon_vmullv8qi [length = 4]
vmull.u8 q9, d19, d19 @ 16 neon_vmullv8qi [length = 4]
vuzp.32 q8, q9 @ 18 *neon_vuzpv4si_insn [length = 4]
vpaddl.u16 q8, q8 @ 22 neon_vpaddlv8hi [length = 4]
vpadal.u16 q8, q9 @ 28 neon_vpadalv8hi [length = 4]
vmov r0, r1, d16 @ v4si @ 39 *neon_movv4si/5 [length = 8]
vmov r2, r3, d17
ldr pc, [sp], #4 @ 45 *ldr_with_return [length = 4]
The move between the vpad*s is gone, but there are a couple of redundant loads
and some register spillage.
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
From: StaffLeavers at arm dot com @ 2014-01-22 12:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #9 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.
Your email will be forwarded to their line manager.
Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com
Thank you.
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
From: christophe.lyon at st dot com @ 2014-06-13 15:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
christophe.lyon at st dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |christophe.lyon at st dot com
--- Comment #14 from christophe.lyon at st dot com ---
As of current trunk the softfp case looks like this:
sqrlen4D_16u8:
vmov d16, r0, r1 @ v16qi
vmov d17, r2, r3
vld1.64 {d18-d19}, [sp:64]
vabd.u8 q9, q8, q9
vmull.u8 q8, d18, d18
vmull.u8 q9, d19, d19
vuzp.32 q8, q9
vpaddl.u16 q8, q8
vpadal.u16 q8, q9
vmov r0, r1, d16 @ v4si
vmov r2, r3, d17
bx lr
which looks quite good.