From: "Scott Dillard"
To: gcc-help@gcc.gnu.org
Date: Tue, 19 Aug 2008 23:20:00 -0000
Subject: sse vector extensions, unions and inlining

Hi,

I've been messing around with the gcc vector extensions (SSE), and the assembly produced seems somewhat suboptimal. I'm not sure what "optimal" is, so I'm inquiring here first, before filing a bug report.

This concerns inlined functions that return vectors using the struct/union return convention, that is, where the address at which the result is to be stored is passed to the callee as a hidden first argument. When the function returns a "raw" vector type (such as "double foo __attribute__((vector_size(16)))") that fits in a single xmm (SSE) register, the result of calling the inlined function is the same as inlining it manually.
However, if a union is returned (such as "union { double a[2]; double v __attribute__((vector_size(16))); }"), or if the vector type is too big for a register (such as "double foo __attribute__((vector_size(32)))"), then excessive stack shuffling occurs relative to manual inlining. This is C, btw, so I understand that, in general, stack space has to be reserved for the arguments (as opposed to passing by const&), but I would expect that after inlining the optimizer could see that the arguments are not modified and avoid bouncing them through the stack, as it does for types like int and double.

Let's say there are functions f(a,b) = a+b, g(a,b) = a*b, and h(a,b) = g(f(a,a),f(a,b)). Functions f and g are inlined into h, but the body of h looks like this:

1. reserve stack space for what would have been the calls to f and g
2. copy the arguments into that space
3. load from that space into xmm registers
4. operate
5. copy from the xmm registers into stack space
6. copy from stack space into the space pointed to by the hidden "return here" argument

If h is written out directly as (a+a)*(a+b), this stack shuffling does not happen.

Is this asking too much? Is there some fundamental reason why the arguments to the inlined function need to be bounced through the stack?

This is with gcc 4.1.3. I've attached a test file and the resulting assembly. The difference is pretty striking, though I've not benchmarked it. I'm also aware this is not really the best way to use SSE (it's better to put each vector component in a separate array and vectorize the loop), but I think the issue may be with inlined functions that return structs/unions in general.
Thanks,
Scott

Attachment: vec_test.tar.gz