Subject: AVX generic mode tuning discussion.
Date: Tue, 12 Jul 2011 22:26:00 -0000

We would like to propose changing AVX generic mode tuning to generate 128-bit AVX instead of 256-bit AVX. As per H.J.'s suggestion, we have reviewed the various tuning choices made for generic mode with respect to AMD's upcoming Bulldozer processor. At this point, this is the most significant change we have to propose. While we are willing to re-engineer generic mode, this change needs immediate discussion because its performance impact on Bulldozer is significant.

Here is the relative CPU2006 performance data we gathered using gcc on AMD Bulldozer (BD) and Intel Sandybridge (SB) machines with "-Ofast -mtune=generic -mavx".
% gain/loss of avx256 vs avx128 (negative % indicates a loss, positive % indicates a gain)

Benchmark        AMD BD  Intel SB
410.bwaves        -2.34     -1.52
416.gamess        -1.11     -0.30
433.milc           0.47     -1.75
434.zeusmp        -3.61      0.68
435.gromacs       -0.54     -0.38
436.cactusADM    -23.56     21.49
437.leslie3d      -0.44      1.56
444.namd           0.00      0.00
447.dealII        -0.36     -0.23
450.soplex        -0.43     -0.29
453.povray         0.50      3.63
454.calculix      -8.29      1.38
459.GemsFDTD       2.37     -1.54
465.tonto          0.00      0.00
470.lbm            0.00      0.21
481.wrf           -4.80      0.00
482.sphinx3      -10.20     -3.65
SpecFP            -3.29      1.01

400.perlbench      0.93      1.47
401.bzip2          0.60      0.00
403.gcc            0.00      0.00
429.mcf            0.00     -0.36
445.gobmk         -1.03      0.37
456.hmmer         -0.64      0.38
458.sjeng          1.74      0.00
462.libquantum     0.31      0.00
464.h264ref        0.00      0.00
471.omnetpp       -1.27      0.00
473.astar          0.00      0.46
483.xalancbmk      0.51      0.00
SpecINT            0.09      0.19

According to the data, the 1% SpecFP gain on Intel Sandybridge is outweighed by a 3% SpecFP loss on AMD Bulldozer.

For the data above, generic mode splits both 256-bit misaligned loads and stores, as is currently the case in trunk. Even if we disable 256-bit misaligned load splitting, 256-bit AVX SpecFP performance improves by only ~1.4% on AMD Bulldozer, while it drops by 0.12% on Intel Sandybridge. With load splitting disabled, comparing 256-bit AVX to 128-bit AVX therefore gives a cumulative 0.9% gain on Intel Sandybridge versus a 1.9% loss on AMD Bulldozer, so 256-bit AVX is still not a fair choice for generic mode.

Please share your thoughts. It would be great if H.J. could verify the Intel Sandybridge data.

Thanks,
Harsha
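As a cross-check on the summary figures, the sketch below recomputes the SpecFP row from the per-benchmark numbers. It assumes the summary rows are geometric means of the per-benchmark ratios, which is how SPEC combines benchmark ratios. It then applies the quoted ~1.4% (AMD BD) and -0.12% (Intel SB) changes from disabling 256-bit misaligned load splitting. Treating those two changes as relative (multiplicative) adjustments is an assumption; adding percentage points instead gives nearly the same result. The helper names geomean_pct and compose_pct are hypothetical and used only for illustration.

```python
import math

# Per-benchmark % change of avx256 relative to avx128 on the SPECfp
# benchmarks, copied from the table (410.bwaves through 482.sphinx3).
AMD_BD_FP = [-2.34, -1.11, 0.47, -3.61, -0.54, -23.56, -0.44, 0.00, -0.36,
             -0.43, 0.50, -8.29, 2.37, 0.00, 0.00, -4.80, -10.20]
INTEL_SB_FP = [-1.52, -0.30, -1.75, 0.68, -0.38, 21.49, 1.56, 0.00, -0.23,
               -0.29, 3.63, 1.38, -1.54, 0.00, 0.21, 0.00, -3.65]


def geomean_pct(deltas):
    """Geometric mean of the ratios (1 + d/100), reported as a % change."""
    mean_log = sum(math.log1p(d / 100.0) for d in deltas) / len(deltas)
    return math.expm1(mean_log) * 100.0


def compose_pct(base, extra):
    """Apply a further relative % change `extra` on top of a % change `base`."""
    return ((1.0 + base / 100.0) * (1.0 + extra / 100.0) - 1.0) * 100.0


if __name__ == "__main__":
    amd, intel = geomean_pct(AMD_BD_FP), geomean_pct(INTEL_SB_FP)
    print(f"SpecFP, loads and stores split: AMD BD {amd:+.2f}%, Intel SB {intel:+.2f}%")
    # Assumption: the quoted ~1.4% and 0.12% are relative changes to the avx256 score.
    print(f"SpecFP, load splitting off:     AMD BD {compose_pct(amd, 1.4):+.1f}%, "
          f"Intel SB {compose_pct(intel, -0.12):+.1f}%")
```

Run as a script, it should report roughly -3.29% and +1.01% for SpecFP, matching the table, and roughly -1.9% and +0.9% with load splitting disabled, matching the figures quoted in the text.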