From: Steve Ellcey
To: Michael Matz
Cc: gcc@gcc.gnu.org, law@redhat.com
Subject: Re: SPEC 456.hmmer vectorization question
Date: Wed, 08 Mar 2017 19:41:00 -0000
Message-ID: <1489002090.22552.19.camel@caviumnetworks.com>
References: <201703062237.v26MbW5e008866@sellcey-dt.caveonetworks.com>

On Tue, 2017-03-07 at 14:45 +0100, Michael Matz wrote:
> Hi Steve,
>
> On Mon, 6 Mar 2017, Steve Ellcey wrote:
>
> > I was looking at the spec 456.hmmer benchmark and this email string
> > from Jeff Law and Michael Matz:
> >
> >   https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01970.html
> >
> > and was wondering if anyone was looking at what more it would take
> > for GCC to vectorize the loop in P7Viterbi.
> It takes what I wrote in there.  There are two important things that need
> to happen to get the best performance (at least from an analysis I did in
> 2011, but nothing material should have changed since then):

I guess I was hoping that some progress had been made since then, but
it sounds like it hasn't.

> (1) loop distribution to make some memory streams vectorizable (and leave
>     the others in non-vectorized form).
> (1a) loop splitting based on conditional (to remove the k
> (2) a predictive commoning (or loop carried store reuse) on the dc[]
>     stream
>
> None of these is valid if the loop streams can't be disambiguated, and as
> this is C only adding explicit restrict qualifiers would give you that, or
> runtime disambiguation, like ICC is doing, that's part (0).

So it sounds like the loop would have to be split up using runtime
disambiguation before we could do any of the optimizations.  Would that
check and split be something that could or should be done using the
graphite framework, or would it be a separate pass done before the
graphite phase is called?  I am not sure how one would determine which
loops would be worth splitting and which ones would not during such a
phase.

Steve Ellcey
sellcey@cavium.com
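
For illustration only, here is a hand-written sketch of what parts (0), (1) and (2) could look like at the source level.  It is not the P7Viterbi loop and not what GCC or ICC actually emit: the kernel, the array names mc/dc/tp and all function names are made up for this example.  The dispatcher does the runtime disambiguation; the fast path uses restrict, is distributed into a dependence-free loop plus a recurrence loop, and carries the previous dc[] value in a scalar instead of reloading it.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference form (hypothetical kernel): one fused loop, correct even if
 * the arrays overlap, but hard to vectorize as written. */
static void kernel_generic(int *mc, int *dc, const int *tp, int n)
{
    for (int k = 1; k < n; k++) {
        mc[k] += tp[k - 1];
        int a = dc[k - 1] + tp[k - 1];
        int b = mc[k - 1];
        dc[k] = a > b ? a : b;
    }
}

/* Fast path: restrict states that the three streams do not alias. */
static void kernel_noalias(int *restrict mc, int *restrict dc,
                           const int *restrict tp, int n)
{
    /* (1) Distributed loop with no loop-carried dependence:
     * a candidate for vectorization. */
    for (int k = 1; k < n; k++)
        mc[k] += tp[k - 1];

    /* (2) The dc[] recurrence stays scalar; the value stored in the
     * previous iteration is kept in 'prev' rather than reloaded. */
    int prev = dc[0];
    for (int k = 1; k < n; k++) {
        int a = prev + tp[k - 1];
        int b = mc[k - 1];
        prev = a > b ? a : b;
        dc[k] = prev;
    }
}

/* Address-range overlap test.  Comparing pointers through uintptr_t is
 * implementation-defined, but behaves as expected on flat-memory targets. */
static int overlaps(const void *p, const void *q, size_t bytes)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    return a < b + bytes && b < a + bytes;
}

/* (0) Runtime disambiguation: take the transformed version only when the
 * streams are provably disjoint, otherwise fall back to the fused loop. */
void kernel(int *mc, int *dc, const int *tp, int n)
{
    size_t bytes = (n > 0 ? (size_t)n : 0) * sizeof(int);
    if (n > 1 && !overlaps(mc, dc, bytes) && !overlaps(mc, tp, bytes)
              && !overlaps(dc, tp, bytes))
        kernel_noalias(mc, dc, tp, n);
    else
        kernel_generic(mc, dc, tp, n);
}
```

A caller would just invoke kernel(mc, dc, tp, n); both paths compute the same result for disjoint arrays, so the check only decides which loop shape runs.  The open question in the mail, how a pass would decide which loops are worth versioning this way, is not addressed by the sketch.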