From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 14748 invoked by alias); 22 Jun 2012 22:46:47 -0000
Received: (qmail 14738 invoked by uid 22791); 22 Jun 2012 22:46:44 -0000
X-SWARE-Spam-Status: No, hits=-4.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,KHOP_THREADED,TW_CP
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 22 Jun 2012 22:46:32 +0000
From: "hubicka at ucw dot cz"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
Date: Fri, 22 Jun 2012 22:46:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: hubicka at ucw dot cz
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.8.0
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-06/txt/msg01546.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

--- Comment #22 from Jan Hubicka 2012-06-22 22:45:35 UTC ---
> Yes.  The question is what is "very small" and how can we possibly

What counts as "very small" is defined by the cost tables in i386.c. I simply
ran a small benchmark testing the library and GCC implementations to fill them
in. With new glibcs these tables may need updating. I updated them on some to
make glibc in SUSE 11.x. PR 43052 is about memcmp. Memcpy/memset should behave
more or less sanely.
(That also reminds me that I should look again at the SSE memcpy/memset
implementation for 4.8.)

> detect "very small".  For this testcase we can derive an upper bound
> of the size, which is 8, but the size is not constant.  I think unless
> we know we can expand the variable-size memcpy with, say, three
> CPU instructions inline there is no reason to not call memcpy.
>
> Thus if the CPU could do
>
>   tem = unaligned-load-8-bytes-from-src-and-ignore-faults;
>   mask = generate mask from size
>   store-unaligned-8-bytes-with-mask
>
> then expanding the memcpy call inline would be a win I suppose.
> AVX has VMASKMOV, but I'm not sure using that for sizes <= 16
> bytes is profitable?  Note that from the specs
> of VMASKMOV it seems the memory operands need to be aligned and
> the mask does not support byte-granularity.
>
> Which would leave us to inline expanding the case of at most 2 byte
> memcpy.  Of course currently there is no way to record an upper
> bound for the size (we do not retain value-range information - but
> we of course should).

My secret plan was to make VRP produce a value-profiling histogram when the
value is known to lie within a small range. It should be quite easy to
implement.

Honza
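For illustration only, the load / mask / store sequence quoted above can be
approximated in portable C. This is a rough sketch, not what GCC emits and not
a proposal for the expander. The name copy_upto8 and the mask table are
hypothetical. Since plain C has no fault-ignoring load and no masked store, the
sketch assumes both buffers have at least 8 addressable bytes and emulates the
masked store by reading the destination and merging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes (n <= 8) with no loop over n.
   ASSUMPTION: src and dst each have at least 8 addressable bytes.  The
   hardware sequence in the quoted text would avoid this requirement by
   ignoring faults on the load and masking the store.  */
static void copy_upto8(void *dst, const void *src, size_t n)
{
    /* Eight 0xff bytes followed by eight zero bytes.  Reading 8 bytes at
       offset 8 - n yields a mask whose first n bytes in memory are 0xff,
       independent of byte order.  */
    static const unsigned char ones[16] = {
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0, 0, 0, 0, 0, 0, 0, 0
    };
    uint64_t s, d, mask;

    assert(n <= 8);

    memcpy(&s, src, 8);               /* unaligned 8-byte load from src */
    memcpy(&d, dst, 8);               /* current contents of dst */
    memcpy(&mask, ones + 8 - n, 8);   /* generate mask from size */

    d = (d & ~mask) | (s & mask);     /* keep dst bytes beyond n */
    memcpy(dst, &d, 8);               /* emulated masked store */
}
```

The fixed-size memcpy calls are there so a compiler can turn each into a single
load or store. Note that the merge reads and rewrites the destination bytes
beyond n, which a real masked store would not touch.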