From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 96337 invoked by alias); 7 Aug 2015 20:24:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 96230 invoked by uid 89); 7 Aug 2015 20:24:20 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-wi0-f170.google.com Received: from mail-wi0-f170.google.com (HELO mail-wi0-f170.google.com) (209.85.212.170) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Fri, 07 Aug 2015 20:24:18 +0000 Received: by wibxm9 with SMTP id xm9so79473497wib.1 for ; Fri, 07 Aug 2015 13:24:15 -0700 (PDT) X-Received: by 10.180.11.176 with SMTP id r16mr9631267wib.87.1438979055419; Fri, 07 Aug 2015 13:24:15 -0700 (PDT) Received: from [10.161.25.19] ([176.2.53.25]) by smtp.gmail.com with ESMTPSA id bi6sm16210041wjc.25.2015.08.07.13.24.13 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 07 Aug 2015 13:24:14 -0700 (PDT) User-Agent: K-9 Mail for Android In-Reply-To: <20150807135033.GA26612@tsaunders-iceball.corp.tor1.mozilla.com> References: <20150805105650.GA27755@tsaunders-iceball.corp.tor1.mozilla.com> <1438788499.21752.39.camel@surprise> <1438788868.21752.45.camel@surprise> <20150805202212.GA6847@tsaunders-iceball.corp.tor1.mozilla.com> <1438886869.21752.62.camel@surprise> <87io8sp5tn.fsf@googlemail.com> <20150807043113.GA24176@tsaunders-iceball.corp.tor1.mozilla.com> <87k2t7ig8a.fsf@e105548-lin.cambridge.arm.com> <20150807135033.GA26612@tsaunders-iceball.corp.tor1.mozilla.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Subject: Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function From: Richard Biener Date: Fri, 07 Aug 2015 20:24:00 -0000 To: Trevor Saunders ,David Malcolm ,tbsaunde+gcc@tbsaunde.org,GCC Patches ,richard.sandiford@arm.com Message-ID: <5EEE4B78-148C-4CA8-BF0F-8BBC33843493@gmail.com> X-IsSubscribed: yes X-SW-Source: 2015-08/txt/msg00407.txt.bz2 On August 7, 2015 3:50:33 PM GMT+02:00, Trevor Saunders wrote: >On Fri, Aug 07, 2015 at 10:45:57AM +0100, Richard Sandiford wrote: >> Trevor Saunders writes: >> > On Thu, Aug 06, 2015 at 08:36:36PM +0100, Richard Sandiford wrote: >> >> An integrated assembler or tighter asm output would be nice, but >when >> >> I last checked LLVM was usually faster than GCC even when >compiling to asm, >> >> even though LLVM does use indirection (in the form of virtual >functions) >> >> for its output routines. I don't think indirect function calls >themselves >> >> are the problem -- as long as we get the abstraction right :-) >> > >> > yeah, last time I looked (tbf a while ago) the C++ front end took >up by >> > far the largest part of the time. So it may not be terribly >important, >> > but it would still be nice to figure out what a good design looks >like. >> >> I tried getting final to output the code a large number of times. >> Obviously just sticking "for (i = 0; i < n; ++i)" around something >> isn't the best way of measuring performance (for all the usual >reasons) >> but it was interesting even so. A lot of the time is taken in calls >to >> strlen and in assemble_name itself (called by ASM_OUTPUT_LABEL). > >yeah, this data looks great. I find it interesting that you say we >spend so much time outputting labels as opposed to instructions. > >> Each time we call assemble_name we do: >> >> real_name = targetm.strip_name_encoding (name); >> >> id = maybe_get_identifier (real_name); >> if (id) >> { >> tree id_orig = id; >> >> mark_referenced (id); >> ultimate_transparent_alias_target (&id); >> if (id != id_orig) >> name = IDENTIFIER_POINTER (id); >> gcc_assert (! TREE_CHAIN (id)); >> } >> >> Doing an identifier lookup every time we output a reference to a >label >> is pretty expensive. Especially when many of the labels we're >dealing >> with are internal ones (basic block labels, debug labels, etc.) for >which >> the lookup is bound to fail. > >well, there's ASm_OUTPUT_INTERNAL_LABEL, and I think something similar >for debug labels. I guess we don't always use those where we could. >Or >maybe the problem is we have places where we need to look at data to >find out. Maybe it would make sense to have the generally used >output_label routine take a tree / rtx, and check if its a internal or >debug label and dispatch appropriately. > >> So if compile-time for asm output is a concern, that seems like a >good >> place to start. We should try harder to keep track of the identifier >> behind a name (when there is one) and avoid this overhead for >> internal labels. >> >> Converting ASM_OUTPUT_LABEL to an indirect function call was in the >> noise even with my for-loop hack. The execution time of the hook is >> dominated by assemble_name itself. I hope patches like yours aren't >> held up simply because they have the equivalent of a virtual >function. > >Well, I think it makes sense to reroll this series, but I think I'll >keep working on trying to replace these macros with something else. > >> Also, although we seem to be paranoid about virtual functions and >> indirect calls, it's worth remembering that on most targets every >> call to fputs(_unlocked), fwrite(_unlocked) and strlen is a PLT call. >> Our current code calls fputs several times for one line of assembly, >> including for short strings like register names. This is doubly >> inefficient because: >> >> (a) we could reduce the number of PLT calls by doing the buffering >> ourselves and > >yeah, I mentioned that earlier, but its great to have data showing its >a >win! I think its also probably important to enabling the other >optimizations below. > >> (b) the names of those registers are known at compile time (or at >least >> at start-up time) and are short, but we call strlen() on them >> each time we write them out. > >yeah, that seems like something that should be fixed, but I'm not sure >off hand where to look for the code doing this. > >> E.g. for the attached microbenchmark I get: >> >> Time taken, normalised to VERSION==1 >> >> VERSION==1: 1.000 >> VERSION==2: 1.377 >> VERSION==3: 3.202 (1.638 with -minline-all-stringops) >> VERSION==4: 4.242 (2.921 with -minline-all-stringops) >> VERSION==5: 4.526 >> VERSION==6: 4.543 >> VERSION==7: 10.884 >> >> where the results for 5 vs. 6 are in the noise. >> >> The 5->4 gain is by doing the buffering ourselves. The 4->3 gain is >for >> keeping track of the string length rather than recomputing it each >time. >> >> This suggests that if we're serious about trying to speed up the asm >output, >> it would be worth adding an equivalent of LLVM's StringRef that pairs >a >> const char * string with its length. > >I've thought a tiny bit about working on that, so its nice to have >data. Tree identifiers have an embedded length. So its all about avoidibg this target hook mangling the labels. Richard. >Trev > >> >> Thanks, >> Richard >> > >> #define _GNU_SOURCE 1 >> >> #include >> #include >> #include >> >> struct S >> { >> S () : end (buffer) {} >> >> ~S () >> { >> fwrite_unlocked (buffer, end - buffer, 1, stdout); >> } >> >> #if VERSION == 3 >> void __attribute__((noinline)) >> #else >> void >> #endif >> write (const char *x, size_t len) >> { >> if (__builtin_expect (buffer + sizeof (buffer) - end < len, 0)) >> { >> fwrite_unlocked (buffer, end - buffer, 1, stdout); >> end = buffer; >> } >> memcpy (end, x, len); >> end += len; >> } >> >> #if VERSION == 1 || VERSION == 3 >> template >> void >> write (const char (&x)[N]) >> { >> write (x, N - 1); >> } >> #elif VERSION == 2 >> template >> void __attribute__((noinline)) >> write (const char (&x)[N]) >> { >> write (x, N - 1); >> } >> #else >> void __attribute__((noinline)) >> write (const char *x) >> { >> write (x, strlen (x)); >> } >> #endif >> char buffer[4096]; >> char *end; >> }; >> >> int >> main () >> { >> S s; >> for (int i = 0; i < 100000000; ++i) >> { >> #if VERSION <= 4 >> s.write ("Hello!"); >> #elif VERSION == 5 >> fputs_unlocked ("Hello!", stdout); >> #elif VERSION == 6 >> fwrite_unlocked ("Hello!", 6, 1, stdout); >> #elif VERSION == 7 >> std::cout << "Hello!"; >> #else >> #error Please define VERSION >> #endif >> } >> return 0; >> }