Date: Fri, 18 Jan 2013 22:30:00 -0000
Subject: Re: [Patch, libfortran] Improve performance of byte swapped IO
From: Janne Blomqvist
To: Richard Biener
Cc: Andreas Schwab, GCC Patches, Fortran List

PING**2

On Mon, Jan 14, 2013 at 12:44 AM, Janne Blomqvist wrote:
> PING**1.2
>
> Yet another slightly updated patch attached. Compared to the previous
> version, now with specializations for sizes 12 and 16 as well.
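The following is a minimal sketch of size-specialized byte swapping with a generic fallback. It is not the code from the patch. The name `bswap_elements` is hypothetical, and the `__builtin_bswap16/32/64` builtins assume GCC or Clang. The 12- and 16-byte cases are composed from 32- and 64-bit swaps, and any other element size falls back to reversing bytes one at a time:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the libgfortran implementation: copy NELEMS
   elements of SIZE bytes each from SRC to DST, reversing the byte order of
   every element.  SRC and DST must not overlap.  Fixed-size memcpy calls
   handle unaligned buffers and usually compile to plain loads and stores.  */
static void
bswap_elements (void *dst, const void *src, size_t size, size_t nelems)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  for (size_t i = 0; i < nelems; i++, s += size, d += size)
    switch (size)
      {
      case 2:
        {
          uint16_t v;
          memcpy (&v, s, 2);
          v = __builtin_bswap16 (v);
          memcpy (d, &v, 2);
          break;
        }
      case 4:
        {
          uint32_t v;
          memcpy (&v, s, 4);
          v = __builtin_bswap32 (v);
          memcpy (d, &v, 4);
          break;
        }
      case 8:
        {
          uint64_t v;
          memcpy (&v, s, 8);
          v = __builtin_bswap64 (v);
          memcpy (d, &v, 8);
          break;
        }
      case 12:
        {
          /* Reversed element = swapped bytes 4..11, then swapped bytes 0..3.  */
          uint32_t lo;
          uint64_t hi;
          memcpy (&lo, s, 4);
          memcpy (&hi, s + 4, 8);
          hi = __builtin_bswap64 (hi);
          lo = __builtin_bswap32 (lo);
          memcpy (d, &hi, 8);
          memcpy (d + 8, &lo, 4);
          break;
        }
      case 16:
        {
          /* Swap each 8-byte half and exchange the two halves.  */
          uint64_t a, b;
          memcpy (&a, s, 8);
          memcpy (&b, s + 8, 8);
          a = __builtin_bswap64 (a);
          b = __builtin_bswap64 (b);
          memcpy (d, &b, 8);
          memcpy (d + 8, &a, 8);
          break;
        }
      default:
        /* Generic fallback for any other size (1, 10, ...).  */
        for (size_t j = 0; j < size; j++)
          d[j] = s[size - 1 - j];
        break;
      }
}
```

The switch sits inside the loop only to keep the sketch short. Because the element size is fixed for a whole transfer, a tuned version would more likely hoist the switch out and run one tight loop per size.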
> For the real(10) benchmark, with the previous v3 patch (please disregard the
> absolute values in the post quoted below, they were wrong due to a
> bug):
>
> Unformatted sequential write/read performance test
> Record size           Write MB/s            Read MB/s
> ==========================================================
>           4   80.578833140738340   127.33074266188656
>           8   137.61682156650559   184.49033790407984
>          16   202.72871312800621   275.98801561061816
>          32   275.33538767460863   413.43956672052303
>          64   341.04488670485119   555.13744525826564
>         128   384.77917051919820   671.44655208024699
>         256   410.97208129045833   763.97660513918527
>         512   425.76619227779878   826.41086693364593
>        1024   430.77035999730009   840.30757120448550
>        2048   438.30318459339475   885.50033810296600
>        4096   455.79422809097599   919.78265920652086
>        8192   465.74499205886326   959.06963983370918
>       16384   472.48133493971142   991.11244162081744
>       32768   471.00024619567603   1015.7428144049615
>       65536   474.91235280949985   1021.2150519080892
>      131072   475.18664487440901   1006.3701982554830
>      262144   478.00435092846868   985.17141300594039
>      524288   476.72837201590363   991.74226579987987
>
> With the new v4 patch:
>
> Unformatted sequential write/read performance test
> Record size           Write MB/s            Read MB/s
> ==========================================================
>           4   87.353141847504133   145.09410391177835
>           8   166.95093628370549   223.60877830048437
>          16   272.20937208187746   364.91673986840277
>          32   415.26016354252715   599.41744252952310
>          64   592.97676703528009   900.53345964312450
>         128   748.27218547147686   1189.7131837787238
>         256   874.83098506714384   1561.3649529261234
>         512   935.69494481144284   1823.1760143164879
>        1024   983.51689491813215   1931.8773088107300
>        2048   1009.5491761651396   1971.6978586130062
>        4096   1115.5862027658552   2119.4151169997808
>        8192   1172.9400229568287   2184.1403983641089
>       16384   1222.6659284153168   2258.5490449229878
>       32768   1242.2417626697293   2251.8159046253918
>       65536   1227.9967555594396   2313.4106672387143
>      131072   1204.4295656544052   2129.1309150039478
>      262144   1135.7905614378458   2154.7146453789856
>      524288   1075.5769074402640   2170.5151501933169
>
>
> On Fri, Jan 11, 2013 at 10:41 PM, Janne Blomqvist
> wrote:
>> PING.
>>
>> Slightly updated patch attached, which further improves the generic
>> size fallback that is used when the element size is not 2/4/8 bytes.
>> Changing the us_perf benchmark to use real(10), with the v2 patch the
>> performance is:
>>
>> Unformatted sequential write/read performance test
>> Record size           Write MB/s            Read MB/s
>> ==========================================================
>>           4   59.028550429522085   86.019754350948787
>>           8   79.028327063130590   95.803502000733374
>>          16   99.980457395413296   138.68367462874946
>>          32   122.56886206338788   180.05609910155042
>>          64   152.00478266944486   212.69931319407567
>>         128   197.74137934940202   235.19728791956828
>>         256   155.36245780017779   244.60578379215929
>>         512   157.13385845966246   245.07467397691480
>>        1024   177.26553799130201   260.44908357795623
>>        2048   208.22852888945587   260.21587143113527
>>        4096   222.88410474980634   262.66162209490591
>>        8192   226.71167580652920   265.81191407123663
>>       16384   206.51818241747065   263.59395165591724
>>       32768   230.18707026455866   265.88990325026526
>>       65536   229.19783089391504   268.04485112932684
>>      131072   231.12215662044449   267.40543904427710
>>      262144   230.72012123598142   267.60086931504122
>>      524288   230.48959460456055   268.78750211303725
>>
>> With the new v3 patch I get
>>
>> Unformatted sequential write/read performance test
>> Record size           Write MB/s            Read MB/s
>> ==========================================================
>>           4   59.779061121239941   92.777125264010024
>>           8   92.727504266051341   126.64775563782673
>>          16   128.94793911163904   184.69194300482837
>>          32   169.78916283536847   267.06752001266767
>>          64   209.50296476919556   341.60515130910238
>>         128   236.36709738360679   416.73212655882151
>>         256   251.79029695383340   465.46804746749740
>>         512   259.62269939828633   500.87346060356265
>>        1024   265.08842337586458   508.95530627428275
>>        2048   268.71795530051884   532.12211365683640
>>        4096   280.86546884821030   546.88907054369884
>>        8192   286.96049684823578   569.60958187426183
>>       16384   292.04368984868103   608.11503416324865
>>       32768   292.96677387959392   629.80651297065833
>>       65536   291.69098580137114   624.27103478079641
>>      131072   292.75666234956418   605.99766136491496
>>      262144   291.35520038228975   611.59061455535834
>>      524288   292.15446100501691   623.76232623081580
>>
>>
>> On Sat, Jan 5, 2013 at 11:13 PM, Janne Blomqvist
>> wrote:
>>> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
>>> wrote:
>>>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab wrote:
>>>>> Janne Blomqvist writes:
>>>>>
>>>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>>>> index c8ecc3a..bf2250a 100644
>>>>>> --- a/libgfortran/io/file_pos.c
>>>>>> +++ b/libgfortran/io/file_pos.c
>>>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>>>>          }
>>>>>>        else
>>>>>>          {
>>>>>> +          uint32_t u32;
>>>>>> +          uint64_t u64;
>>>>>>            switch (length)
>>>>>>              {
>>>>>>              case sizeof(GFC_INTEGER_4):
>>>>>> -              reverse_memcpy (&m4, p, sizeof (m4));
>>>>>> +              memcpy (&u32, p, sizeof (u32));
>>>>>> +              u32 = __builtin_bswap32 (u32);
>>>>>> +              m4 = *(GFC_INTEGER_4*)&u32;
>>>>>
>>>>> Isn't that an aliasing violation?
>>>>
>>>> It looks like one. Why not simply do
>>>>
>>>>   m4 = (GFC_INTEGER_4) u32;
>>>>
>>>> ? I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>>>
>>> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
>>> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
>>> the range of the signed type, the result is
>>> implementation-defined. Though I suppose the sensible
>>> "implementation-defined behavior" in this case on a two's complement
>>> target is to just do a bitwise copy.
>>>
>>> Anyway, to be really safe one could use memcpy instead; the compiler
>>> optimizes small fixed-size memcpy calls just fine. Updated patch attached.
>>>
>>> --
>>> Janne Blomqvist
>>
>>
>>
>> --
>> Janne Blomqvist
>
>
>
> --
> Janne Blomqvist



--
Janne Blomqvist
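The following is a minimal sketch of the memcpy-based variant discussed in the thread, for the 4-byte record-marker case in `unformatted_backspace`. It assumes GCC or Clang for `__builtin_bswap32` and uses `int32_t` in place of `GFC_INTEGER_4`, which the thread says is a typedef for it. The function name `read_swapped_int32` is hypothetical. The sketch uses memcpy for both the load from the possibly unaligned buffer and the unsigned-to-signed step. It therefore needs neither the pointer cast questioned as an aliasing violation nor a conversion whose result C99 6.3.1.3p3 leaves implementation-defined:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 4-byte signed integer stored at P in the
   opposite byte order from the host.  Requires GCC or Clang for
   __builtin_bswap32.  */
static int32_t
read_swapped_int32 (const void *p)
{
  uint32_t u32;
  int32_t m4;

  /* P may not be suitably aligned for uint32_t, so copy the bytes out.  */
  memcpy (&u32, p, sizeof u32);
  u32 = __builtin_bswap32 (u32);

  /* Copy the bit pattern instead of writing (int32_t) u32, whose result is
     implementation-defined when u32 > INT32_MAX, or *(int32_t *) &u32,
     the cast questioned in the thread.  A fixed 4-byte memcpy like this
     is typically optimized into a plain register move.  */
  memcpy (&m4, &u32, sizeof m4);
  return m4;
}
```

Because `int32_t` is required to be a two's complement type with no padding bits, copying the bytes always reproduces the value the writer stored, including negative markers.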