From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <CAO9iq9FLjgPtW4_e0CEdU48r-baQ7Uvx6P8OhWMcWiAmqtcZyw@mail.gmail.com>
References: <CAO9iq9HB5us6X3faKPt=ZaAcBz_KeVQBTy4ZDiZ6XfeTOVwohA@mail.gmail.com>	<m2a9somv9s.fsf@igel.home>	<CAFiYyc09eDLPsKUPP36mxh16Z+-umP3GJTpjxGTnXgFhongqig@mail.gmail.com>	<CAO9iq9E3m5ttXGkbJhn174apF-LUv-sa-zRGEwpFOQrwbDhKZg@mail.gmail.com>	<CAO9iq9GvGxrjzyByskWoua5JLJbJrcPeW9tHpeL3wWKs91cNdg@mail.gmail.com>	<CAO9iq9FLjgPtW4_e0CEdU48r-baQ7Uvx6P8OhWMcWiAmqtcZyw@mail.gmail.com>
Date: Fri, 18 Jan 2013 22:30:00 -0000
Message-ID: <CAO9iq9EEo3RdyxWacRfJTL5BaNtX3wgS1wC=O10QHs1O09W8KQ@mail.gmail.com>
Subject: Re: [Patch, libfortran] Improve performance of byte swapped IO
From: Janne Blomqvist <blomqvist.janne@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>, GCC Patches <gcc-patches@gcc.gnu.org>, 	Fortran List <fortran@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2013-01/txt/msg00988.txt.bz2

PING**2

On Mon, Jan 14, 2013 at 12:44 AM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> PING**1.2
>
> Yet another slightly updated patch attached. Compared to the previous
> version, it now has specializations for element sizes 12 and 16 as
> well. For the real(10) benchmark, with the previous v3 patch (please
> disregard the absolute values in the post quoted below; they were
> wrong due to a bug):
>
>   Unformatted sequential write/read performance test
>  Record size           Write MB/s                 Read MB/s
>  ==========================================================
>            4   80.578833140738340        127.33074266188656
>            8   137.61682156650559        184.49033790407984
>           16   202.72871312800621        275.98801561061816
>           32   275.33538767460863        413.43956672052303
>           64   341.04488670485119        555.13744525826564
>          128   384.77917051919820        671.44655208024699
>          256   410.97208129045833        763.97660513918527
>          512   425.76619227779878        826.41086693364593
>         1024   430.77035999730009        840.30757120448550
>         2048   438.30318459339475        885.50033810296600
>         4096   455.79422809097599        919.78265920652086
>         8192   465.74499205886326        959.06963983370918
>        16384   472.48133493971142        991.11244162081744
>        32768   471.00024619567603        1015.7428144049615
>        65536   474.91235280949985        1021.2150519080892
>       131072   475.18664487440901        1006.3701982554830
>       262144   478.00435092846868        985.17141300594039
>       524288   476.72837201590363        991.74226579987987
>
> With the new v4 patch:
>
>  Unformatted sequential write/read performance test
>  Record size           Write MB/s                 Read MB/s
>  ==========================================================
>            4   87.353141847504133        145.09410391177835
>            8   166.95093628370549        223.60877830048437
>           16   272.20937208187746        364.91673986840277
>           32   415.26016354252715        599.41744252952310
>           64   592.97676703528009        900.53345964312450
>          128   748.27218547147686        1189.7131837787238
>          256   874.83098506714384        1561.3649529261234
>          512   935.69494481144284        1823.1760143164879
>         1024   983.51689491813215        1931.8773088107300
>         2048   1009.5491761651396        1971.6978586130062
>         4096   1115.5862027658552        2119.4151169997808
>         8192   1172.9400229568287        2184.1403983641089
>        16384   1222.6659284153168        2258.5490449229878
>        32768   1242.2417626697293        2251.8159046253918
>        65536   1227.9967555594396        2313.4106672387143
>       131072   1204.4295656544052        2129.1309150039478
>       262144   1135.7905614378458        2154.7146453789856
>       524288   1075.5769074402640        2170.5151501933169
>
>
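The v4 patch itself is not quoted here, so the following is only a sketch of what fixed-size specializations for 12- and 16-byte elements could look like, assuming GCC's __builtin_bswap32 and __builtin_bswap64; the names swap_12 and swap_16 are hypothetical, not libgfortran's. Reversing a 16-byte element amounts to byte-swapping each 8-byte half and exchanging the halves; 12 bytes work the same way with three 4-byte words. Loading through memcpy avoids alignment and aliasing problems, and because every word is loaded before any is stored, dest may equal src.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the byte order of one 12-byte element.
   This byte-swaps each 4-byte word and also reverses the order of the
   three words.  */
static void
swap_12 (void *dest, const void *src)
{
  const unsigned char *s = src;
  unsigned char *d = dest;
  uint32_t w0, w1, w2;

  /* Load everything first so that dest == src is allowed.  */
  memcpy (&w0, s, 4);
  memcpy (&w1, s + 4, 4);
  memcpy (&w2, s + 8, 4);
  w0 = __builtin_bswap32 (w0);
  w1 = __builtin_bswap32 (w1);
  w2 = __builtin_bswap32 (w2);
  memcpy (d, &w2, 4);
  memcpy (d + 4, &w1, 4);
  memcpy (d + 8, &w0, 4);
}

/* Hypothetical helper: reverse the byte order of one 16-byte element
   by byte-swapping each 8-byte half and exchanging the halves.  */
static void
swap_16 (void *dest, const void *src)
{
  const unsigned char *s = src;
  unsigned char *d = dest;
  uint64_t lo, hi;

  memcpy (&lo, s, 8);
  memcpy (&hi, s + 8, 8);
  lo = __builtin_bswap64 (lo);
  hi = __builtin_bswap64 (hi);
  memcpy (d, &hi, 8);
  memcpy (d + 8, &lo, 8);
}
```

For example, applying swap_16 to the bytes 0, 1, ..., 15 yields 15, 14, ..., 0.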
> On Fri, Jan 11, 2013 at 10:41 PM, Janne Blomqvist
> <blomqvist.janne@gmail.com> wrote:
>> PING.
>>
>> Slightly updated patch attached, which further improves the generic
>> size fallback that is used when the element size is not 2/4/8 bytes.
>> With the us_perf benchmark changed to use real(10), the performance
>> with the v2 patch is:
>>
>>  Unformatted sequential write/read performance test
>>  Record size           Write MB/s                 Read MB/s
>>  ==========================================================
>>            4   59.028550429522085        86.019754350948787
>>            8   79.028327063130590        95.803502000733374
>>           16   99.980457395413296        138.68367462874946
>>           32   122.56886206338788        180.05609910155042
>>           64   152.00478266944486        212.69931319407567
>>          128   197.74137934940202        235.19728791956828
>>          256   155.36245780017779        244.60578379215929
>>          512   157.13385845966246        245.07467397691480
>>         1024   177.26553799130201        260.44908357795623
>>         2048   208.22852888945587        260.21587143113527
>>         4096   222.88410474980634        262.66162209490591
>>         8192   226.71167580652920        265.81191407123663
>>        16384   206.51818241747065        263.59395165591724
>>        32768   230.18707026455866        265.88990325026526
>>        65536   229.19783089391504        268.04485112932684
>>       131072   231.12215662044449        267.40543904427710
>>       262144   230.72012123598142        267.60086931504122
>>       524288   230.48959460456055        268.78750211303725
>>
>> With the new v3 patch I get:
>>
>>  Unformatted sequential write/read performance test
>>  Record size           Write MB/s                 Read MB/s
>>  ==========================================================
>>            4   59.779061121239941        92.777125264010024
>>            8   92.727504266051341        126.64775563782673
>>           16   128.94793911163904        184.69194300482837
>>           32   169.78916283536847        267.06752001266767
>>           64   209.50296476919556        341.60515130910238
>>          128   236.36709738360679        416.73212655882151
>>          256   251.79029695383340        465.46804746749740
>>          512   259.62269939828633        500.87346060356265
>>         1024   265.08842337586458        508.95530627428275
>>         2048   268.71795530051884        532.12211365683640
>>         4096   280.86546884821030        546.88907054369884
>>         8192   286.96049684823578        569.60958187426183
>>        16384   292.04368984868103        608.11503416324865
>>        32768   292.96677387959392        629.80651297065833
>>        65536   291.69098580137114        624.27103478079641
>>       131072   292.75666234956418        605.99766136491496
>>       262144   291.35520038228975        611.59061455535834
>>       524288   292.15446100501691        623.76232623081580
>>
>>
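For comparison, a rough sketch of dispatching on the element size once per buffer, assuming GCC's bswap builtins; bswap_elements and reverse_bytes are hypothetical names, and this is not the code from the patch. Elements of 2, 4 or 8 bytes get a dedicated loop around the matching builtin, and every other size, such as the real(10) case above, goes through a generic byte-reversal loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic fallback: reverse the SIZE bytes at P in place.  */
static void
reverse_bytes (unsigned char *p, size_t size)
{
  if (size < 2)
    return;
  for (size_t lo = 0, hi = size - 1; lo < hi; lo++, hi--)
    {
      unsigned char t = p[lo];
      p[lo] = p[hi];
      p[hi] = t;
    }
}

/* Hypothetical: byte-swap NELEMS consecutive elements of SIZE bytes
   each, in place.  The switch on SIZE is done once, outside the loops,
   so each common size gets its own tight loop around a bswap builtin;
   memcpy handles possibly unaligned elements.  */
static void
bswap_elements (void *buf, size_t nelems, size_t size)
{
  unsigned char *p = buf;

  switch (size)
    {
    case 2:
      for (size_t i = 0; i < nelems; i++, p += 2)
        {
          uint16_t v;
          memcpy (&v, p, 2);
          v = __builtin_bswap16 (v);
          memcpy (p, &v, 2);
        }
      break;
    case 4:
      for (size_t i = 0; i < nelems; i++, p += 4)
        {
          uint32_t v;
          memcpy (&v, p, 4);
          v = __builtin_bswap32 (v);
          memcpy (p, &v, 4);
        }
      break;
    case 8:
      for (size_t i = 0; i < nelems; i++, p += 8)
        {
          uint64_t v;
          memcpy (&v, p, 8);
          v = __builtin_bswap64 (v);
          memcpy (p, &v, 8);
        }
      break;
    default:
      /* Any other element size, e.g. 10, 12 or 16 bytes.  */
      for (size_t i = 0; i < nelems; i++, p += size)
        reverse_bytes (p, size);
      break;
    }
}
```

For example, bswap_elements (buf, n, 4) on an array of n uint32_t values reverses the byte order of each value in place.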
>> On Sat, Jan 5, 2013 at 11:13 PM, Janne Blomqvist
>> <blomqvist.janne@gmail.com> wrote:
>>> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>>>> Janne Blomqvist <blomqvist.janne@gmail.com> writes:
>>>>>
>>>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>>>> index c8ecc3a..bf2250a 100644
>>>>>> --- a/libgfortran/io/file_pos.c
>>>>>> +++ b/libgfortran/io/file_pos.c
>>>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>>>>       }
>>>>>>        else
>>>>>>       {
>>>>>> +       uint32_t u32;
>>>>>> +       uint64_t u64;
>>>>>>         switch (length)
>>>>>>           {
>>>>>>           case sizeof(GFC_INTEGER_4):
>>>>>> -           reverse_memcpy (&m4, p, sizeof (m4));
>>>>>> +           memcpy (&u32, p, sizeof (u32));
>>>>>> +           u32 = __builtin_bswap32 (u32);
>>>>>> +           m4 = *(GFC_INTEGER_4*)&u32;
>>>>>
>>>>> Isn't that an aliasing violation?
>>>>
>>>> It looks like one.  Why not simply do
>>>>
>>>>    m4 = (GFC_INTEGER_4) u32;
>>>>
>>>> ?  I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>>>
>>> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
>>> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
>>> the range of the signed type, the result is implementation-defined
>>> (or an implementation-defined signal is raised). Though I suppose the
>>> sensible "implementation-defined behavior" in this case on a two's
>>> complement target is to just do a bitwise copy.
>>>
>>> Anyway, to be really safe one could use memcpy instead; the compiler
>>> optimizes small fixed-size memcpy calls just fine. Updated patch attached.
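A minimal sketch of the memcpy approach, using a hypothetical helper read_swapped_int32 rather than the actual file_pos.c code. Copying the bits with memcpy sidesteps both the aliasing violation of *(GFC_INTEGER_4*)&u32 and the implementation-defined conversion of (GFC_INTEGER_4) u32 (which GCC documents as reduction modulo 2^32). The result is well defined because int32_t is required to have no padding bits and a two's complement representation (C99 7.18.1.1).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the actual file_pos.c code): load a 4-byte
   value stored in the opposite byte order, such as the m4 record
   marker in unformatted_backspace, from a possibly unaligned buffer
   and return it as a signed 32-bit integer.  */
static int32_t
read_swapped_int32 (const void *p)
{
  uint32_t u32;
  int32_t m4;

  memcpy (&u32, p, sizeof u32);   /* unaligned-safe load */
  u32 = __builtin_bswap32 (u32);  /* reverse the byte order */
  /* Copy the bits instead of *(int32_t *) &u32 (aliasing violation)
     or (int32_t) u32 (implementation-defined if u32 > INT32_MAX).  */
  memcpy (&m4, &u32, sizeof m4);
  return m4;
}
```

With optimization enabled, GCC typically reduces the two memcpy calls to plain register moves, so this costs nothing over the pointer cast.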
>>>
>>>
>>> --
>>> Janne Blomqvist
>>
>>
>>
>> --
>> Janne Blomqvist
>
>
>
> --
> Janne Blomqvist



-- 
Janne Blomqvist