PING**1.2

Yet another slightly updated patch attached. Compared to the previous
version, it now has specializations for sizes 12 and 16 as well. For the
real(10) benchmark, with the previous v3 patch (please disregard the
absolute values in the post quoted below; they were wrong due to a bug):

Unformatted sequential write/read performance test
Record size   Write MB/s           Read MB/s
==========================================================
          4   80.578833140738340   127.33074266188656
          8   137.61682156650559   184.49033790407984
         16   202.72871312800621   275.98801561061816
         32   275.33538767460863   413.43956672052303
         64   341.04488670485119   555.13744525826564
        128   384.77917051919820   671.44655208024699
        256   410.97208129045833   763.97660513918527
        512   425.76619227779878   826.41086693364593
       1024   430.77035999730009   840.30757120448550
       2048   438.30318459339475   885.50033810296600
       4096   455.79422809097599   919.78265920652086
       8192   465.74499205886326   959.06963983370918
      16384   472.48133493971142   991.11244162081744
      32768   471.00024619567603   1015.7428144049615
      65536   474.91235280949985   1021.2150519080892
     131072   475.18664487440901   1006.3701982554830
     262144   478.00435092846868   985.17141300594039
     524288   476.72837201590363   991.74226579987987

With the new v4 patch:

Unformatted sequential write/read performance test
Record size   Write MB/s           Read MB/s
==========================================================
          4   87.353141847504133   145.09410391177835
          8   166.95093628370549   223.60877830048437
         16   272.20937208187746   364.91673986840277
         32   415.26016354252715   599.41744252952310
         64   592.97676703528009   900.53345964312450
        128   748.27218547147686   1189.7131837787238
        256   874.83098506714384   1561.3649529261234
        512   935.69494481144284   1823.1760143164879
       1024   983.51689491813215   1931.8773088107300
       2048   1009.5491761651396   1971.6978586130062
       4096   1115.5862027658552   2119.4151169997808
       8192   1172.9400229568287   2184.1403983641089
      16384   1222.6659284153168   2258.5490449229878
      32768   1242.2417626697293   2251.8159046253918
      65536   1227.9967555594396   2313.4106672387143
     131072   1204.4295656544052   2129.1309150039478
     262144   1135.7905614378458   2154.7146453789856
     524288   1075.5769074402640   2170.5151501933169

On Fri, Jan 11, 2013 at 10:41 PM, Janne Blomqvist wrote:
> PING.
>
> Slightly updated patch attached, which further improves the generic
> size fallback that is used when the element size is not 2/4/8 bytes.
> Changing the us_perf benchmark to use real(10), with the v2 patch the
> performance is:
>
> Unformatted sequential write/read performance test
> Record size   Write MB/s           Read MB/s
> ==========================================================
>           4   59.028550429522085   86.019754350948787
>           8   79.028327063130590   95.803502000733374
>          16   99.980457395413296   138.68367462874946
>          32   122.56886206338788   180.05609910155042
>          64   152.00478266944486   212.69931319407567
>         128   197.74137934940202   235.19728791956828
>         256   155.36245780017779   244.60578379215929
>         512   157.13385845966246   245.07467397691480
>        1024   177.26553799130201   260.44908357795623
>        2048   208.22852888945587   260.21587143113527
>        4096   222.88410474980634   262.66162209490591
>        8192   226.71167580652920   265.81191407123663
>       16384   206.51818241747065   263.59395165591724
>       32768   230.18707026455866   265.88990325026526
>       65536   229.19783089391504   268.04485112932684
>      131072   231.12215662044449   267.40543904427710
>      262144   230.72012123598142   267.60086931504122
>      524288   230.48959460456055   268.78750211303725
>
> With the new v3 patch I get
>
> Unformatted sequential write/read performance test
> Record size   Write MB/s           Read MB/s
> ==========================================================
>           4   59.779061121239941   92.777125264010024
>           8   92.727504266051341   126.64775563782673
>          16   128.94793911163904   184.69194300482837
>          32   169.78916283536847   267.06752001266767
>          64   209.50296476919556   341.60515130910238
>         128   236.36709738360679   416.73212655882151
>         256   251.79029695383340   465.46804746749740
>         512   259.62269939828633   500.87346060356265
>        1024   265.08842337586458   508.95530627428275
>        2048   268.71795530051884   532.12211365683640
>        4096   280.86546884821030   546.88907054369884
>        8192   286.96049684823578   569.60958187426183
>       16384   292.04368984868103   608.11503416324865
>       32768   292.96677387959392   629.80651297065833
>       65536   291.69098580137114   624.27103478079641
>      131072   292.75666234956418   605.99766136491496
>      262144   291.35520038228975   611.59061455535834
>      524288   292.15446100501691   623.76232623081580
>
>
> On Sat, Jan 5, 2013 at 11:13 PM, Janne Blomqvist wrote:
>> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener wrote:
>>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab wrote:
>>>> Janne Blomqvist writes:
>>>>
>>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>>> index c8ecc3a..bf2250a 100644
>>>>> --- a/libgfortran/io/file_pos.c
>>>>> +++ b/libgfortran/io/file_pos.c
>>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>>>      }
>>>>>    else
>>>>>      {
>>>>> +      uint32_t u32;
>>>>> +      uint64_t u64;
>>>>>        switch (length)
>>>>>          {
>>>>>          case sizeof(GFC_INTEGER_4):
>>>>> -          reverse_memcpy (&m4, p, sizeof (m4));
>>>>> +          memcpy (&u32, p, sizeof (u32));
>>>>> +          u32 = __builtin_bswap32 (u32);
>>>>> +          m4 = *(GFC_INTEGER_4*)&u32;
>>>>
>>>> Isn't that an aliasing violation?
>>>
>>> It looks like one. Why not simply do
>>>
>>> m4 = (GFC_INTEGER_4) u32;
>>>
>>> ? I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>>
>> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
>> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
>> the range of the signed variable, the result is
>> implementation-defined. Though I suppose the sensible
>> "implementation-defined behavior" in this case on a two's complement
>> target is to just do a bitwise copy.
>>
>> Anyway, to be really safe one could use memcpy instead; the compiler
>> optimizes small fixed size memcpy's just fine. Updated patch attached.
>>
>>
>> --
>> Janne Blomqvist
>
>
>
> --
> Janne Blomqvist

--
Janne Blomqvist
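
For readers following the aliasing discussion quoted above, here is a minimal sketch of the memcpy-based variant mentioned at the end of that exchange. It is an illustration only, not the code from the attached patch: the helper name load_swapped_i32 is hypothetical, int32_t stands in for GFC_INTEGER_4 (which the thread says is a typedef for int32_t), and __builtin_bswap32 assumes GCC or a compatible compiler.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of libgfortran: read a 4-byte value
   stored in the opposite byte order from the host and return it as a
   signed 32-bit integer.  */
static int32_t
load_swapped_i32 (const void *p)
{
  uint32_t u32;
  int32_t m4;

  /* Copy the raw bytes; this works for unaligned pointers too.  */
  memcpy (&u32, p, sizeof (u32));

  /* Reverse the byte order (GCC/Clang builtin).  */
  u32 = __builtin_bswap32 (u32);

  /* Bitwise copy into the signed variable.  This avoids both the
     pointer cast (aliasing concern) and the unsigned-to-signed
     conversion that C99 6.3.1.3(3) leaves implementation-defined.  */
  memcpy (&m4, &u32, sizeof (m4));
  return m4;
}
```

Called as m4 = load_swapped_i32 (p), this should behave like the reverse_memcpy line it would replace, and compilers typically reduce both small fixed-size memcpy calls to plain register moves.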