On 04/28/2017 10:22 AM, Jeff Law wrote:
> On 04/27/2017 03:05 PM, Martin Sebor wrote:
>> On 04/26/2017 04:34 PM, Jakub Jelinek wrote:
>>>
>>> Also, can't there be a way to shortcut all this processing if the
>>> charsets are the same?  And is it a good idea if every pass that
>>> needs to do
>>> something with the exec charset chars caches its own results of the
>>> langhook?
>>
>> The biggest overhead is calling lang_hooks.to_target_charset
>> to initialize each character in the source set.  That could
>> be avoided if there were a way to determine in the middle-end
>> whether the input and target charsets are the same, but I don't
>> see one.  Alternatively, it could be optimized by converting
>> all the characters in one shot as a string, but without adding
>> a new target hook to do that I don't see how to do that either.
>> It might be a useful enhancement but given the scope it feels
>> like it should be done independently of this change.  But if
>> you know of a solution that escaped me please let me know.
>>
>> The overhead of the additional processing should be negligible
>> irrespective of whether the charsets are different or the same
>> (all it entails is indexing into a table).
> So the initialization could be done once per translation unit rather
> than once per function -- assuming the target character set doesn't
> change within a translation unit.
>
> That seems like it ought to be easy.

It is easy.  I was going to respond by saying "It already is done
that way" because I implemented it in the first patch (by checking
the mapping for '%'.  But now I see I accidentally removed the code
in the update while exploring ways to optimize it some more.  Sigh.
Let me put it back.

>> +   set corresponding to the string TARGSTR consisting of characters in
>> +   the execution character set.  */
>> +
>> +static const char*
>> +target_to_host (const char *targstr)
>> +{
>> +  /* The interesting subset of source and execution characters are
>> +     the same so no conversion is necessary.  */
>> +  if (target_to_host_charmap['\0'] == 1)
>> +    return targstr;
>> +
>> +  /* Convert the initial substring of TARGSTR to the corresponding
>> +     characters in the host set, appending "..." if TARGSTR is too
>> +     long to fit.  Using the static buffer assumes the function is
>> +     not called in between sequence points (which it isn't).  */
>> +  static char hostr[32];
>> +  for (char *ph = hostr; ; ++targstr)
>> +    {
>> +      *ph++ = target_to_host (*targstr);
>> +      if (!*targstr)
>> +    break;
>> +
>> +      if (ph - hostr == sizeof hostr - 4)
>> +    {
>> +      *ph = '\0';
>> +      strcat (ph, "...");
>> +      break;
>> +    }
>> +    }
>> +
>> +  return hostr;
> Ewww.  I guess the alternative would be something like:
>
> Expand the return value to include a indicator of whether or not the
> original string was returned (common case) or if a new string was
> returned and thus needs to be deallocated by the caller.
>
> That's probably pretty unpleasant given we don't have a central place
> where we call target_to_host, so the caller side would be much uglier.
>
> Are there any downstream impacts when the string is too long to covert
> other than not warning for things which were unconverted?

The function is only used when printing the text of the directive
in the warning.  It doesn't prevent the warnings, it just truncates
them.  I think the truncation beyond a certain length is a good
thing regardless of the conversion.  Without it, the warning might
end up printing thousands of lines of text, e.g., in

   sprintf (d, "thousands of lines of text...");

I don't like returning a pointer to a static buffer but given that
the scope of the function is limited to the pass it seems fairly
safe.  Another alternative would be to pass in a buffer and its
size.  That shouldn't complicate the caller too much.  The easiest,
cleanest, and safest solution by far is to return a std::string,
but I have the impression that would be against GCC convention of
avoiding the STL.

Attached is an updated patch with these two tweaks.

Martin