public inbox for gcc-help@gcc.gnu.org
* Re: How to compile c++ code without strip off utf-8 BOM?
From: Dancefire @ 2009-02-17  7:08 UTC
  To: gcc-help

Any help?

On Sun, Feb 15, 2009 at 6:43 PM, Dancefire <dancefire@gmail.com> wrote:
>
> Hi,
>
> I am working on a project that was developed on Windows. The source files are saved as UTF-8 with a BOM because they contain non-ASCII characters. Visual C++ needs the BOM to recognize a file as UTF-8, so the project's developers don't want to remove it. Since the code is portable, I would like to develop it under Linux, but the UTF-8 BOM is stopping me. I always get the following errors if I keep the BOM:
>
> tao@tao-laptop:~/src/openclas_rev47/cpp/src/unit_test$ g++ -I../../include unit_test.cpp -o unit_test
> unit_test.cpp:1: error: stray '\357' in program
> unit_test.cpp:1: error: stray '\273' in program
> unit_test.cpp:1: error: stray '\277' in program
> In file included from unit_test.cpp:63:
> unit_test_utility.hpp:1: error: stray '\357' in program
> unit_test_utility.hpp:1: error: stray '\273' in program
> unit_test_utility.hpp:1: error: stray '\277' in program
> unit_test_utility.hpp:1: error: stray '#' in program
> In file included from unit_test_utility.hpp:9,
>                  from unit_test.cpp:63:
> ...
>
> Removing the BOM lets g++ compile the code. However, if I remove the BOM and commit the files, the Windows developers will complain; if I don't, I cannot compile under Linux. Stripping the BOM after every checkout and re-adding it before every commit to Subversion would be quite annoying.
>
> Is there any way to compile a UTF-8 file without stripping its BOM?
>
> Thanks,
>
> Tao Wang
>


* RE: How to compile c++ code without strip off utf-8 BOM?
From: John (Eljay) Love-Jensen @ 2009-02-17 13:23 UTC
  To: Dancefire, gcc-help

Hi Tao Wang,

My test.cpp source is UTF-8 with BOM.

If I compile it like this...

g++ -x c++ <(xxd -g 1 -s 3 test.cpp | xxd -g 1 -s -3 -r) -o a.out

... that strips the first three bytes of the file.  For test.cpp, these happen to be the BOM (ef bb bf).

You may want to create a little 'stripBOM' program that behaves like 'cat', but gobbles the BOM if present.

Or you could use awk, sed, perl, or your favorite-text-munging-tool-of-choice to perform the same conversion.  I just used xxd because it was quick, for illustrative purposes.  (There's probably a more suitable unix tool than xxd for this kind of cat-with-offset, but you'd want something that filters out BOM rather than always offsetting.)
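
A minimal sketch of what such a 'stripBOM' filter could look like in C++. The names copy_without_bom and without_bom are made up for illustration and do not come from this thread. Unlike the fixed three-byte offset above, it drops the leading bytes only when they really are the UTF-8 BOM (EF BB BF):

```cpp
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out`, dropping a leading
// UTF-8 BOM (EF BB BF) only if one is present.
void copy_without_bom(std::istream& in, std::ostream& out) {
    static const char kBom[] = "\xEF\xBB\xBF";

    // Look at the first three bytes (fewer if the input is shorter).
    char head[3];
    in.read(head, 3);
    const std::streamsize got = in.gcount();

    // Anything that is not exactly the BOM is passed through unchanged.
    if (!(got == 3 && std::memcmp(head, kBom, 3) == 0)) {
        out.write(head, got);
    }

    // Copy the rest of the input verbatim, block by block.
    char buf[4096];
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        out.write(buf, in.gcount());
    }
}

// Convenience wrapper for in-memory data, mainly useful for testing.
std::string without_bom(const std::string& bytes) {
    std::istringstream in(bytes);
    std::ostringstream out;
    copy_without_bom(in, out);
    return out.str();
}
```

Assuming this is wrapped in a main() that calls copy_without_bom(std::cin, std::cout) and built as ./stripBOM, it could stand in for the xxd pipeline, e.g. g++ -x c++ <(./stripBOM < test.cpp) -o a.out. Like the xxd version, it only filters the one file it is given.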

HTH,
--Eljay


* Re: How to compile c++ code without strip off utf-8 BOM?
From: Dario Saccavino @ 2009-02-18  9:33 UTC
  To: gcc-help; +Cc: Dancefire, John (Eljay) Love-Jensen

> Hi Tao Wang,
>
> My test.cpp source is UTF-8 with BOM.
>
> If I compile it like this...
>
> g++ -x c++ <(xxd -g 1 -s 3 test.cpp | xxd -g 1 -s -3 -r) -o a.out
>
> ... that strips the first three bytes of the file.  For test.cpp, these happen to be the BOM (ef bb bf).
>
> You may want to create a little 'stripBOM' program that behaves like 'cat', but gobbles the BOM if present.
>
> Or you could use awk, sed, perl, or your favorite-text-munging-tool-of-choice to perform the same conversion.  I just used xxd because it was quick, for illustrative purposes.  (There's probably a more suitable unix tool than xxd for this kind of cat-with-offset, but you'd want something that filters out BOM rather than always offsetting.)
>
> HTH,
> --Eljay
>

Hi Eljay and Tao Wang,

I have experienced the same problem working in a multi-platform
environment with a shared repository.

In my case the source files have no BOM (they are stored on the server
using the Windows machines' native encoding), so my solution was to
add -finput-charset=WINDOWS-1252 to gcc's command line. Unfortunately,
it seems that iconv has no way to insert or remove the BOM, so Tao Wang
is out of luck with that approach.

Eljay's solution isn't always viable either, because it only filters
the file named on the command line: if that source file #includes a
header that has a BOM, the compilation still fails.

I think there are two possible ways out:
1) Automatically execute a conversion command (like uconv
--remove-signature) at checkouts/commits
2) Install a modified libiconv with an additional character set "UTF8-BOM"
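
As a rough illustration of option 1 only, the conversion step could also be done by a small helper instead of uconv. The sketch below is an assumption, not something from this thread: the function names (read_file, write_file, has_bom, remove_signature, add_signature) are hypothetical, and the hook that would call them for each source file after checkout or before commit is not shown.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// UTF-8 byte order mark: EF BB BF.
static const std::string kBom = "\xEF\xBB\xBF";

// Reads a whole file as raw bytes; returns false if it cannot be opened.
bool read_file(const std::string& path, std::string& bytes) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in.is_open()) return false;
    bytes.assign(std::istreambuf_iterator<char>(in),
                 std::istreambuf_iterator<char>());
    return true;
}

// Overwrites a file with the given raw bytes.
bool write_file(const std::string& path, const std::string& bytes) {
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!out.is_open()) return false;
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return out.good();
}

bool has_bom(const std::string& bytes) {
    return bytes.compare(0, kBom.size(), kBom) == 0;
}

// After checkout: drop the BOM so g++ accepts the file.
// Returns true only if the file was changed.
bool remove_signature(const std::string& path) {
    std::string bytes;
    if (!read_file(path, bytes) || !has_bom(bytes)) return false;
    return write_file(path, bytes.substr(kBom.size()));
}

// Before commit: put the BOM back for the Windows developers.
// Returns true only if the file was changed.
bool add_signature(const std::string& path) {
    std::string bytes;
    if (!read_file(path, bytes) || has_bom(bytes)) return false;
    return write_file(path, kBom + bytes);
}
```

Both functions leave a file untouched when there is nothing to do, so running them repeatedly is harmless. A real hook would probably also need to remember which files had a BOM in the repository, so that it does not add one to files that never had it.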

Best regards

   Dario
