Character set of XML files

Hello,

According to the header, the Dreadlords XML files are UTF-8 encoded. In practice, however, the game seems to treat them as ISO-8859-1 and pays no attention at all to the encoding tag in the header. Any idea what the desired encoding of the files is? Is it always ISO-8859-1, or does it depend on the current ANSI code page of the Windows system in use?
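
To illustrate the kind of mismatch I mean, here is a minimal Python sketch. The sample word and the function name are made up for illustration and are not taken from the game files; it only shows what happens when UTF-8 bytes are read as ISO-8859-1.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as ISO-8859-1.

    This mimics a reader that ignores the declared UTF-8 encoding.
    """
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode("iso-8859-1")


# Hypothetical sample string with one non-ASCII character.
sample = "Jäger"
garbled = misread_as_latin1(sample)
print(garbled)  # the single "ä" shows up as two characters

# The damage is reversible as long as nothing re-saved the garbled text.
restored = garbled.encode("iso-8859-1").decode("utf-8")
print(restored == sample)
```

Plain ASCII text survives either way, which is probably why the problem only shows up with accented characters.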

 

Reply #1 Top

I suppose the general basic XML rules apply here as well... that is, a well-formed file with a UTF encoding declared in its header. But the game's own parsing routines (during a gameplay session) must still have a specific schema and validation steps to gather the listed datasets as they SHOULD.

I believe there's no hidden mystery in such a straightforward text file, as long as the characters are all shown as they are used or meant to define a "code" via plain ASCII (which UTF-8 is in fact an extension of).

 

What gets more complicated is when the OS (and the default language settings on any given PC) swaps in a different character set to match one's Windows interface... just think about Kanji, Arabic, Greek, etc. letters & alphabets and you'll soon realize that there's MUCH more to a simple character (as it is represented ON a monitor or even in string fields) than a glyph and its relatively complex typesetting.

 

Possibly why the UTF standard was invented in the first place, btw.
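
A rough Python sketch of the code page point above: the same single byte comes out as a different letter depending on which 8-bit character set is assumed. The byte value and the function name are just an illustration; nothing here is specific to the game.

```python
def decode_under(raw: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode the same bytes under several single-byte encodings."""
    return {name: raw.decode(name) for name in encodings}


# One byte, four interpretations.
raw = b"\xe4"
results = decode_under(raw, ["iso-8859-1", "cp1252", "cp1253", "cp1251"])
for name, char in results.items():
    print(f"{name}: {char}")
```

Western European, Greek, and Cyrillic Windows code pages each map that byte to a different letter, so a file without a reliable encoding declaration can look different from one PC to the next.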

 

 

Reply #2 Top

Yes indeed, the game probably reads the encoding line, but what for?!

If I open, say, TechTree.xml on my German-locale Windows with my English version of Visual Studio, a lot of "illegal" characters are thrown in, mainly "||" for line breaks, I guess. Quite annoying if you want to use the standard .NET XML reader implementation: it throws a runtime error.

So I need to buffer all XML files and filter out every occurrence of "||", instead of directly accessing/reading them :|
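
Roughly what I mean by buffering and filtering, as a Python sketch rather than my actual .NET code. The function name, the ISO-8859-1 default, and the choice to turn "||" into a newline are all assumptions on my part; the element names in the usage example are invented.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration such as <?xml version="1.0" encoding="utf-8"?>
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def parse_filtered(raw: bytes, encoding: str = "iso-8859-1",
                   marker: str = "||") -> ET.Element:
    """Read the whole file into memory, clean it, then parse it.

    Assumptions: the bytes are really ISO-8859-1 (whatever the header
    says) and the marker stands for a line break.
    """
    text = raw.decode(encoding)          # ignore the declared encoding
    text = text.replace(marker, "\n")    # filter the stray marker
    text = _XML_DECL.sub("", text, count=1)  # drop the now misleading header
    return ET.fromstring(text)


# Hypothetical file content: declared UTF-8, but actually ISO-8859-1 bytes.
raw = b'<?xml version="1.0" encoding="utf-8"?><Tech><Name>J\xe4ger||II</Name></Tech>'
root = parse_filtered(raw)
print(repr(root.find("Name").text))
```

The extra pass costs a full in-memory copy of each file, which is the annoying part compared with streaming straight from disk.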

 

Please devs, fix this finally.

My guess is that not everybody at SD has a UTF-8 based system available, hence the "compatibility" line, or some of them are writing the XML files with some weird editor, let's say Word :rolleyes: .

I think the game used to require ISO, but they silently changed it to either no requirement or a UTF-8 requirement.

 

As a side note, TA does not seem to require that line at all to read data, so my guess is that one could even leave out the encoding line.

 

Files created with my GalCiv2Ide get read with or without it.

Reply #3 Top

For now I have changed the header of my XML files to ISO-8859-1; that is what the game seems to expect, and it makes them display correctly in my editor.
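
In case anyone wants to do the same in bulk, here is a small Python sketch of the header change. The function name is hypothetical, and note that it only rewrites the encoding label in the XML declaration; it does not convert the bytes of the file itself.

```python
import re

# Captures everything up to the encoding value in a leading XML declaration,
# plus the quote character used around that value.
_DECL = re.compile(rb'^(\s*<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')


def set_declared_encoding(raw: bytes, new_encoding: str = "ISO-8859-1") -> bytes:
    """Replace the encoding named in the XML declaration, if there is one."""
    label = new_encoding.encode("ascii")
    return _DECL.sub(
        lambda m: m.group(1) + m.group(2) + label + m.group(2),
        raw,
        count=1,
    )


# Hypothetical file content.
before = b'<?xml version="1.0" encoding="utf-8"?>\n<Tech/>'
print(set_declared_encoding(before).decode("ascii"))
```

A file without a declaration is returned unchanged. If a file really does contain UTF-8 bytes, it would need to be re-encoded as well, not just relabeled.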

Reply #4 Top

I think the game used to require ISO, but they silently changed it to either no requirement or a UTF-8 requirement.
End of quote

 

Sort of, I guess. But as you said, the main issue with the encoding line is more about how 'external' programs need to reproduce the intended structure for it to be recognized by the game engine itself (eventually & after "corrections").

 

There's no specific norm which would be better or worse, unless someone wants to make sure compatibility is verified both within gameplay & in whatever editing (or reading) is or has been performed outside the native code.

I would agree though that some clear 'SD standard' should be available at these or any xml parsing levels.

Line feeds & line breaks are indeed a pain when one goes between the multiple variations in format: Doc, Txt, Wri and whatever else everyone (as in, software makers!) could mix us all up with.