S-Expressions vs. XML

A pointless rant. ;)

I hadn't actually done any real work with XML until I was doing a mod (of a mod) for Civilization 4. However, I had some knowledge of Lisp and S-expressions, and right from the start I found myself maddened by the wanton inefficiency of XML's reinvented wheel. Diving right in, here's a piece of it:

 

<BonusInfo>
    <Type>BONUS_IRON</Type>
    <Description>TXT_KEY_BONUS_IRON</Description>
    <Civilopedia>TXT_KEY_BONUS_IRON_PEDIA</Civilopedia>
    <BonusClassType>BONUSCLASS_RUSH</BonusClassType>
    <ArtDefineTag>ART_DEF_BONUS_IRON</ArtDefineTag>
    <TechReveal>TECH_IRON_WORKING</TechReveal>
    <TechCityTrade>TECH_MINING</TechCityTrade>
    <TechObsolete>NONE</TechObsolete>
    <YieldChanges>
        <iYieldChange>0</iYieldChange>
        <iYieldChange>1</iYieldChange>
        <iYieldChange>0</iYieldChange>
    </YieldChanges>
    <iAITradeModifier>10</iAITradeModifier>
    <iAIObjective>0</iAIObjective>
    <iHealth>0</iHealth>
    <iHappiness>0</iHappiness>
    <iPlacementOrder>0</iPlacementOrder>
    <iConstAppearance>162</iConstAppearance>
    <iMinAreaSize>3</iMinAreaSize>
    <iMinLatitude>0</iMinLatitude>
    <iMaxLatitude>80</iMaxLatitude>
    <Rands>
        <iRandApp1>10</iRandApp1>
        <iRandApp2>10</iRandApp2>
        <iRandApp3>0</iRandApp3>
        <iRandApp4>0</iRandApp4>
    </Rands>
    <iPlayer>100</iPlayer>
    <iTilesPer>0</iTilesPer>
    <iMinLandPercent>0</iMinLandPercent>
    <iUnique>7</iUnique>
    <iGroupRange>0</iGroupRange>
    <iGroupRand>0</iGroupRand>
    <bArea>0</bArea>
    <bHills>1</bHills>
    <bFlatlands>1</bFlatlands>
    <bNoRiverSide>0</bNoRiverSide>
    <bNormalize>0</bNormalize>
    <TerrainBooleans>
        <TerrainBoolean>
            <TerrainType>TERRAIN_DESERT</TerrainType>
            <bTerrain>1</bTerrain>
        </TerrainBoolean>
        <TerrainBoolean>
            <TerrainType>TERRAIN_GRASS</TerrainType>
            <bTerrain>1</bTerrain>
        </TerrainBoolean>
        <TerrainBoolean>
            <TerrainType>TERRAIN_MARSH</TerrainType>
            <bTerrain>1</bTerrain>
        </TerrainBoolean>
        <TerrainBoolean>
            <TerrainType>TERRAIN_PLAINS</TerrainType>
            <bTerrain>1</bTerrain>
        </TerrainBoolean>
        <TerrainBoolean>
            <TerrainType>TERRAIN_TUNDRA</TerrainType>
            <bTerrain>1</bTerrain>
        </TerrainBoolean>
        <TerrainBoolean>
            <TerrainType>TERRAIN_SNOW</TerrainType>
            <bTerrain>1</bTerrain>
        </TerrainBoolean>
    </TerrainBooleans>
    <FeatureBooleans>
        <FeatureBoolean>
            <FeatureType>FEATURE_FLOOD_PLAINS</FeatureType>
            <bFeature>1</bFeature>
        </FeatureBoolean>
        <FeatureBoolean>
            <FeatureType>FEATURE_FOREST</FeatureType>
            <bFeature>1</bFeature>
        </FeatureBoolean>
        <FeatureBoolean>
            <FeatureType>FEATURE_JUNGLE</FeatureType>
            <bFeature>1</bFeature>
        </FeatureBoolean>
        <FeatureBoolean>
            <FeatureType>FEATURE_SWAMP</FeatureType>
            <bFeature>1</bFeature>
        </FeatureBoolean>
    </FeatureBooleans>
    <FeatureTerrainBooleans>
        <FeatureTerrainBoolean>
            <TerrainType>TERRAIN_GRASS</TerrainType>
            <bFeatureTerrain>1</bFeatureTerrain>
        </FeatureTerrainBoolean>
        <FeatureTerrainBoolean>
            <TerrainType>TERRAIN_PLAINS</TerrainType>
            <bFeatureTerrain>1</bFeatureTerrain>
        </FeatureTerrainBoolean>
        <FeatureTerrainBoolean>
            <TerrainType>TERRAIN_MARSH</TerrainType>
            <bFeatureTerrain>1</bFeatureTerrain>
        </FeatureTerrainBoolean>
        <FeatureTerrainBoolean>
            <TerrainType>TERRAIN_TUNDRA</TerrainType>
            <bFeatureTerrain>1</bFeatureTerrain>
        </FeatureTerrainBoolean>
    </FeatureTerrainBooleans>
</BonusInfo>

 

Now, for some reason they didn't use any attributes; this makes it all the more painful to look at, but is convenient for this example because you'd have to convert them to elements anyway before doing what I'm about to do. Here is the exact same data structure (including bits that only need to exist because it's in XML, like the individual iYieldChange elements) represented as an S-expression, with the XML indentation scheme maintained just to keep it as familiar as possible (e.g., closing parentheses would normally not have their own lines):

 

(BonusInfo
    (Type BONUS_IRON)
    (Description TXT_KEY_BONUS_IRON)
    (Civilopedia TXT_KEY_BONUS_IRON_PEDIA)
    (BonusClassType BONUSCLASS_RUSH)
    (ArtDefineTag ART_DEF_BONUS_IRON)
    (TechReveal TECH_IRON_WORKING)
    (TechCityTrade TECH_MINING)
    (TechObsolete NONE)
    (YieldChanges
        (iYieldChange 0)
        (iYieldChange 1)
        (iYieldChange 0)
    )
    (iAITradeModifier 10)
    (iAIObjective 0)
    (iHealth 0)
    (iHappiness 0)
    (iPlacementOrder 0)
    (iConstAppearance 162)
    (iMinAreaSize 3)
    (iMinLatitude 0)
    (iMaxLatitude 80)
    (Rands
        (iRandApp1 10)
        (iRandApp2 10)
        (iRandApp3 0)
        (iRandApp4 0)
    )
    (iPlayer 100)
    (iTilesPer 0)
    (iMinLandPercent 0)
    (iUnique 7)
    (iGroupRange 0)
    (iGroupRand 0)
    (bArea 0)
    (bHills 1)
    (bFlatlands 1)
    (bNoRiverSide 0)
    (bNormalize 0)
    (TerrainBooleans
        (TerrainBoolean
            (TerrainType TERRAIN_DESERT)
            (bTerrain 1)
        )
        (TerrainBoolean
            (TerrainType TERRAIN_GRASS)
            (bTerrain 1)
        )
        (TerrainBoolean
            (TerrainType TERRAIN_MARSH)
            (bTerrain 1)
        )
        (TerrainBoolean
            (TerrainType TERRAIN_PLAINS)
            (bTerrain 1)
        )
        (TerrainBoolean
            (TerrainType TERRAIN_TUNDRA)
            (bTerrain 1)
        )
        (TerrainBoolean
            (TerrainType TERRAIN_SNOW)
            (bTerrain 1)
        )
    )
    (FeatureBooleans
        (FeatureBoolean
            (FeatureType FEATURE_FLOOD_PLAINS)
            (bFeature 1)
        )
        (FeatureBoolean
            (FeatureType FEATURE_FOREST)
            (bFeature 1)
        )
        (FeatureBoolean
            (FeatureType FEATURE_JUNGLE)
            (bFeature 1)
        )
        (FeatureBoolean
            (FeatureType FEATURE_SWAMP)
            (bFeature 1)
        )
    )
    (FeatureTerrainBooleans
        (FeatureTerrainBoolean
            (TerrainType TERRAIN_GRASS)
            (bFeatureTerrain 1)
        )
        (FeatureTerrainBoolean
            (TerrainType TERRAIN_PLAINS)
            (bFeatureTerrain 1)
        )
        (FeatureTerrainBoolean
            (TerrainType TERRAIN_MARSH)
            (bFeatureTerrain 1)
        )
        (FeatureTerrainBoolean
            (TerrainType TERRAIN_TUNDRA)
            (bFeatureTerrain 1)
        )
    )
)

 

What's the difference? Just for starters, a 40% smaller file. And though this little grumble-fit has been in the back of my mind since I first heard Frogboy say the word "XML," his comment about the huge sizes of some of the tiles his artists were creating is what finally made me post. Beyond this, you could replace the YieldChanges set with (YieldChanges 0 1 0), and perform various similar optimizations (simplifying the terrain type trees comes to mind), with trivial code changes and no impact on readability.

Fun part is, it's also trivial to convert one to the other. This is a change which could be made even this late in development. I don't expect it to be, and I can live with XML (I'll have to!), but... well, like I said, grumble-fit.
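To give a rough idea of how small the XML-to-S-expression direction can be, here is a minimal sketch in Python. It assumes attribute-free XML like the sample above and values that contain no spaces or parentheses; `to_sexpr` is a name made up for this illustration, not anything from the game's tools.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem, depth=0):
    """Render an attribute-free XML element as an indented S-expression."""
    pad = "    " * depth
    children = list(elem)
    if not children:
        # Leaf element: <Type>BONUS_IRON</Type> becomes (Type BONUS_IRON)
        text = (elem.text or "").strip()
        return f"{pad}({elem.tag} {text})" if text else f"{pad}({elem.tag})"
    # Branch element: open paren and name, children one level deeper, close paren.
    lines = [f"{pad}({elem.tag}"]
    lines += [to_sexpr(child, depth + 1) for child in children]
    lines.append(f"{pad})")
    return "\n".join(lines)


xml_text = """<BonusInfo>
    <Type>BONUS_IRON</Type>
    <YieldChanges>
        <iYieldChange>0</iYieldChange>
        <iYieldChange>1</iYieldChange>
        <iYieldChange>0</iYieldChange>
    </YieldChanges>
</BonusInfo>"""

sexpr_text = to_sexpr(ET.fromstring(xml_text))
print(sexpr_text)
print(len(xml_text), "characters of XML,", len(sexpr_text), "as an S-expression")
```

The exact saving depends on tag lengths and indentation, so the character counts printed here are only indicative for this small fragment.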

Cheers! :D

(Forum feature request: Lucida Console as a font option. Courier New is hideous, and the other monospace fonts, Andale and Terminal, appear to be broken.)

Reply #1

Just for starters, a 40% smaller file.
End of quote

Is the size of the data file really important? You could also reduce the size of the XML file with short names and no indentation. What really matters is how the data are used in the game: generally, the data in the XML files are loaded at the start of the game and converted into a usable form that can be accessed quickly when needed.
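As an illustration of that load-once pattern, here is a small Python sketch. The `BonusInfos` wrapper element, the second record (`BONUS_COPPER`), and the rule that an `i` prefix means integer and a `b` prefix means boolean are assumptions made for the example; the real game's loader may work quite differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record file; only BONUS_IRON comes from the sample in this thread.
xml_text = """<BonusInfos>
    <BonusInfo>
        <Type>BONUS_IRON</Type>
        <iHealth>0</iHealth>
        <bHills>1</bHills>
    </BonusInfo>
    <BonusInfo>
        <Type>BONUS_COPPER</Type>
        <iHealth>0</iHealth>
        <bHills>1</bHills>
    </BonusInfo>
</BonusInfos>"""


def load_bonuses(text):
    """Parse the file once and index each record by its Type value."""
    table = {}
    for info in ET.fromstring(text).iter("BonusInfo"):
        record = {}
        for field in info:
            value = (field.text or "").strip()
            prefix, rest = field.tag[:1], field.tag[1:2]
            # Assumed convention: iSomething is an integer, bSomething a boolean.
            if prefix == "i" and rest.isupper():
                record[field.tag] = int(value)
            elif prefix == "b" and rest.isupper():
                record[field.tag] = value == "1"
            else:
                record[field.tag] = value
        table[record["Type"]] = record
    return table


bonuses = load_bonuses(xml_text)
print(bonuses["BONUS_IRON"]["bHills"])
```

After the one-time parse, every lookup is a plain dictionary access, so the on-disk syntax no longer affects run-time speed.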

Reply #2

...and you could reduce the size of the S-expression even more with short names and no indentation. But long element names and clean indentation are a good thing for readability, and you're right that this isn't the most important concern.

From the code-efficiency side, it's less about raw size than it is about the efficiency of both the data structure and the parser. With S-expressions you don't have to remember the text name of a tag while scanning the file just to match it against its closing tag, much less the names of an entire tree of open tags, which can become significant. S-expressions can also contain unnamed elements, which is great when a subset of the tree has a fixed structure; for instance, as I commented above:

<YieldChanges>
    <iYieldChange>0</iYieldChange>
    <iYieldChange>1</iYieldChange>
    <iYieldChange>0</iYieldChange>
</YieldChanges>

...can be converted to...

(YieldChanges 0 1 0)
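To make the parser side concrete, here is a minimal reader sketched in Python. It is a toy under stated assumptions: atoms contain no spaces or parentheses, there are no quoted strings, and malformed input is not handled. The names `tokenize` and `parse` are just for this illustration.

```python
def tokenize(text):
    """Split S-expression text into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front and return one atom or one nested list."""
    token = tokens.pop(0)
    if token != "(":
        # Atom: convert integers, leave everything else as a string.
        return int(token) if token.lstrip("-").isdigit() else token
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.pop(0)  # discard the closing ')'; no name has to be matched here
    return items


tree = parse(tokenize("(BonusInfo (Type BONUS_IRON) (YieldChanges 0 1 0))"))
print(tree)
```

The three yield values arrive as plain unnamed list items after the `YieldChanges` name, and closing a list never involves comparing a tag name.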

 

Furthermore — and in all honesty this is the most important thing to Me, The Modder — S-expressions require less typing and are easier to read. Long element names don't become a double-edged sword (almost literally) the way they do in XML.

Reply #3

I think the problem is more how they decided to represent things in XML rather than XML itself (like you said, not using attributes for example).

Either way, I think you are missing three points:

  • I suppose most modern languages come with XML parsers and not with S-expression parsers.
  • I suppose there are more XML editors out in the world than S-expression editors, which also eases the pain of using XML.
  • Last, you can validate XML with XSD very easily (which also makes the editor very useful, as a good editor will probably have IntelliSense and things like that based on the XSD), and I have no idea if you can do the same with an S-expression...

Reply #4

Point 1: pretty trivial compared to the rest of the engine, especially since most of the existing code can be adapted.

Point 2: many editors support Lisp. Even if they don't, the only critical function is parenthesis matching.

Point 3: S-expression validation pretty much IS parenthesis matching. ;)

Yes, it would take a little more work... a little more time spent building your own tools. But that seems to be par for the course in Elemental development. When it comes down to it, the design paradigm seems to be "modding tools are job 1," and the game designers themselves are just the first modders to get their hands on those tools!

Anyway, again, I'm not holding my breath for this to happen. I'm just saying, hey Frog, take a look... maybe next time? If it really excites him, maybe this time. Or maybe never, because he's got other concerns. But the point is to make sure the idea gets exposure. If there's one company I could imagine putting aside "because it's an industry standard" to try something that might actually work better in several ways, these are the guys; might as well make sure they're aware of the option!

Reply #5

I don't think this is a battle you're going to win. Even if the game weren't a month from being complete, XML is a standard format for laying out data and is known far and wide. I had never heard of S-expressions before this thread. This line is what bothers me the most in your posts: "the most important thing to Me, The Modder — S-expressions require less typing and are easier to read." One of the main reasons for having all those elements in XML is clarity, both to whatever's digesting the data and to the person making changes. Stardock could have easily used something like you're suggesting or something like a property file. But it should be easy for anyone wanting to mod the game to understand what they're looking at.

If it really bothers you that you have to use XML, I encourage you to make an S-expression-to-XML converter.

Later,
LAR 
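For anyone curious what such a converter might involve, here is a rough Python sketch. It assumes each element holds either a single atom or a set of child lists (as in the sample in this thread), and `read_sexpr` and `to_element` are names invented for the example.

```python
import xml.etree.ElementTree as ET


def read_sexpr(text):
    """Parse one S-expression into nested lists of strings using a stack."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]
    for token in tokens:
        if token == "(":
            stack.append([])          # start a new list
        elif token == ")":
            finished = stack.pop()    # close the current list
            stack[-1].append(finished)
        else:
            stack[-1].append(token)   # plain atom
    return stack[0][0]


def to_element(node):
    """Turn [name, child, ...] into an XML element; a lone atom becomes its text."""
    elem = ET.Element(node[0])
    for child in node[1:]:
        if isinstance(child, list):
            elem.append(to_element(child))
        else:
            # Assumes an element holds either one atom or child lists, not both.
            elem.text = child
    return elem


sexpr = "(BonusInfo (Type BONUS_IRON) (Rands (iRandApp1 10) (iRandApp2 10)))"
xml_text = ET.tostring(to_element(read_sexpr(sexpr)), encoding="unicode")
print(xml_text)
```

A modder could keep files in S-expression form and run something like this as a build step, so the game itself still only ever sees XML.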

Reply #6

Quoting tejondour, reply 4
Point 1: pretty trivial compared to the rest of the engine, especially since most of the existing code can be adapted.

Point 2: many editors support Lisp. Even if they don't, the only critical function is parenthesis matching.

Point 3: S-expression validation pretty much IS parenthesis matching.

Yes, it would take a little more work... a little more time spent building your own tools. But that seems to be par for the course in Elemental development. When it comes down to it, the design paradigm seems to be "modding tools are job 1," and the game designers themselves are just the first modders to get their hands on those tools!

Anyway, again, I'm not holding my breath for this to happen. I'm just saying, hey Frog, take a look... maybe next time? If it really excites him, maybe this time. Or maybe never, because he's got other concerns. But the point is to make sure the idea gets exposure. If there's one company I could imagine putting aside "because it's an industry standard" to try something that might actually work better in several ways, these are the guys; might as well make sure they're aware of the option!
End of tejondour's quote

Point 1: it will be trivial, but it's more code they have to write, test and maintain.

Point 2: there are more editors that support XML than Lisp out there, and with IntelliSense/auto-complete, the advantage of S-expressions (writing less) disappears.

Point 3: wrong. Validation is not only parenthesis matching; it also means catching elements that don't exist, elements nested where they can't be nested, and so on. Most XML libraries can also validate against an XSD automatically, while with S-expressions you would have to write your own thing again.
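To show the kind of check that goes beyond balanced parentheses, here is a minimal Python sketch of a hand-written validator over an already-parsed S-expression (nested lists). The `SCHEMA` table and the `validate` function are hypothetical; this is nowhere near a full XSD equivalent (no types, no required elements, no cardinality).

```python
# Hypothetical schema: for each element name, the child element names it may contain.
# A name missing from the table is unknown; an empty set means "atoms only".
SCHEMA = {
    "BonusInfo": {"Type", "YieldChanges", "iHealth"},
    "Type": set(),
    "YieldChanges": set(),
    "iHealth": set(),
}


def validate(node, path=""):
    """Return a list of error strings for a parsed S-expression."""
    name = node[0]
    here = f"{path}/{name}"
    if name not in SCHEMA:
        return [f"{here}: unknown element"]
    errors = []
    for child in node[1:]:
        if isinstance(child, list):
            if child[0] not in SCHEMA[name]:
                # Covers both misspelled names and known names in the wrong place.
                errors.append(f"{here}: '{child[0]}' not allowed here")
            else:
                errors.extend(validate(child, here))
    return errors


good = ["BonusInfo", ["Type", "BONUS_IRON"], ["YieldChanges", 0, 1, 0]]
bad = ["BonusInfo", ["Tpye", "BONUS_IRON"], ["Type", ["iHealth", 0]]]

print(validate(good))
print(validate(bad))
```

Both inputs have perfectly balanced parentheses, yet the second one fails twice: once for a misspelled element and once for an element nested where it is not allowed.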

Really, using a good XML editor, writing a tag should be 1-3 keystrokes at most (1-2 letters and a tab or the auto-complete key), so I don't see many problems for modders...