Quoting ddd888, reply 40:
the most important thing is another
you guys dont consider that multiplayer is the BEST balance workshop
I absolutely agree...
...assuming the players are of high skill level.
Granted, human players are (likely) going to be better than the current AI, and I agree that MP certainly gives a lot more data for consideration (strategies, abilities/skills/spells used in ways that might not have been imagined when they were created, etc). But humans aren't immune to making stupid mistakes either - especially newbies who play a few campaign missions and then jump into MP to try their luck.
Stardock once said (can't find the quote, it was forever ago) that they planned to track multiplayer statistics and feed them into creating a better AI - i.e. if strategy x wins more often than not, teach the AI to use strategy x better!
So even though there will obviously be exceptions (bad players or even good players making mistakes), if you look at the average results of all multiplayer games, you should get some useful info for both game balance and a better AI. Players using an "overpowered" unit/strategy/etc can make mistakes and still lose to a weaker unit/strategy/etc, but the people using underpowered strategies will make mistakes too, and overall the overpowered choice should lead to more wins. The only thing that might throw off your results is too small a sample size - i.e. not enough people playing multiplayer.
For example: you're an amazingly skillful player who happens to love archers even though they suck, and you consistently beat lots of noobs by using archers. If 10 people total are playing the game, sure, you could seriously throw off Stardock's perception of game balance by making archers look good; if there are 100 people, your impact will be small but maybe still noticeable; if there are 1000 people, your impact will be trivial, and the averages will show that lord's hammers are better and win far more games. If even 1% of Elemental's buyers play multiplayer, we're pretty close to that 1000 mark.
In other words, if your sample size is large enough, player skill should be pretty irrelevant. Even if bad players sometimes use lord's hammer, they sometimes use bows too, as do good players - the average should still show that lord's hammer beats bows most of the time.
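To put rough numbers on the archer example, here is a minimal Python sketch. Everything in it is made up for illustration: the function name, the assumption that half of all players use archers, the 40% win rate for ordinary archer users, the 90% win rate for the one expert, and the assumption that everyone plays the same number of games. It only shows how one outlier's pull on the average shrinks as the player pool grows; it says nothing about how Stardock actually collects or uses its stats.

```python
def observed_archer_win_rate(total_players,
                             archer_share=0.5,   # assumed: half of all players use archers
                             base_rate=0.40,     # assumed: ordinary archer users win 40% of games
                             expert_rate=0.90):  # assumed: the one expert wins 90% with archers
    """Average archer win rate across all archer users, one of whom is an expert.

    Every player is assumed to play the same number of games, so the
    per-player game count cancels out of the average.
    """
    archer_users = int(total_players * archer_share)
    ordinary_users = max(archer_users - 1, 0)  # everyone except the expert
    total_rate = expert_rate + base_rate * ordinary_users
    return total_rate / (ordinary_users + 1)


for players in (10, 100, 1000):
    rate = observed_archer_win_rate(players)
    print(f"{players:>4} players: archers appear to win {rate:.3f} of their games")
```

With these made-up inputs, archers look like a coin flip (0.500) at 10 players, but the figure drops to 0.410 at 100 players and 0.401 at 1000, which is nearly the "true" 0.40 for ordinary archer users.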