Not being a programmer or game designer, I may be way off on what's possible. That won't stop me from offering ideas, in the hope of learning something.
Regarding algorithms, an AI that learns, game design for combat, etc., what's wrong with this sort of idea:
Suppose we're starting with a simple, typical battle -- city siege. And we're making simple assumptions (troops essentially equal on both sides, no special abilities to start, etc.) and using 'average' capabilities (wooden walls, moderate unit strength/armor/hits/etc.).
Let's also make simple definitions -- each side's strength is the sum of its units' 'hit points', and the fight lasts until one side (the loser) is eliminated. If more than half the winner's strength remains, it's a significant victory (and a significant loss for the other side), etc. And let's assume the goal for the attacker is to have 60% of its strength remaining.
If the attacker loses significantly, add 50% to the attacker's starting strength for next time. If the attacker loses normally, add 25%. If the attacker wins but has less than 60% of its strength remaining, add 10%. If the attacker has more than 60% remaining, subtract 10%.
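To make those four rules concrete, here's a rough sketch of them as one function. The function and parameter names are made up for illustration. It assumes "add 50%" means multiplying the previous starting strength by 1.5, and that landing on exactly 60% (which the rules above don't cover) leaves the strength unchanged.

```python
def next_attacker_strength(current, attacker_won, winner_remaining):
    """Return the attacker's starting strength for the next battle.

    current          -- attacker's starting strength in the battle just fought
    attacker_won     -- True if the defender was eliminated
    winner_remaining -- fraction (0 to 1) of the winner's starting strength left
    """
    if not attacker_won:
        # The defender kept more than half its strength, so this counts as
        # a significant loss for the attacker.
        if winner_remaining > 0.5:
            return current * 1.50
        return current * 1.25
    if winner_remaining < 0.60:
        return current * 1.10  # won, but more costly than the 60% goal
    if winner_remaining > 0.60:
        return current * 0.90  # won with room to spare, so trim the force
    return current  # exactly 60%: assumed here to mean "no change"
```

For example, `next_attacker_strength(100, False, 0.8)` gives 150.0, since the defender won with 80% of its strength left.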
Start with the defender having 100 strength. Make an initial guess at the attacking strength needed, then run 1000 attacks, with the AI 'learning' from each one to modify the starting attacker strength for the next attack. You might end up with a pretty good estimate for that scenario.
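Here's a minimal sketch of what that loop might look like. The combat model in `fight` is entirely invented for illustration (each side deals roughly 10% of its current strength per round, with a random swing, and the defender gets a hypothetical 1.5x bonus for the walls); a real game would plug in its own combat resolution. The names, the seed, and the first guess of 100 are also assumptions.

```python
import random

def fight(attacker, defender, rng, wall_bonus=1.5):
    """Hypothetical combat model; returns (attacker_won, winner_fraction_left)."""
    a, d = attacker, defender
    while a > 0 and d > 0:
        # Each side deals about 10% of its current strength per round, with a
        # random swing of +/-50%, and at least 1 point so the fight always ends.
        hit_on_defender = max(1.0, 0.10 * a * rng.uniform(0.5, 1.5))
        hit_on_attacker = max(1.0, 0.10 * d * wall_bonus * rng.uniform(0.5, 1.5))
        a, d = a - hit_on_attacker, d - hit_on_defender
    if d <= 0 and a > 0:
        return True, a / attacker
    # Defender survives, or both fall in the same round (counted as a failed siege).
    return False, max(d, 0.0) / defender

def next_strength(current, attacker_won, remaining):
    """The adjustment rules, read as percentage changes to the last strength."""
    if not attacker_won:
        return current * (1.50 if remaining > 0.5 else 1.25)
    if remaining < 0.60:
        return current * 1.10
    if remaining > 0.60:
        return current * 0.90
    return current  # exactly 60%: assumed to mean "no change"

def calibrate(defender=100.0, first_guess=100.0, battles=1000, seed=1):
    """Run repeated sieges and record the attacker strength tried in each."""
    rng = random.Random(seed)
    strength, history = first_guess, []
    for _ in range(battles):
        history.append(strength)
        won, remaining = fight(strength, defender, rng)
        strength = next_strength(strength, won, remaining)
    return history

history = calibrate()
print(len(history), round(history[-1], 1))
```

One thing this makes visible: because every rule changes the strength by at least 10%, the estimate can never fully settle and will keep hopping around the target even when it is close.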
Toss out the first 500 attacks (assume it's still 'calibrating'), then look at the last 500. Does the determined attack strength jump around a lot? If so, there may be a problem with the randomization of combat results, or something similar. Run an ANOVA (or whatever they're doing these days). Maybe the results are too consistent. If so, add more variability. And so on.
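As a rough sketch of that check: the function below drops the calibration runs and summarises how much the remaining estimates move. To be clear, this is not an ANOVA, just a simpler stand-in (mean, standard deviation, and their ratio), and the names are hypothetical. The sample list is placeholder data to show the call, not output from any simulation.

```python
import statistics

def tail_summary(history, discard=500):
    """Summarise the estimates left after dropping the calibration runs."""
    tail = history[discard:]
    mean = statistics.mean(tail)
    spread = statistics.pstdev(tail)
    return {
        "mean": mean,
        "stdev": spread,
        "cv": spread / mean,  # spread relative to the mean
        "min": min(tail),
        "max": max(tail),
    }

# Placeholder data only: 500 "calibration" values, then 500 values that
# alternate between 180 and 220.
example = [100.0] * 500 + [180.0, 220.0] * 250
summary = tail_summary(example)
print(summary)
```

A large `cv` would suggest the estimate is jumping around a lot; a `cv` near zero would suggest combat results are too consistent. What counts as "large" is a judgment call for the designer.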
Then repeat the above with defender strengths of 200, 400, 800, etc., and 50, 12. This will show the effect of defender strength and allow adding weights to adjust accordingly.
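One possible way to turn those repeated runs into a weight, sketched under the assumption that the needed attacker strength follows roughly attacker = scale * defender ^ exponent: fit a straight line on log-log axes. The function name is hypothetical, and the inputs below are placeholders generated from a made-up formula purely to check the fit, not simulation results.

```python
import math

def fit_power_law(defender_strengths, attacker_estimates):
    """Least-squares fit of attacker = scale * defender ** exponent (log-log)."""
    xs = [math.log(d) for d in defender_strengths]
    ys = [math.log(a) for a in attacker_estimates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    scale = math.exp(mean_y - exponent * mean_x)
    return scale, exponent

# Placeholder inputs only: these "estimates" come from a made-up rule
# (2.5 * defender), not from any battle simulation.
defenders = [12, 50, 100, 200, 400, 800]
estimates = [2.5 * d for d in defenders]
scale, exponent = fit_power_law(defenders, estimates)
print(scale, exponent)
```

An exponent near 1 would mean the attacker simply needs a fixed multiple of the defender's strength; anything noticeably different would mean that big and small sieges need different weights.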
Then add in more specific capabilities, and repeat. You may be able to find correlations, such as -- adding 50% of the defender's strength as ranged units equals doubling the standard defender strength. This will help create algorithms for army composition. Similar comparisons can be made for other abilities, such as fire spells vs units resistant to fire, etc.
While the above is from the point of view of the attacker, it can be used by the AI on defense (by holding attacking strength unchanged and modifying defender strength).
Having an AI that is able to 'learn', and being able to run thousands of simulations very quickly and easily for a number of different situations -- won't that allow some good guesses for constructing algorithms to make a decent AI?