Note: I'm not sure if at the beginning of the game, one of the agents [of AlphaStar] is chosen according to the Nash probabilities, or if at each timestep an action is chosen according to the Nash probabilities.
It's the former. During the video demonstration, the pro player remarked that after losing game 1, he went into game 2 with a strategy meant to counter the one AlphaStar had used in game 1, only to find that AlphaStar used a completely different strategy. The AlphaStar representatives responded that there are actually 5 AlphaStar agents that form the Nash equilibrium, and that he had played one of them in game 1 and a different one in game 2.
And in fact, they didn't choose the agents according to the Nash probabilities. Rather, they played a "best of 5" match and simply had each of the 5 agents play one game. The human player did not know this, so going into the 5th game he could not use process of elimination to work out which single agent remained and pick a strategy to counter it.
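To make the difference concrete, here is a rough Python sketch of the two per-game selection schemes discussed above: drawing one agent per game from the Nash mixture, versus having each agent play exactly one game. The agent names, the probabilities, and the shuffled order are all made up for illustration; none of them come from DeepMind.

```python
import random


def pick_by_nash(agents, nash_probs, n_games, rng):
    """Draw one agent per game, independently, from the Nash mixture.

    The same agent can show up in several games.
    """
    return rng.choices(agents, weights=nash_probs, k=n_games)


def one_game_each(agents, rng):
    """Have every agent play exactly one game.

    Shuffling the order is an assumption; the actual order used in the
    match is not known from the discussion above.
    """
    order = list(agents)
    rng.shuffle(order)
    return order


# Hypothetical agents and hypothetical Nash probabilities (must sum to 1).
agents = ["agent_1", "agent_2", "agent_3", "agent_4", "agent_5"]
nash_probs = [0.30, 0.25, 0.20, 0.15, 0.10]

rng = random.Random(0)
sampled = pick_by_nash(agents, nash_probs, n_games=5, rng=rng)
scheduled = one_game_each(agents, rng)

print("Sampled from Nash mixture:", sampled)
print("One game per agent:       ", scheduled)
```

In the second scheme, an opponent who knew the rule could narrow down the remaining agents as the series went on, which is exactly the process-of-elimination concern. In the first scheme, each game's draw is independent, so earlier games tell the opponent nothing about which agent comes next.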