The Cramer Distance as a Solution to Biased Wasserstein Gradients

In this paper, the authors describe three natural properties of probability divergences that reflect requirements from machine learning: sum invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein metric possesses the first two properties but, unlike the Kullback-Leibler divergence, does not possess the third, and the authors provide empirical evidence suggesting that this is a serious issue in practice. As a remedy, they propose the Cramér distance, which possesses all three properties.
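To make the metrics concrete, here is a minimal sketch of the one-dimensional case, where both the Cramér distance and the 1-Wasserstein distance reduce to distances between CDFs (the Cramér distance is the l2 distance, Wasserstein the l1 distance). The function names and the choice of a shared integer-spaced support are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cramer_distance(p, q):
    """1-D Cramer distance: l2 distance between the CDFs of two
    discrete distributions on a common support with unit spacing."""
    cdf_p = np.cumsum(p)
    cdf_q = np.cumsum(q)
    return np.sqrt(np.sum((cdf_p - cdf_q) ** 2))

def wasserstein_distance(p, q):
    """1-D 1-Wasserstein distance on the same support: l1 distance
    between the CDFs."""
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q)))

# Two distributions over the support {0, 1, 2}
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(cramer_distance(p, q))       # l2 of the CDF gap, ~0.7071
print(wasserstein_distance(p, q))  # l1 of the CDF gap, 1.0
```

The bias issue the paper studies arises when these distances are estimated from samples: the gradient of the sample Wasserstein distance is a biased estimate of the true gradient, whereas the (squared) Cramér distance admits unbiased sample gradients.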
