Bounds on Generalization Error of Variational Lower Bound Estimates of Mutual Information and Applications of Mutual Information
P Aditya Sreekar
Machine learning techniques have found widespread use in critical systems in the recent past. This rise of ML was fueled mainly by neural networks (NNs), which enable end-to-end systems owing to their feature-learning and universal-approximation properties. Recent research has extended the application of NNs to the estimation of mutual information (MI) from samples: a critic is constructed using an NN, and a variational lower bound on MI is maximized. These methods alleviate many problems faced by earlier MI estimation methods, such as the inability to scale and incompatibility with mini-batch-based optimization. While these estimators are reliable when the true MI is low, they tend to produce high-variance estimates when the true MI is high. We argue that this high-variance behaviour is due to the uncontrolled complexity of the critic's hypothesis space. In support of this argument, we use the data-driven Rademacher complexity of the hypothesis space associated with the critic's architecture to derive upper bounds on the generalization error of variational-lower-bound-based estimates of MI. Using the derived bounds, we show that the Rademacher complexity of the critic's hypothesis space is a major contributor to the generalization error. Inspired by this, we propose to counteract the high-variance behaviour of these estimators by constraining the critic's hypothesis space to a Reproducing Kernel Hilbert Space (RKHS), which corresponds to a kernel learned using Automated Spectral Kernel Learning (ASKL). Further, we propose two regularization terms by examining the upper bound on the Rademacher complexity of the RKHS learned by ASKL. The weights of these regularization terms can be varied to trade off bias against variance.
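The critic-based estimation described above can be illustrated with a minimal sketch of a Donsker-Varadhan (MINE-style) lower bound. This is not the thesis's implementation: the critic here is a fixed toy function rather than a trained NN, and the samples are synthetic correlated Gaussians, chosen only to show the shape of the estimator.

```python
import numpy as np

def mine_lower_bound(x, y, critic):
    """MINE-style lower bound on I(X;Y):
    E_p(x,y)[T(x,y)] - log E_p(x)p(y)[exp(T(x,y))],
    where the product-of-marginals term is formed by shuffling y."""
    joint_term = critic(x, y).mean()
    y_shuffled = np.random.default_rng(0).permutation(y)
    marginal_term = np.log(np.exp(critic(x, y_shuffled)).mean())
    return joint_term - marginal_term

# Hypothetical fixed critic; in the thesis the critic is a learned NN,
# and its hypothesis-space complexity drives the variance of the estimate.
critic = lambda x, y: 0.5 * x * y

rng = np.random.default_rng(42)
n, rho = 10_000, 0.8
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # correlated pair

mi_corr = mine_lower_bound(x, y, critic)                      # clearly positive
mi_indep = mine_lower_bound(x, rng.standard_normal(n), critic)  # near or below zero
```

The gap between `mi_corr` and `mi_indep` shows the estimator responding to dependence; with a trained NN critic the same bound is maximized over critic parameters, which is where the uncontrolled hypothesis-space complexity enters.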
We empirically demonstrate the efficacy of this regularization in enforcing a proper bias-variance tradeoff on four different variational lower bounds of MI, namely NWJ, MINE, JS and SMILE.

We also propose a novel application of MI estimation using variational lower bounds and NNs in learning disentangled representations of videos. The learned representations contain two sets of features, content and pose representations, which disjointly represent the content and the movement in a video. This is achieved by minimizing the mutual information between the pose representations of different frames from the same video, thereby forcing the content representation to capture the information common to different frames. We qualitatively and quantitatively compare our method against DRNET on two synthetic and one real-world dataset.
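The disentanglement objective above can be sketched in miniature. Everything here is a hypothetical stand-in: the thesis penalizes a variational NN-based MI estimate between pose codes, whereas this sketch uses squared sample correlation as a crude dependence proxy, just to show how the penalty pushes frame-invariant (content) information out of the pose representation.

```python
import numpy as np

def dependence_proxy(a, b):
    # Crude stand-in for the variational MI estimate used in the thesis:
    # squared Pearson correlation between two 1-D pose codes.
    return np.corrcoef(a, b)[0, 1] ** 2

def disentanglement_loss(recon_error, pose_t, pose_t2, lam=1.0):
    """Total loss: reconstruction term plus a penalty on the dependence
    between pose codes of two frames from the same video."""
    return recon_error + lam * dependence_proxy(pose_t, pose_t2)

rng = np.random.default_rng(0)

# Entangled case: content information ("shared") leaks into both pose codes.
shared = rng.standard_normal(1000)
pose_a = shared + 0.1 * rng.standard_normal(1000)
pose_b = shared + 0.1 * rng.standard_normal(1000)
loss_entangled = disentanglement_loss(0.5, pose_a, pose_b)

# Disentangled case: pose codes of the two frames are independent.
loss_clean = disentanglement_loss(0.5, rng.standard_normal(1000),
                                  rng.standard_normal(1000))
```

Minimizing this penalty during training is what forces information common across frames into the content representation, leaving the pose code to carry only per-frame motion.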
Year of completion: June 2022
Advisor: Anoop M Namboodiri