The human visual system is proof that it is possible to learn new categories with extremely few samples; humans do not need a million samples to learn to distinguish a poisonous mushroom from an edible one in the wild. Such ability, arguably, comes from having seen millions of other categories and transferring learnt representations to the new categories. This talk will present a formal connection of machine learning with thermodynamics to characterize the quality of learnt representations for transfer learning. We will discuss how information-theoretic functionals such as rate, distortion and classification loss lie on a convex, so-called, equilibrium surface. We prescribe dynamical processes to traverse this surface under constraints, e.g., an iso-classification process that modulates rate and distortion to keep the classification loss unchanged. We demonstrate how such processes allow complete control over the transfer from a source dataset to a target dataset and can guarantee the performance of the final model. This talk will discuss results from https://arxiv.org/abs/1909.02729 and https://arxiv.org/abs/2002.12406.