Machine learning is about enabling computers to improve their performance on a given task as they get more data. Can we make this intuition quantitative using tools from information theory? In this post, I will discuss a classic paper by David Haussler, Michael Kearns, and Robert Schapire that (to the best of my knowledge) took the first step in this direction. I will describe some of their results, recast in a more explicitly information-theoretic language.
We will stick to the setting of binary classification: we have a feature space (which is completely arbitrary) and a binary label space