Encoding Dynamic Invariance for Video Understanding