Implicit Neural Representations (INRs) have recently demonstrated impressive performance for
video compression. However, since a separate INR must be overfit for each video, scaling to high-resolution videos while maintaining
encoding efficiency remains a significant challenge. Hypernetwork-based approaches predict INR weights (hyponetworks) for unseen videos
at high speed, but suffer from low reconstruction quality, large compressed sizes, and prohibitive memory requirements at higher resolutions. We address these fundamental
limitations through three key contributions: (1) an approach that decomposes the weight prediction task spatially and temporally, by breaking
short video segments into patch tubelets, to reduce the pretraining memory overhead by 20×;
(2) a residual-based storage scheme that captures only differences between consecutive segment representations, significantly reducing bitstream size; and (3) a
temporal coherence regularization framework that encourages changes in the weight space to be correlated with video content. Our proposed method,
TeCoNeRV, achieves PSNR gains of 2.47 dB and 5.35 dB over the baseline at 480p and 720p on UVG, with 36% lower bitrates and 1.5-3× faster encoding. Thanks to its low memory usage, it is the first hypernetwork-based approach to demonstrate results at 480p, 720p, and 1080p on UVG, HEVC, and MCL-JCV.
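To illustrate contribution (2), the sketch below shows one way a residual-based storage scheme for consecutive segment weights could work: only the quantized difference between the previous and current segment's flattened weight vector is stored, so temporally coherent weights yield small integers that compress well. The helper names (`encode_segment`, `decode_segment`) and the fixed quantization step are hypothetical and not taken from the paper.

```python
import numpy as np

def encode_segment(prev_w: np.ndarray, curr_w: np.ndarray, step: float = 1e-3) -> np.ndarray:
    """Store only the quantized residual between consecutive segment weights.

    Small residuals map to small integers, which entropy-code compactly.
    """
    residual = curr_w - prev_w
    return np.round(residual / step).astype(np.int32)

def decode_segment(prev_w: np.ndarray, q: np.ndarray, step: float = 1e-3) -> np.ndarray:
    """Reconstruct the current segment's weights from the stored residual."""
    return prev_w + q.astype(np.float32) * step
```

Under this scheme the reconstruction error per weight is bounded by half the quantization step, and the bitstream holds only the integer residuals rather than full per-segment weight vectors.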