HarmonyDreamer: Rectifying and Harmonizing
Dynamic Objects with Backgrounds for 4D Video
Generation via Inversion Scores
Recent advances in score distillation have markedly improved the generation of both static and dynamic 3D content by harnessing 2D pretrained diffusion models for artistic applications. However, existing distillation methods concentrate on object-centric representations and do not adequately address how objects integrate with their backgrounds, including matching rendering styles, shadows, and motion. This paper presents a framework that combines a background-enhanced distillation score with an iterative editing-reconstruction process tailored for 4D video generation, starting from a basic dynamic 4D Gaussian splatting representation. Our experiments show that the method produces high-fidelity 4D videos that harmoniously blend objects with varied backgrounds and also supports the creation of multiview videos. Additionally, our approach outperforms existing baselines in both spatiotemporal and semantic consistency, demonstrating its effectiveness in 4D video generation.