AI-enhanced AR overview

description:

This project demonstrates a novel pipeline in which large-scale, computationally heavy video models extend the expressive potential of augmented reality.

We leveraged the Jean-Zay supercomputer to run the LTX-Video Distilled model via ComfyUI, generating high-quality video at close to real-time speed end-to-end and streaming it back to our local setup: an RTX 4090 desktop driving Canon MREAL X1 AR glasses. We implemented the AR visualisations in Unity and built a simple OSC-based touch interface for a smartphone.
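The touch interface amounts to a single tap message sent over OSC from the phone to the Unity app. The sketch below shows one way such a trigger could be received; it is a simplified illustration, not the project's actual code. It listens on a raw UdpClient and only checks the OSC address pattern instead of fully parsing the packet, and the port number and the /tap address are assumptions made for the example.

```csharp
using System.Net;
using System.Net.Sockets;
using System.Text;
using UnityEngine;

// Minimal sketch: listens for an OSC-style "/tap" message from the phone
// and raises a flag the main thread can react to (e.g. start generation).
// A real implementation would use a proper OSC library and parse arguments.
public class TapTrigger : MonoBehaviour
{
    public int listenPort = 9000;          // assumed port, not the project's actual value
    private UdpClient udp;
    private volatile bool tapped;

    void Start()
    {
        udp = new UdpClient(listenPort);
        udp.BeginReceive(OnReceive, null);  // receive asynchronously, off the main thread
    }

    void OnReceive(System.IAsyncResult result)
    {
        IPEndPoint remote = new IPEndPoint(IPAddress.Any, 0);
        byte[] data = udp.EndReceive(result, ref remote);

        // OSC packets begin with the address pattern as a null-padded ASCII string,
        // so a prefix check on the raw payload is enough for this sketch.
        string payload = Encoding.ASCII.GetString(data);
        if (payload.StartsWith("/tap"))
            tapped = true;

        udp.BeginReceive(OnReceive, null);  // keep listening for the next tap
    }

    void Update()
    {
        if (tapped)
        {
            tapped = false;
            Debug.Log("Tap received: request video generation here.");
        }
    }

    void OnDestroy() => udp?.Close();
}
```

In practice the flag set in Update would kick off the request to the remote backend rather than just logging.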

Users trigger video generation by looking at the printed poster and tapping a button on the smartphone. The frontend runs in Unity (OSC-based phone input and HMD display), the backend runs in ComfyUI, and Flask coordinates the communication between the remote and local systems.
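To give a feel for the frontend-backend handshake, here is a hedged sketch of the Unity side of that exchange: it posts a generation request to the Flask layer and downloads the resulting clip. The endpoint path (/generate), the JSON payload shape, and the server address are illustrative assumptions; the project's actual routes and protocol are not documented here.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Sketch of the client side of the Flask exchange, assuming a hypothetical
// /generate endpoint that accepts a text prompt and returns the encoded video.
// Real endpoint names, payloads and error handling will differ.
public class GenerationClient : MonoBehaviour
{
    public string serverUrl = "http://localhost:5000";  // placeholder address

    public IEnumerator RequestVideo(string prompt)
    {
        byte[] body = System.Text.Encoding.UTF8.GetBytes("{\"prompt\": \"" + prompt + "\"}");

        using (UnityWebRequest req = new UnityWebRequest(serverUrl + "/generate", "POST"))
        {
            req.uploadHandler = new UploadHandlerRaw(body);
            req.downloadHandler = new DownloadHandlerBuffer();
            req.SetRequestHeader("Content-Type", "application/json");

            yield return req.SendWebRequest();

            if (req.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Generation request failed: " + req.error);
                yield break;
            }

            // The response body is assumed to be the encoded video; hand it off
            // to whatever decodes and displays the frames in the headset.
            byte[] videoBytes = req.downloadHandler.data;
            Debug.Log($"Received {videoBytes.Length} bytes of video.");
        }
    }
}
```

A trigger such as the tap handler above would start this coroutine, e.g. StartCoroutine(client.RequestVideo(prompt)).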



contributions:

My role in this project included implementing the frontend system, with a focus on AR visualisation and the user interface. I was responsible for porting the client side of the Flask communication protocol from Python to C# and integrating it into the Unity environment. I also contributed to the prompt engineering effort for the LTX video generation, where we focused on crafting prompts that kept the outputs coherent and meaningful for the AR experience. Additionally, I integrated the third-party XScaling plugin in Unity to upscale the decoded video frame by frame and dynamically swap the frames in the background, enabling seamless high-resolution playback for the user.
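The XScaling plugin's own API is not reproduced here; the sketch below only approximates the frame-by-frame upscale-and-swap idea, with a plain bilinear Graphics.Blit standing in for the upscaler. Two RenderTextures are alternated so the texture currently on screen is never the one being written. The RawImage target and the 2x factor are assumptions made for the example.

```csharp
using UnityEngine;
using UnityEngine.UI;

// Approximation of the frame-by-frame upscale-and-swap idea. The project used
// the third-party XScaling plugin; here a plain bilinear Graphics.Blit stands
// in for the upscaler. Two RenderTextures are alternated (double buffering)
// so the displayed texture is never the one currently being written.
public class FrameUpscaler : MonoBehaviour
{
    public RawImage targetDisplay;        // UI element showing the video (assumed)
    public int scaleFactor = 2;           // illustrative upscale factor

    private RenderTexture[] buffers = new RenderTexture[2];
    private int writeIndex;

    // Called once per decoded frame by whatever decodes the incoming video.
    public void PushFrame(Texture2D decodedFrame)
    {
        int w = decodedFrame.width * scaleFactor;
        int h = decodedFrame.height * scaleFactor;

        // Lazily (re)allocate the double buffers at the upscaled size.
        if (buffers[0] == null || buffers[0].width != w || buffers[0].height != h)
        {
            for (int i = 0; i < 2; i++)
            {
                if (buffers[i] != null) buffers[i].Release();
                buffers[i] = new RenderTexture(w, h, 0);
                buffers[i].filterMode = FilterMode.Bilinear;
            }
        }

        // Upscale into the back buffer, then swap it onto the display.
        RenderTexture back = buffers[writeIndex];
        Graphics.Blit(decodedFrame, back);
        targetDisplay.texture = back;
        writeIndex = 1 - writeIndex;
    }

    void OnDestroy()
    {
        foreach (var rt in buffers)
            if (rt != null) rt.Release();
    }
}
```

The double buffering is what makes the swap appear seamless to the user: each new frame is fully written off-screen before it replaces the visible one.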