Video-to-App
Video-to-app is the workflow where the user provides a screen recording of an existing UI (a Loom of their workflow, a Figma prototype playthrough, a competitor walk-through) and the platform reverse-engineers it into source code. The system samples keyframes, runs each through a vision-language model to extract structure (routes, components, copy, color tokens), reconciles the scenes into a state graph, and emits the project.
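To make the reconciliation step concrete, here is a minimal sketch in TypeScript of what the per-frame extraction and the resulting state graph could look like. Every type, field, and function name is hypothetical (VULK's internal schema is not documented); the sketch only illustrates folding frame-level observations into screens and transition edges.

```ts
// Hypothetical shapes for per-frame extraction and the reconciled state
// graph; names are illustrative, not VULK's actual schema.
interface FrameExtraction {
  timestampMs: number;
  route: string;                        // best-guess route for the visible screen
  components: string[];                 // detected UI components, e.g. "Navbar"
  copy: string[];                       // visible text content
  colorTokens: Record<string, string>;  // e.g. { primary: "#4F46E5" }
}

interface ScreenNode {
  route: string;
  components: Set<string>;
  copy: Set<string>;
}

interface StateGraph {
  screens: Map<string, ScreenNode>;            // keyed by route
  transitions: Array<{ from: string; to: string }>;
}

// Fold the frame sequence into a graph: frames sharing a route merge into
// one screen node; a route change between adjacent frames becomes an edge.
function reconcile(frames: FrameExtraction[]): StateGraph {
  const screens = new Map<string, ScreenNode>();
  const transitions: Array<{ from: string; to: string }> = [];

  let prevRoute: string | null = null;
  for (const f of frames) {
    const node = screens.get(f.route) ?? {
      route: f.route,
      components: new Set<string>(),
      copy: new Set<string>(),
    };
    f.components.forEach((c) => node.components.add(c));
    f.copy.forEach((t) => node.copy.add(t));
    screens.set(f.route, node);

    if (prevRoute !== null && prevRoute !== f.route) {
      transitions.push({ from: prevRoute, to: f.route });
    }
    prevRoute = f.route;
  }
  return { screens, transitions };
}
```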
In VULK, video-to-app is one of three multimodal entry points (alongside screenshot-to-app and URL clone). The clip is uploaded, FFmpeg extracts ~12 keyframes, each frame is captioned by a vision model, and the resulting structured plan is fed into the regular generation pipeline. The output is a React or Next.js project that mirrors the recorded flow — every screen, every interaction, every transition — and is rendered in the live Firecracker preview.
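The extract-and-caption step could be sketched as follows, assuming ffmpeg and ffprobe are on PATH and using the OpenAI Node SDK as a stand-in for the vision model. The prompt, model name, and frame-naming scheme are assumptions, not VULK's actual implementation; only the shape of the pipeline (probe duration, sample ~12 evenly spaced frames, caption each) follows the description above.

```ts
// Sketch: sample ~12 evenly spaced frames with ffmpeg, then caption each
// with a vision model. Assumes ffmpeg/ffprobe on PATH and OPENAI_API_KEY set.
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { mkdir, readFile } from "node:fs/promises";
import OpenAI from "openai";

const run = promisify(execFile);
const openai = new OpenAI();

async function extractKeyframes(video: string, outDir: string, count = 12) {
  await mkdir(outDir, { recursive: true });
  // Probe the clip length so the sampling rate spreads frames evenly.
  const { stdout } = await run("ffprobe", [
    "-v", "error",
    "-show_entries", "format=duration",
    "-of", "csv=p=0",
    video,
  ]);
  const duration = parseFloat(stdout);
  // One frame every duration/count seconds, capped at `count` frames.
  await run("ffmpeg", [
    "-i", video,
    "-vf", `fps=${count / duration}`,
    "-frames:v", String(count),
    `${outDir}/frame-%02d.png`,
  ]);
}

async function captionFrame(path: string): Promise<string> {
  const b64 = (await readFile(path)).toString("base64");
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini", // model choice is an assumption
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "Describe this UI screen: route, components, visible copy, color tokens." },
        { type: "image_url", image_url: { url: `data:image/png;base64,${b64}` } },
      ],
    }],
  });
  return res.choices[0].message.content ?? "";
}
```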
Voice-to-App
A generation flow where the user speaks the app description out loud and the AI builder transcribes, plans, and ships the code. VULK pipes microphone audio through Whisper, then into the standard generation agent.
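A minimal sketch of the transcription hop, assuming the OpenAI Node SDK's Whisper endpoint; the handoff to the generation agent is represented by a hypothetical buildApp() function, since that interface is internal to the platform.

```ts
// Sketch: transcribe recorded microphone audio with Whisper, then feed the
// text into the generation pipeline. Assumes OPENAI_API_KEY is set.
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function voiceToApp(audioPath: string) {
  // Whisper turns the spoken description into a plain-text prompt.
  const transcript = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
  });
  // Hypothetical: the transcript enters the same pipeline as a typed prompt.
  return buildApp(transcript.text);
}

// Stand-in for the standard generation agent; not a real VULK API.
declare function buildApp(prompt: string): Promise<void>;
```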
Screenshot-to-App
A generation mode where the user drops one or more screenshots of an existing UI and the AI rebuilds it as a working application. VULK pairs vision models with the brand engine to match colors, fonts, and layout.
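One way the brand-matching pass could work is to ask a vision model for structured design tokens directly. The BrandTokens shape, prompt, and model below are assumptions for illustration, not the brand engine's actual interface.

```ts
// Sketch: extract brand tokens (colors, fonts, layout) from a screenshot
// via a vision model that returns JSON. Assumes OPENAI_API_KEY is set.
import { readFile } from "node:fs/promises";
import OpenAI from "openai";

interface BrandTokens {
  colors: Record<string, string>; // e.g. { primary: "#0EA5E9", surface: "#FFFFFF" }
  fonts: { heading: string; body: string };
  layout: string;                 // e.g. "sidebar + card grid"
}

const openai = new OpenAI();

async function extractBrandTokens(screenshotPath: string): Promise<BrandTokens> {
  const b64 = (await readFile(screenshotPath)).toString("base64");
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" }, // force parseable output
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "Return JSON with keys colors, fonts, layout describing this UI's design tokens." },
        { type: "image_url", image_url: { url: `data:image/png;base64,${b64}` } },
      ],
    }],
  });
  return JSON.parse(res.choices[0].message.content ?? "{}") as BrandTokens;
}
```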