
# Programmable Cinematography: Building Ministudio
*My idea of world building as a state machine: orchestration, programmable generation of videos, and simulating a world with variables 😂*
## Ministudio
Late last year I attended an AI filmmaking event. Creativity was zooming all over the place: artists, photographers, and filmmakers chatting about the future of filmmaking with AI. I was having fun, but I got the most benefit from the last section of the workshop, in which an amazing lady named Mercy Mutisya (if you're reading this, thanks a lot), who writes scripts for a living, gave a talk on the technicalities of filmmaking. I was amazed by the depth, tricks, and technicality of filmmaking, and by how AI fills lots of gaps but never takes the mantle in totality.
In essence, filmmaking (or content creation) is a step-by-step process with lots of fine-tuning and double-checking. She talked about the best models for video generation and how to generate cinematically accurate videos. I loved the talk, but I did not have any immediate application for the things she taught me, just not yet.
This week I was looking for a video to explain a project I was working on, a learning management platform. I thought maybe I should generate a video instead of trying to be a content creator, which I am not. I tried Google's video generation models like Veo and OpenAI's Sora, and it was bad, really bad 😂. The videos are out of sync, character and background continuity is whacky, and the model keeps changing things. Clearly I was a bad prompt engineer, so I thought maybe I could be a good software engineer instead and generate videos with state persistence and continuity of props, characters, and backgrounds in code. I just might have succeeded.
So the problem with generative models is hallucination and inconsistency. If you ask an AI model to generate a 10-second video of a character doing stuff, it does a really great job and produces impressive work. But if you then ask for another 10-second video of the same character in a new place, the character's face, clothes, and sometimes even skin tone change. I believe this is an issue that can be solved with code, and here is my take on it.
I created an open-source Python framework called Ministudio. It's an orchestration layer that treats video generation the way Kubernetes treats deployments: a master orchestrator that keeps everything in sync and in continuity.
How it works: it rests on three pillars.
1. Identity grounding
Instead of just prompting a character, we use visual tokens: we anchor the character's details (hair, eyes, build) in every generation step. By providing a reference portrait, we force the model to respect the identity across 60+ seconds of footage.
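As a rough illustration of identity grounding, a helper like the one below could fold the identity dictionary and reference portrait into every request. The function name and payload shape are hypothetical sketches, not Ministudio's actual API:

```python
# Hypothetical sketch: repeat the character's identity traits and reference
# portrait verbatim in every generation call, so each clip sees the same
# anchor. Names here are illustrative, not Ministudio internals.

def build_grounded_prompt(action: str, identity: dict, anchor_path: str) -> dict:
    """Return a request payload that carries the identity anchor on every call."""
    traits = ", ".join(f"{k}: {v}" for k, v in sorted(identity.items()))
    return {
        "prompt": f"{action}. Character (keep identical in every frame): {traits}.",
        "reference_image": anchor_path,  # visual anchor passed with every call
    }

payload = build_grounded_prompt(
    "Emma rubs her tired eyes at a cluttered desk",
    {"hair": "short chestnut bob", "eyes": "amber"},
    "assets/references/emma_portrait.png",
)
```

The point is simply that the anchor text never varies between shots, so the model has no room to reinvent the character.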
2. Temporal context
Models forget the background the moment the camera moves. Ministudio uses a state machine that tracks the environment geometry. We also extract the last three frames of every shot and feed them back into the next generation. This creates a memory that eliminates visual jumps (the current logic still suffers from audio jumps, though).
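The state-machine idea can be sketched in a few lines: a small state object that locks the environment facts and keeps a rolling buffer of the last three frames of the previous shot, which the next generation call is conditioned on. Class and field names are my assumptions, not Ministudio internals:

```python
from collections import deque
from dataclasses import dataclass, field

# Illustrative sketch of temporal context: the environment description is
# locked once, and a deque with maxlen=3 automatically keeps only the last
# three frames committed from the previous shot.

@dataclass
class SceneState:
    environment: dict                                   # locked geometry/lighting facts
    frame_buffer: deque = field(default_factory=lambda: deque(maxlen=3))

    def commit_shot(self, frames: list) -> None:
        """After a shot renders, keep only its last frames as memory."""
        for f in frames:
            self.frame_buffer.append(f)

    def conditioning(self) -> dict:
        """What the next generation call should see to avoid visual jumps."""
        return {
            "environment": self.environment,
            "context_frames": list(self.frame_buffer),  # at most 3 frames
        }

state = SceneState({"location": "dorm", "lighting": "late night"})
state.commit_shot([f"frame_{i}.png" for i in range(240)])  # a full shot's frames
```

After committing 240 frames, only `frame_237.png` through `frame_239.png` survive in the buffer, which is exactly the memory the next shot needs.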
3. Orchestrated audio sync
Audio should not be an afterthought. We synchronize the generated video with audio, handling variable clip lengths and merging clips sequentially.
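Handling variable lengths mostly comes down to cumulative offsets. A minimal sketch of the sequential-merge bookkeeping (in practice the computed offsets would be handed to a muxer such as ffmpeg):

```python
# Each shot's narration clip has its own length, so start/end times are
# computed cumulatively rather than assumed fixed.

def layout_audio(clip_lengths: list[float]) -> list[tuple[float, float]]:
    """Return (start, end) times for clips placed back to back."""
    timeline, cursor = [], 0.0
    for length in clip_lengths:
        timeline.append((cursor, cursor + length))
        cursor += length
    return timeline

segments = layout_audio([8.0, 6.5, 7.25])
```

This yields `(0.0, 8.0)`, `(8.0, 14.5)`, `(14.5, 21.75)`: each clip starts exactly where the previous one ended, which is what keeps narration from drifting against the video.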
Example videos generated from my first trials
This is the video I needed for my own use case, and it was the first sample.
It uses Studio Ghibli aesthetics and Makoto Shinkai lighting (I have no idea about those fancy terms), all controlled via 20 lines of Python code found at our GitHub repo linked below. The script needs some work: the characters kept changing their appearance and overall details.
Here is another video, of a grandpa explaining quantum mechanics to a kid.
The continuity of both characters is really terrible, and the audio becomes a female voiceover at one point 😂.
The last one I generated, which gave me some success!!!
The character continuity and background are really fine. It needs more work on voice and tonality, but overall, as we can see, the more state we keep track of, the better the generation.
This is the latest iteration of our storytelling.
The code used to generate the videos above:
1"""
2ContextBytes: Human & Machine Harmony (Brand Story) - FLAGSHIP 60s+ EDITION
3======================================================================
4A cinematic introduction to ContextBytes with Dynamic Duration & Narrative Flow.
5Story: Emma (Student) & David (Professional) find clarity via ContextKeeper.
6"""
7
8import asyncio
9from pathlib import Path
10
11from ministudio.orchestrator import VideoOrchestrator
12from ministudio.providers.vertex_ai import VertexAIProvider
13from ministudio.config import (
14 VideoConfig, SceneConfig, ShotConfig, ShotType,
15 Character, Environment, StyleDNA, Persona, DEFAULT_PERSONA,
16 Cinematography, Camera, Color
17)
18
19# ============================================================================
20# CINEMATOGRAPHY - Master Filmmaker Presets
21# ============================================================================
22
23PREMIUM_CINE = Cinematography(
24 camera_behaviors={
25 "chaos_pan": Camera(lens="24mm", aperture="f/4", movement_style="jittery handheld pan-through-clutter"),
26 "discovery_macro": Camera(lens="100mm", aperture="f/2.8", movement_style="focus pull from screen to face"),
27 "architecture_top": Camera(lens="35mm", aperture="f/8", movement_style="high-angle crane down"),
28 "hero_infinite": Camera(lens="50mm", aperture="f/1.8", movement_style="slow push-in to subjects")
29 },
30 shot_composition_rules={
31 "rule_of_thirds": True,
32 "leading_lines": "towards the Knowledge Orb",
33 "depth_layering": "foreground bokeh, midground subjects, background architecture"
34 }
35)
36
37# ============================================================================
38# CHARACTERS - Visual Anchor Links
39# ============================================================================
40
41EMMA = Character(
42 name="Emma",
43 identity={
44 "hair": "short chestnut brown bob, hand-drawn texture with soft bangs",
45 "eyes": "large inquisitive amber eyes with detailed catchlights",
46 "face": "soft round Shinkai-style face, expressive subtle smile",
47 # Absolute consistency lock
48 "skin_tone": "fair porcelain with slight pink blush on cheeks",
49 "build": "slender, wearing a high-quality cerulean blue wool sweater",
50 "aesthetic": "painterly Ghibli protagonist, cinematic digital painting"
51 },
52 visual_anchor_path="c:/Users/USER/Music/ministudio/assets/references/emma_portrait.png",
53 current_state={
54 "clothing": "cerulean blue winter sweater, messy desk environment"},
55 voice_id="en-US-Studio-O", # Warm, welcoming female narrator
56 voice_profile={"style": "narrative", "pitch": 0.5}
57)
58
59DAVID = Character(
60 name="David",
61 identity={
62 "hair": "neatly groomed short onyx black hair",
63 "eyes": "deep intelligent dark eyes, scholarly focus",
64 "face": "focused angular features, clean-shaven",
65 # Absolute consistency lock
66 "skin_tone": "warm bronze skin with detailed hand-drawn shadows",
67 "glasses": "minimalist silver-rimmed circular glasses",
68 "aesthetic": "refined professional Ghibli style"
69 },
70 visual_anchor_path="c:/Users/USER/Music/ministudio/assets/references/david_portrait.png",
71 current_state={
72 "clothing": "charcoal grey corporate shirt, forest green scarf"}
73)
74
75Keeper = Character(
76 name="The ContextKeeper",
77 identity={
78 "form": "a levitating orb of liquid golden light, tennis-ball size",
79 "glow": "radiates #D4AF37 golden pulses and floating motes",
80 "texture": "ethereal, translucent golden core"
81 }
82)
83
84# ============================================================================
85# ENVIRONMENTS - Chaos to Wisdom
86# ============================================================================
87
88CHAOTIC_DORM = Environment(
89 location="Emma's Biology Dorm",
90 identity={
91 "architecture": "cluttered bookshelves, messy desktop, stacks of biology PDFs"},
92 current_context={
93 "lighting": "dim indoor light, blue glare from multiple computer screens",
94 "atmosphere": "claustrophobic, overwhelming information overload",
95 "time_of_day": "late night study session"
96 },
97 reference_images=[
98 "c:/Users/USER/Music/ministudio/assets/references/data_abyss_bg.png"]
99)
100
101CORPORATE_MAZE = Environment(
102 location="Modern Tech Office Lab",
103 identity={
104 "architecture": "glass walls, whiteboards filled with complex architecture diagrams"},
105 current_context={
106 "lighting": "slick fluorescent lighting, high-contrast shadows",
107 "atmosphere": "dry, technical, professional overwhelm",
108 "time_of_day": "busy afternoon"
109 },
110 reference_images=[
111 # Anchor for David's office vibes
112 "c:/Users/USER/Music/ministudio/assets/references/shinkai_stratosphere_bg.png"]
113)
114
115GITHUB_GARDEN = Environment(
116 location="The Knowledge Garden Study",
117 identity={
118 "architecture": "arched mahogany bookshelves, high ceilings, spiral stairs"},
119 current_context={
120 "lighting": "warm afternoon sun with visible dust motes (Tyndall effect)",
121 "atmosphere": "magical, painterly, deep academic peace",
122 "time_of_day": "golden hour"
123 },
124 reference_images=[
125 "c:/Users/USER/Music/ministudio/assets/references/ghibli_atelier_bg.png"]
126)
127
128STYLE_DNA = StyleDNA(
129 traits={
130 "visual_style": "Studio Ghibli hand-painted backgrounds",
131 "lighting_style": "Makoto Shinkai vibrant lens flares and glowing edges",
132 "color_palette": "Deep teals (#008080) transitioning to Master Gold (#D4AF37)",
133 "brushwork": "Painterly, thick impasto textures on clouds",
134 "detail_level": "Ultra-high, hyper-focused foregrounds"
135 },
136 references=["Spirited Away", "Your Name"]
137)
138
139
140async def create_brand_video():
141 print("Starting FLAGSHIP Production: ContextBytes Brand Story (Dynamic Flow)...")
142
143 provider = VertexAIProvider()
144 orchestrator = VideoOrchestrator(provider)
145
146 scene = SceneConfig(
147 concept="From Chaos to Human Wisdom",
148 mood="Intellectual, Cinematic, Magical",
149 characters={"Emma": EMMA, "David": DAVID, "Keeper": Keeper},
150 shots=[
151 # 1. Emma's Struggle (Demonstrates long narration splitting)
152 ShotConfig(
153 shot_type=ShotType.WS,
154 environment=CHAOTIC_DORM,
155 action="Wide jittery pan across Emma's room. Thousands of digital windows overlap in the air—PDFs, YouTube playlists, and research articles. Emma rubs her tired eyes, looking defeated by the stacks of books and open browser tabs.",
156 narration=(
157 "In a world where information moves faster than we can think, we often find ourselves lost. "
158 "Emma is a brilliant student, but even she is drowning in a sea of millions of PDFs, endless playlists, "
159 "and a thousand open tabs that lead nowhere. She's looking for wisdom, but she only finds noise."
160 ),
161 # This will be ~15s, triggering recursive splitting (8s + 7s)
162 duration_seconds=None
163 ),
164
165 # 2. The Discovery
166 ShotConfig(
167 shot_type=ShotType.CU,
168 environment=CHAOTIC_DORM,
169 action="Close-up on Emma's laptop screen. She opens ContextBytes. A warm golden pulse radiates from the center. The Keeper orb emerges from the UI, its light cleaning the digital clutter into organized spheres.",
170 narration="Meet Emma. She didn't need more data; she needed a way to make sense of it. She found ContextBytes.",
171 duration_seconds=None,
172 continuity_required=True
173 ),
174
175 # 3. The AI Teacher
176 ShotConfig(
177 shot_type=ShotType.MS,
178 environment=GITHUB_GARDEN,
179 action="Shot in the Garden Atelier. The Keeper levitates, projecting a glowing teal 3D biology model. Emma watches, her face lighting up as she finally understands. The atmosphere is peaceful.",
180 narration="Our agent, the ContextKeeper, doesn't just give answers. It guides you, explains the 'why', and organizes your path to mastery.",
181 duration_seconds=None,
182 continuity_required=True
183 ),
184
185 # 4. David's Professional Struggle
186 ShotConfig(
187 shot_type=ShotType.WS,
188 environment=CORPORATE_MAZE,
189 action="A high-angle shot of David in a sleek, cold tech office. He's dwarfed by skyscrapers of technical documentation and architectural specs. He looks stressed, trying to find clarity in the noise.",
190 narration=(
191 "And then there’s David. A professional engineer lost in the giant tech machine, "
192 "drowning in documentation, architectural specs, and complex specs that seem to have no end. "
193 "In the corporate maze, context is the first thing that we lose."
194 ),
195 duration_seconds=None # Split likely (8s + 4s)
196 ),
197
198 # 5. David's Clarity
199 ShotConfig(
200 shot_type=ShotType.WS,
201 environment=CORPORATE_MAZE,
202 action="The Keeper orb flies through David's office. Behind it, a beautiful glowing golden Knowledge Graph appears, physically connecting documents like a magical glowing architecture map.",
203 narration="ContextBytes reveals the invisible threads between documents—transforming a mountain of text into a clear, magical map of how everything flows.",
204 duration_seconds=None,
205 continuity_required=True
206 ),
207
208 # 6. Final Harmony
209 ShotConfig(
210 shot_type=ShotType.WS,
211 environment=GITHUB_GARDEN,
212 action="Emma and David stand on a balcony overlooking the Cloud Stratosphere. They look confident and inspired. The Keeper orb flies toward the camera, merging into the final brand signature.",
213 narration=(
214 "From the student's desk to the corporate boardroom, the path to mastery is now clear. "
215 "Deep simplicity. Modern intelligence. This is your Context, mastered. Welcome to ContextBytes."
216 ),
217 duration_seconds=None,
218 continuity_required=True
219 )
220 ]
221 )
222
223 config = VideoConfig(
224 persona=DEFAULT_PERSONA,
225 style_dna=STYLE_DNA,
226 cinematography=PREMIUM_CINE,
227 output_dir="./contextbytes_production",
228 aspect_ratio="16:9",
229 negative_prompt="photorealistic, 3d render, CGI, grainy, distorted face, bad anatomy, low quality"
230 )
231
232 # Run the production
233 result = await orchestrator.generate_production(
234 scene=scene,
235 base_config=config,
236 output_filename="contextbytes_flagship_dynamic.mp4"
237 )
238
239 if result["success"]:
240 print(
241 f"SUCCESS! Dynamic Flagship video saved to: {result['local_path']}")
242 else:
243 print(f" FAILED: {result.get('error')}")
244
245if __name__ == "__main__":
246 asyncio.run(create_brand_video())
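The shots above set `duration_seconds=None` and rely on the orchestrator to split long narrations into model-sized chunks (the comments mention ~15 s becoming 8 s + 7 s). A minimal sketch of that splitting logic, assuming an 8-second per-clip cap; the words-per-second estimate is my assumption, not a Ministudio constant:

```python
# Sketch of dynamic-duration splitting: estimate how long a narration will
# take to speak, then break it into chunks the video model can handle.
# Both constants below are illustrative assumptions.

WORDS_PER_SECOND = 2.5   # rough speaking rate
MAX_CLIP_SECONDS = 8.0   # assumed per-clip cap of the video model

def estimate_seconds(narration: str) -> float:
    """Crude duration estimate from word count."""
    return len(narration.split()) / WORDS_PER_SECOND

def split_duration(total: float, cap: float = MAX_CLIP_SECONDS) -> list[float]:
    """Split a duration into chunks no longer than `cap`, largest first."""
    chunks = []
    while total > cap:
        chunks.append(cap)
        total -= cap
    if total > 0:
        chunks.append(round(total, 2))
    return chunks

print(split_duration(15.0))  # -> [8.0, 7.0]
```

A 15-second narration becomes an 8 s clip followed by a 7 s clip, matching the "recursive splitting" comment in the shot list.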
## The Road Ahead
We are now entering the age of precise control: the tiny flickers in the background, or the 0.5-second lip-sync lag in narrations.
We will need deeper background masking: locking the environment and generating only the character motions and interactions with objects or props.
We also need to ensure the model's noise is consistent across the whole scene, and to build waveform orchestration: using actual audio waveforms to drive duration and intensity.
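As a starting point for waveform orchestration, the measured length of the narration audio could drive the shot duration directly instead of guessing from the text. A self-contained sketch using only the stdlib `wave` module (the file name is illustrative; a synthetic tone stands in for real narration):

```python
import math
import struct
import wave

# Duration comes straight from the waveform: frames / sample rate.
def audio_duration_seconds(path: str) -> float:
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

# Write a 1.5 s synthetic 440 Hz tone so the sketch is self-contained.
def write_tone(path: str, seconds: float = 1.5, rate: int = 16000) -> None:
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(rate)
        n = int(seconds * rate)
        samples = (int(8000 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(n))
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))

write_tone("narration_stub.wav")
print(audio_duration_seconds("narration_stub.wav"))  # -> 1.5
```

The same reading of the waveform could later drive intensity too, by looking at sample amplitudes rather than just the frame count.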
Ministudio is an open-source framework; the code can be found here.
Check it out, and read the Markdown docs to contribute and add support for more models if you can.