
Classic architectural rendering is tedious: assigning materials, setting up lights, and waiting in the render queue can take hours for a single image. Yet you already have a 3D model on hand. The method architects and interior designers have embraced in recent years is this: feed a viewport image of the 3D model (or a clay render) to an AI image model and get a photorealistic render in minutes while preserving the geometry.
The class of models behind this, Nano Banana 2 and similar Gemini image models, understands architectural geometry surprisingly well; some of the industry's leading AI rendering tools run on the very same family under the hood. Oxava AI connects you to this model directly: you upload your viewport as a reference, steer it with a prompt, and upscale the result to as much as 4K. In this guide we walk through the entire process step by step, from prep to prompt and from material definition to lighting.
In this method the AI doesn't invent a scene from scratch. It builds on the geometry you provide and "dresses" it with materials, textures, lighting, and atmosphere. In other words, your viewport image determines the composition, perspective, and proportions; your prompt determines how it looks. The secret to a good result is keeping both inputs clear: a clean base image plus a layered recipe.
This is the most often skipped step, yet the one that most determines the result. The cleaner the geometry the AI sees, the more accurate the render it produces.
In SketchUp:
In Blender:
In 3ds Max / Rhino: the same logic — a clay/viewport image, clean geometry, the right camera, a high-resolution PNG.
Golden rule: Whatever is in the image besides the building/space (text, guides, UI) may be taken as "real" by the AI and added to the render. Keep it simple.
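Before uploading, it can help to confirm that the export really is a PNG and that it is large enough. The sketch below reads the width and height straight from the PNG header using only the standard library. The 2048-pixel threshold and the function names are illustrative assumptions, not an Oxava requirement.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    if data[12:16] != b"IHDR":
        raise ValueError("PNG is missing its IHDR header chunk")
    # Width and height are two big-endian 32-bit integers right after "IHDR".
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_high_res(data: bytes, min_long_edge: int = 2048) -> bool:
    """Check the longer edge against a threshold (assumed value, adjust freely)."""
    width, height = png_size(data)
    return max(width, height) >= min_long_edge
```

To use it, read your exported viewport file as bytes (for example with `pathlib.Path(...).read_bytes()`) and pass the result to `is_high_res`.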
In Oxava, the Nano Banana models accept up to 14 reference images in a single generation. This is very powerful for architectural rendering, because you can use different references for different jobs:
Thanks to the model's strong instruction-following, the style you describe is applied to the structure you provide — without distorting the geometry.
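If you collect references in a script before uploading, a small guard can catch an oversized batch early. This is a minimal sketch: the 14-image limit comes from the text above, while the function name and the file names in the usage note are hypothetical.

```python
MAX_REFERENCES = 14  # limit per generation, as stated in this guide


def check_references(paths: list[str]) -> list[str]:
    """Drop duplicate paths (keeping order) and enforce the reference limit."""
    unique = list(dict.fromkeys(paths))
    if not unique:
        raise ValueError("at least one reference (the viewport image) is needed")
    if len(unique) > MAX_REFERENCES:
        raise ValueError(
            f"{len(unique)} references given, but the limit is {MAX_REFERENCES}"
        )
    return unique
```

For example, `check_references(["viewport.png", "oak_sample.png", "viewport.png"])` returns the two distinct paths in their original order.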
An architectural prompt isn't random words; it's a shot brief. Think through these five layers in order: subject, materials, light, camera, and style.
Then, to keep it faithful to the geometry, add this to the recipe: "preserve the existing geometry and perspective; apply only materials, textures, and lighting."
Interior example:
"Scandinavian-style open-plan living room, light oak parquet floor, matte white walls, floor-to-ceiling glass facade, linen-fabric beige sofa, walnut coffee table, brass pendant lighting; soft golden afternoon light from the west, long shadows on the east wall; 35mm lens, eye level, photorealistic interior photograph, 16:9. Preserve the existing geometry and perspective, apply only materials and lighting."
Exterior example:
"Two-story detached house, exposed white concrete and charred cedar wood cladding facade, floor-to-ceiling glass with black metal framing, flat roof; natural grasses and a concrete walkway in the garden; warm side light near sunset, long shadows at a 15° angle, faint evening mist in the background; 24mm wide angle, eye level, architectural photography style, 16:9. Preserve the massing and proportions from the model."
Notice that in both examples the subject, materials, light, camera, and style are each given one by one. Vague recipes like "modern living room render" produce flat and lifeless results.
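If you write many recipes, the five layers can be kept as separate fields and joined only at the end, so no layer is silently forgotten. The sketch below is one possible way to do that; `ShotBrief` and `to_prompt` are illustrative names, and the output is just a text prompt to paste into the prompt field.

```python
from dataclasses import dataclass

GEOMETRY_CLAUSE = (
    "Preserve the existing geometry and perspective; "
    "apply only materials, textures, and lighting."
)


@dataclass
class ShotBrief:
    subject: str
    materials: list[str]
    light: str
    camera: str
    style: str

    def to_prompt(self) -> str:
        """Join the five layers in order and append the geometry clause."""
        layers = {
            "subject": self.subject,
            "materials": ", ".join(self.materials),
            "light": self.light,
            "camera": self.camera,
            "style": self.style,
        }
        # Refuse to build a vague recipe: every layer must be filled in.
        missing = [name for name, text in layers.items() if not text.strip()]
        if missing:
            raise ValueError(f"missing layers: {', '.join(missing)}")
        body = "; ".join(text.strip() for text in layers.values())
        return f"{body}. {GEOMETRY_CLAUSE}"
```

Keeping the layers apart also makes it easy to change one of them later (say, only the light) without retyping the rest.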
This is the lifeblood of the method: how much should the AI deviate from your model? Stay very faithful and the materials and lighting change while your design is preserved; go too loose and the AI merely takes inspiration from your massing study, so the proportions can drift.
Professional tools have a dedicated "fidelity" slider for this. In Oxava, the practical control comes down to these three levers:
This is the part people are most curious about — because the real power lies here.
Defining (two ways):
Swapping (A/B testing):
Keep the same viewport fixed and change one single thing in the prompt, then compare:
Facade test: "...brick facade..." → "...vertical wood paneling facade..." → "...exposed concrete facade..."
Within seconds you see the brick, wood, and concrete variants side by side and make a material decision without redrawing the model. You can run the same test by swapping a fabric/stone sample image as well.
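The one-variable rule is easy to enforce if the prompt is a template with a single slot. The following sketch generates one prompt per material option while everything else stays identical; the function name and the `{facade}` placeholder are illustrative assumptions.

```python
def material_variants(template: str, slot: str, options: list[str]) -> dict[str, str]:
    """Return one prompt per option, changing only the named slot."""
    placeholder = "{" + slot + "}"
    # Exactly one occurrence keeps the comparison a true single-variable test.
    if template.count(placeholder) != 1:
        raise ValueError(f"template must contain {placeholder} exactly once")
    return {option: template.replace(placeholder, option) for option in options}


facade_prompts = material_variants(
    "Two-story detached house, {facade} facade, flat roof, warm side light near sunset",
    "facade",
    ["brick", "vertical wood paneling", "exposed concrete"],
)
for name, prompt in facade_prompts.items():
    print(f"[{name}] {prompt}")
```

Run each resulting prompt against the same fixed viewport so that any difference between the renders comes from the material alone.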
Tip: When you want to change a single element (e.g., just one window frame), the usual method is region-based editing, known as "inpainting". In Oxava, the most practical route is instead to regenerate with a more targeted prompt and a clean base image, or to crop that region and generate a separate variation.
Lighting is what separates a render that looks like "computer output" from one that looks like a "photograph". The most common mistake: not specifying the light. State the time of day and the direction of the light first:
Interiors are viewed up close; that's why material textures, furniture style, and light quality come to the fore. Clearly define whether surfaces are matte/glossy, the style of the furniture, and the direction of the window light.
Exteriors are framed more widely; what matters is the surroundings, the landscaping, the materials that read from a distance (brick, cladding, roof type), and the drama of the light. Add context to the foreground, such as natural grasses, trees, and a walkway.
For those who want pixel-level control and have a powerful GPU, Stable Diffusion + ControlNet is an alternative route. You capture depth, canny, MLSD (ideal for straight lines), or segmentation passes from Blender and feed them to ControlNet as input, keeping the geometry very tightly preserved; negative prompts also help here. In return, the setup and hardware burden are high. If you're looking for the fast, setup-free path that runs in the browser, the reference + prompt approach is more than enough for most jobs.
Open a model, capture its viewport as a clean PNG, and upload it as a reference in Oxava; write your five-layer recipe and produce your first render. To go deeper into prompt writing, take a look at our "How to Write an AI Image Prompt?" guide, then head straight to the studio and bring your first scene to photorealistic life.