Replit Agent 3: Full Review, Costs & Test Build (Sep 2025)

Replit’s latest agent (v3) is a substantial upgrade aimed at real, end-to-end software work. It thinks longer, works more autonomously, and, crucially, can now “use a computer” to click around and test your app inside a mini browser. What follows is a hands-on review based on building (and repeatedly fixing) a brand-new web app: a YouTube thumbnail creator with AI image generation/editing, a canvas editor, and one-click export.

Watch the full review and test build video here: https://www.youtube.com/watch?v=rTs8bITW9Eg


TL;DR

  • Agent 3 highlights: ~10× longer reasoning, autonomous multi-step coding, a computer-use testing mode, and a new interface for building agents and automation workflows.

  • It can test your app by clicking, typing, and exporting, then show a replay.

  • Great for: longer, well-scoped changes and batch polish passes.

  • Costs are still work-based (not time-based). Avoid overusing the high-power model unless you’re truly stuck.

Full video here: https://www.youtube.com/watch?v=rTs8bITW9Eg


What’s New in Agent 3 (and Why It Matters)

  • Longer, more deliberate reasoning. It takes more time per message but completes more work per turn and catches small issues v2 often missed.

  • Autonomous coding. Give it a goal; it plans sub-tasks and executes multiple actions without constant hand-holding.

  • Computer-use testing. Agent 3 opens your app in a lightweight browser, clicks buttons, fills forms, chooses dropdowns, and attempts exports—then provides a replay.

  • Agent & workflow builder UI. A cleaner place to define projects, choose App Themes, and automate multi-step tasks and tests.


The Test Build: A YouTube Thumbnail Creator

Goal. A canvas-style web app that uses AI to:

  • Generate images (OpenAI GPT-Image-1) and add them to a draggable, resizable canvas (see the API sketch after this list).

  • Edit existing images (initially with GPT-Image-1’s edits; later swapped to Ideogram for better face consistency).

  • Add text, shapes, and arrows, change colors and fonts, and support layer ordering (bring to front/send backward).

  • Provide a prompt bar above the canvas to create or edit elements, with drag-and-drop image upload into the bar for edits.

  • Export the whole canvas as a thumbnail.
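
For orientation, here’s roughly what those two GPT-Image-1 calls (create and edit) look like from a Node/TypeScript backend. This is a minimal sketch, not the code the agent generated: the function names and the 1536×1024 size are illustrative choices, error handling is trimmed, and the API key is assumed to live in an OPENAI_API_KEY environment variable.

```ts
// Minimal sketch (not the agent’s generated code): server-side calls to
// OpenAI’s Images API with gpt-image-1. Requires Node 18+ (global fetch/FormData).
const OPENAI_KEY = process.env.OPENAI_API_KEY!; // assumed env var

// Create: text prompt -> base64 PNG. gpt-image-1 returns b64_json, not URLs.
async function generateImage(prompt: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/images/generations", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${OPENAI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-image-1", prompt, size: "1536x1024" }),
  });
  const json = await res.json();
  return json.data[0].b64_json;
}

// Edit: source image + instruction -> edited base64 PNG (multipart form).
async function editImage(image: Blob, prompt: string): Promise<string> {
  const form = new FormData();
  form.append("model", "gpt-image-1");
  form.append("image", image, "source.png");
  form.append("prompt", prompt);
  const res = await fetch("https://api.openai.com/v1/images/edits", {
    method: "POST",
    headers: { Authorization: `Bearer ${OPENAI_KEY}` },
    body: form,
  });
  const json = await res.json();
  return json.data[0].b64_json;
}
```

On the client, the returned base64 string becomes a data:image/png;base64,… URL that can be drawn straight onto the canvas.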

Kickoff.

  1. Select Web App.

  2. Pick an App Theme (we chose Highlighter) so the agent scaffolds a consistent UI.

  3. Provide a detailed prompt including API docs for GPT-Image-1 (create/edit).

First pass results.

  • Agent produced a working scaffold and initial UI consistent with the chosen theme.

  • Early costs appeared reasonable (e.g., $1.92, then $1.68) while iterating.

Pain points found during manual testing.

  • Text editing didn’t work initially.

  • Only a circle shape existed; resizing was laggy.

  • Some property panels weren’t wired.

  • Image edit requests worked, but controls felt rough.

Fixing with feedback.

  • Screenshot the app, drag it into the chat, and list issues.

  • The agent iterated: it fixed text editing, added font options, wired up the color pickers, made resizing responsive, and added drag-to-upload into the prompt bar (sketched after this list).

  • Added layer controls (bring to front/send backward).

  • Swapped image editing to Ideogram for more consistent faces when editing the author’s own photos—accuracy improved notably.
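
The drag-to-upload piece is mostly a small DOM handler. Here’s a minimal sketch; the element id and the setEditSource hook are hypothetical names, not the app’s actual code.

```ts
// Hypothetical app hook (stub): store the image and flip the bar to edit mode.
function setEditSource(dataUrl: string): void {
  console.log("edit source set:", dataUrl.slice(0, 40));
}

const promptBar = document.getElementById("prompt-bar")!; // illustrative id

// Without preventDefault on dragover, the browser won’t allow the drop.
promptBar.addEventListener("dragover", (e) => e.preventDefault());

promptBar.addEventListener("drop", (e: DragEvent) => {
  e.preventDefault();
  const file = e.dataTransfer?.files[0];
  if (!file || !file.type.startsWith("image/")) return; // images only
  const reader = new FileReader();
  reader.onload = () => setEditSource(reader.result as string);
  reader.readAsDataURL(file); // the data URL later feeds the edit request
});
```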


Letting Agent 3 Test the App (Computer-Use Mode)

What it did well

  • Created a fresh project, typed prompts, generated images, added text, chose fonts (e.g., Impact), increased sizes, and attempted color changes.

  • The replay viewer is excellent for diagnosing where it got stuck.

Where it stumbled

  • Sometimes got hung up on a minor step (e.g., changing text color).

  • Export worked, but it placed white text over a white area, making it invisible.

  • It appeared to get stuck around downloads/exports, likely a limitation of how the testing environment detects file downloads.

Takeaway

  • Use computer-use tests for asynchronous polish passes, not while you’re actively watching it click. Queue a list of small tasks (10–15 items) and let it run. If you’re in fast “builder mode,” manual testing + directed agent prompts is faster.

Cost of a full test run

  • ~15 minutes of autonomy cost $1.46 (standard model), suggesting billing is tied to actions/work, not clock time.


Costs: What I Actually Paid—and How to Keep Yours Down

  • Total project spend: $44.87.

  • A single high-power request, used to rush through a cloud-storage permissions bug, cost $7.52 on its own.

  • A realistic non-rushed build, sticking to clear prompts and the standard model, looks closer to $25–$30.

  • Guidelines to control costs:

    • Default to the standard model. Only escalate to high-power when you’ve isolated a knotty problem.

    • Be specific: one issue per request; include screenshots; show current vs. desired behavior.

    • Batch non-urgent UX polish into a single autonomy test run.

    • Leverage App Themes early to avoid iterative “make it pretty” thrash.


Recommended Workflow With Agent 3

  1. Scope tightly. Write a thorough prompt (problem, feature list, APIs, expected UI, and success criteria). Include any relevant API snippets.

  2. Pick an App Theme. Themes make v3’s output more consistent and save countless UI cleanups later.

  3. Generate the first build. Let it scaffold, then run the app.

  4. Test manually first. Note UX gaps and functional bugs, then screenshot and annotate them directly for the agent.

  5. Iterate in focused passes. “Fix text editing,” then “add layer controls,” then “optimize resizing,” etc. Keep changes atomic.

  6. Add computer-use tests once you’ve got core features working. Provide a checklist (e.g., create project → generate image → add text → change font → export).

    • Use Max Autonomy for long polish sessions while you do something else.

    • Use replay to debug when it misclicks or stalls.

  7. Mind your model choice. Standard model for most work; high-power only for stubborn, isolated problems.

  8. Swap or mix AI providers if needed. For consistent faces, using Ideogram for edits alongside GPT-Image-1 for icons/backgrounds worked well.

  9. Finalize export. Ensure your export path is evident to the agent and to users (filename, dimensions, aspect ratio); a browser-side export sketch follows this list.
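
To make step 9 concrete, here’s a minimal browser-side sketch that exports the editor canvas at YouTube’s 1280×720 thumbnail size. It assumes the editor renders into an HTMLCanvasElement; the function name is illustrative.

```ts
// Minimal sketch: export the editor canvas as a 1280x720 PNG download.
function exportThumbnail(source: HTMLCanvasElement, filename = "thumbnail.png"): void {
  const out = document.createElement("canvas");
  out.width = 1280; // YouTube thumbnail size, 16:9
  out.height = 720;
  const ctx = out.getContext("2d");
  if (!ctx) throw new Error("2D context unavailable");
  ctx.drawImage(source, 0, 0, out.width, out.height); // scale canvas to fit
  out.toBlob((blob) => {
    if (!blob) return;
    const url = URL.createObjectURL(blob);
    const a = document.createElement("a");
    a.href = url;
    a.download = filename;
    a.click(); // triggers the browser download
    URL.revokeObjectURL(url);
  }, "image/png");
}
```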


Starter Prompts You Can Steal

Project creation prompt (condensed)

Build a web app: a YouTube thumbnail creator.

  • Canvas editor: add/drag/resize images, text, shapes, arrows; layer ordering; color & font controls.

  • Prompt bar above canvas to generate icons/backgrounds via OpenAI GPT-Image-1 (create).

  • Prompt bar also supports image edits; users can upload an image and switch to edit mode.

  • Drag-and-drop image upload into the prompt bar.

  • One-click export to a standard YouTube thumbnail size.

I’ve pasted GPT-Image-1 create/edit API docs below. Use the Highlighter app theme for consistent styling.

Issue iteration prompt

Here’s a screenshot. Please fix:

  1. Text tool doesn’t allow inline editing.

  2. Only a circle shape exists; add rectangle and arrow shapes, and make resizing smooth.

  3. Add layer controls (bring to front/send backward); see the z-order sketch after this prompt.

  4. Color pickers don’t apply to selected element.

  5. Export should preserve canvas bounds at 1280×720.
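
Layer controls like item 3 usually reduce to reordering the array the canvas draws from, since later entries render on top. A minimal sketch, with an illustrative CanvasElement interface:

```ts
// Minimal sketch: z-order as array position; later elements draw on top.
interface CanvasElement { id: string; draw(ctx: CanvasRenderingContext2D): void }

const elements: CanvasElement[] = [];

function bringToFront(id: string): void {
  const i = elements.findIndex((el) => el.id === id);
  if (i >= 0) elements.push(...elements.splice(i, 1)); // move to end = topmost
}

function sendBackward(id: string): void {
  const i = elements.findIndex((el) => el.id === id);
  if (i > 0) {
    [elements[i - 1], elements[i]] = [elements[i], elements[i - 1]]; // swap down one
  }
}

// Redraw in array order so z-order changes take effect.
function render(ctx: CanvasRenderingContext2D): void {
  for (const el of elements) el.draw(ctx);
}
```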

Computer-use test script

In the app preview:

  1. Create a new project.

  2. Prompt: “Excited person at a laptop, vibrant background.” Generate.

  3. Add text “10 AI THUMBNAIL TRICKS,” font Impact, white, large.

  4. Add a rectangle banner behind text, set banner orange, send banner backward.

  5. Export as PNG and confirm a non-empty file.


What Worked Well

  • Thoroughness. Agent 3’s “architect → plan → act” cadence landed bigger, more reliable changes per message.

  • App Themes. Huge time saver for consistent styling.

  • Replay for tests. Great visibility into what happened during autonomy.

What Still Needs Finesse

  • Over-persistence on tiny tasks. It can tunnel on a minor UI tweak (e.g., color change) instead of skipping ahead. A “skip step” button during tests would help.

  • Export handling in tests. The test environment doesn’t always cleanly detect file downloads.

  • Live dev loop speed. Watching it click is slower than you doing it—use autonomy when you’re not waiting on it.


Final Verdict

Agent 3 is a real step up for practical app building. It’s slower per message but delivers more work per turn and catches details v2 missed. Use App Themes to stabilize UI quality, iterate with tight prompts plus screenshots, and reserve computer-use tests for asynchronous polish passes. Keep an eye on model choice to control spend.

In our build, we went from zero to a functioning AI thumbnail studio: generate/edit images, mix models for better face consistency (Ideogram for edits, GPT-Image-1 for icons/backgrounds), add text and shapes with proper layering, and export a ready-to-use thumbnail—all inside Replit, largely orchestrated by Agent 3.

If you adopt the workflow above, you’ll get the most out of Agent 3’s autonomy without letting costs or run time get away from you.