CosX AI

14 min read

AI coding tools have changed the way software is written. A few years ago, the bottleneck was usually implementation speed. Today, an engineer can ask an AI agent to generate models, APIs, tests, frontend components, scripts, documentation, and deployment changes in minutes.

But faster code generation has also exposed a new problem: unclear intent creates faster chaos.

When the instruction is vague, the agent does not slow down. It moves confidently in the wrong direction. It may invent a missing API, ignore an existing pattern, skip a business rule, expose the wrong field, break tenant isolation, or solve a slightly different problem from the one the product actually needed.

At CosX, we do not see AI agents as magic developers. We treat them as powerful execution engines that need strong contracts.

Spec-driven development helps the agent understand the task. Contract-driven engineering helps the team trust the output. That distinction matters when AI moves from autocomplete to autonomous delivery.

That is why we have moved toward what we call Contract-Driven Agentic Engineering.

It is related to spec-driven development, but broader. Spec-driven development says: define what needs to be built before building it. Contract-driven engineering goes further. It says: define the product behavior, API expectations, data rules, edge cases, testing evidence, review gates, and handoff requirements before an AI-assisted implementation can be considered complete.

Why “vibe coding” does not scale

Vibe coding is useful for prototypes. You describe an idea in natural language, let the AI generate code, run it, tweak it, and move fast. For experiments, demos, and throwaway proofs of concept, this can be extremely productive.

Production software is different.

In production, we need to care about existing architecture, database constraints, API compatibility, frontend expectations, tenant isolation, authentication, authorization, observability, testability, migration safety, rollback behavior, code review, and long-term maintainability.

A vague prompt like “build the export API” is not enough. The agent needs to know: export what, for whom, in which format, filtered by what parameters, authorized by which role, paginated or async, with what error behavior, and how the frontend will consume it.

Without that contract, the agent may still produce code. But code is not the same as a correct system.

Core belief

The value of engineering is moving upward. Instead of only asking, “Can you write this code?”, the better question is, “Can we define the right contract so that humans and agents can implement, verify, and evolve this safely?”

What we mean by Contract-Driven Agentic Engineering

At CosX, we use the word “contract” broadly. It is not only an API contract.

Product: What should happen from the user’s perspective.

API: Endpoints, request parameters, response schemas, status codes, errors.

Data: Models, fields, relationships, validation, constraints, indexes.

Security: Permissions, tenant boundaries, object-level access.

Testing: What must be proven through unit, integration, and endpoint checks.

Handoff: What frontend, QA, or client teams need to consume the change.

The more autonomous the agent becomes, the more explicit these contracts need to be.

An AI agent can generate implementation quickly, but it cannot be allowed to silently decide business behavior. Should an empty report return 200 with an empty list or 404? Should a cancelled booking be editable? Should a user from one tenant be able to reference an ID from another tenant? Should a filter default to the last 30 days or all historical records?

These are not just coding details. These are product and system contracts.

The mental model: contract first, code second

Our preferred mental model is:

text

Intent → Contract → Plan → Implementation → Verification → Review → Handoff

Each stage produces something explicit.

The intent may come from a client conversation, a Jira ticket, a product requirement, a design file, or an internal platform need. But before implementation starts, we convert that intent into a contract.

That contract does not need to be a 20-page document. In fact, the best contracts for AI-assisted engineering are usually structured, concise, and testable.

What user or system problem are we solving?
What should be created or changed?
What must not change?
What inputs are accepted?
What outputs are expected?
What edge cases matter?
What existing patterns should be followed?
What tests must pass?
What is explicitly out of scope?

This is how we reduce ambiguity before giving work to an agent.

Example: vague instruction vs contract-driven instruction

Consider a vague instruction:

text

Build an API to download customer activity.

An AI agent can do something with this. But the output will depend heavily on assumptions.

A contract-driven version looks different:

text

Feature: Customer Activity Export

Goal:
Allow an admin user to export customer activity records for a selected date range.

Endpoint:
GET /api/customers/activity/export/

Query parameters:
- start_date: required, ISO date
- end_date: required, ISO date
- customer_id: optional
- status: optional enum

Rules:
- Only admin users can access this endpoint.
- Results must be scoped to the current tenant.
- Date range cannot exceed 90 days.
- If no records match, return 200 with an empty CSV.
- Invalid date ranges return 400.
- Unknown customer_id returns 404.
- Export must include customer name, activity type, timestamp, and status.

Out of scope:
- Scheduled exports
- Email delivery
- Dashboard UI changes

Tests:
- Admin can export valid date range.
- Non-admin receives 403.
- Cross-tenant records are not included.
- Date range above 90 days returns 400.
- Empty result returns valid empty CSV.
- Invalid date range returns 400.
- Unknown customer_id returns 404.

This gives the agent a bounded execution space. The agent still writes code, but the important decisions are already captured.

That is the difference between asking AI to “figure it out” and asking AI to “execute against a contract.”
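To make the boundary concrete, here is a minimal sketch of how the request-validation rules in the contract above could be expressed in plain Python, independent of any web framework. The names `ContractViolation`, `ExportParams`, and `parse_export_params` are hypothetical and not part of any CosX codebase. Two points are assumptions on our side: an "invalid date range" is taken to mean an unparseable date or an end date before the start date, and a range of exactly 90 days is treated as allowed. The `status` enum is passed through unchecked because the contract does not list its values.

```python
from dataclasses import dataclass
from datetime import date
from typing import Mapping, Optional

MAX_RANGE_DAYS = 90  # contract rule: date range cannot exceed 90 days


class ContractViolation(Exception):
    """Hypothetical error type carrying the HTTP status the contract requires."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


@dataclass(frozen=True)
class ExportParams:
    start_date: date
    end_date: date
    customer_id: Optional[str] = None
    status: Optional[str] = None  # enum values are not given in the contract


def parse_export_params(query: Mapping[str, str], is_admin: bool) -> ExportParams:
    """Validate raw query parameters against the export contract."""
    if not is_admin:
        raise ContractViolation(403, "only admin users can access this endpoint")
    try:
        start = date.fromisoformat(query["start_date"])
        end = date.fromisoformat(query["end_date"])
    except (KeyError, ValueError):
        # Missing or non-ISO dates are treated as an invalid date range.
        raise ContractViolation(400, "start_date and end_date must be ISO dates") from None
    if end < start:  # assumption: a reversed range counts as invalid
        raise ContractViolation(400, "end_date is before start_date")
    if (end - start).days > MAX_RANGE_DAYS:
        raise ContractViolation(400, "date range exceeds 90 days")
    return ExportParams(start, end, query.get("customer_id"), query.get("status"))
```

Tenant scoping and the 404 for an unknown `customer_id` depend on the data layer, so they are left out of this sketch. The point is only that each rule in the contract maps to one visible branch that a reviewer can check.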

The role of codebase grounding

One of the most common failures in AI-assisted development is context blindness.

The agent may generate a clean implementation that looks reasonable in isolation, but does not fit the existing system. It may create a new service when one already exists, use a different naming convention, duplicate business logic, miss a shared enum, ignore an existing repository pattern, or bypass an established permission layer.

That is why codebase exploration is a first-class step in our process.

Before implementation, the agent must inspect the relevant models, services, schemas, endpoints, enums, tests, and routing patterns. We want the implementation to feel native to the codebase, not pasted from a generic tutorial.

Generic output (looks clean alone): new patterns, duplicate services, guessed fields, weak alignment.

Grounded output (fits the system): existing models, shared enums, native services, scoped queries.

If the system already has a pattern for services, repositories, schemas, endpoints, and tests, then a new module should follow the same structure unless there is a strong reason not to.

This matters because maintainability is not only about whether the code works today. It is also about whether the next engineer can understand it tomorrow.

From tickets to engineering contracts

A Jira ticket is not automatically a good engineering contract.

Many tickets describe what the business wants, but not enough for safe implementation. They may include a title, a short description, a few acceptance criteria, and maybe a design link. That is useful, but the agent still needs a more precise implementation boundary.

So we transform a ticket into an engineering contract.

text

1. Problem statement
2. User or system behavior
3. Data model impact
4. API impact
5. Business rules
6. Permission rules
7. Edge cases
8. Existing code patterns to reuse
9. Test expectations
10. Out of scope items

This does not need to become bureaucracy. The point is not documentation for its own sake. The point is to convert ambiguity into executable clarity.
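One lightweight way to keep such a contract honest is to hold it as structured data and check it for gaps before any agent starts work. The sketch below is an illustration only: the class `EngineeringContract`, its field names, and `missing_sections` are hypothetical names chosen to mirror the ten items above, not a description of CosX tooling.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class EngineeringContract:
    """Hypothetical container mirroring the ten contract sections."""

    problem_statement: str = ""
    behavior: str = ""
    data_model_impact: str = ""
    api_impact: str = ""
    business_rules: List[str] = field(default_factory=list)
    permission_rules: List[str] = field(default_factory=list)
    edge_cases: List[str] = field(default_factory=list)
    patterns_to_reuse: List[str] = field(default_factory=list)
    test_expectations: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


contract = EngineeringContract(
    problem_statement="Admins need to export customer activity",
    api_impact="GET /api/customers/activity/export/",
    business_rules=["Date range cannot exceed 90 days"],
)
print(contract.missing_sections())
```

A section that genuinely does not apply would be filled in with an explicit "none" rather than left empty, so that an empty section always means "not yet decided".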

Example: booking workflow

Imagine we are building a booking workflow.

A product-level requirement might say:

text

Users should be able to book an appointment for a patient.

That is not enough.

A contract-driven version would ask:

text

Who is the user?
Can one user manage multiple patients?
Is OTP required?
Can a slot expire?
Can the same slot be booked twice?
What happens if payment succeeds but booking creation fails?
What is the booking status lifecycle?
Can admins create assisted bookings?
What should the frontend show after payment?
What should happen on retry?

Once these questions are answered, implementation becomes much safer.

The contract may define:

text

Booking states:
- DRAFT
- PENDING_PAYMENT
- CONFIRMED
- CANCELLED
- FAILED

Rules:
- A slot cannot be double-booked.
- A user can add multiple patients.
- Payment confirmation must be idempotent.
- Booking confirmation must be tenant-scoped.
- Failed payment should not confirm the booking.
- Repeated webhook delivery must not create duplicate records.

Now the agent is not just generating a booking API. It is implementing a state machine with clear boundaries.
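As a rough illustration of what "a state machine with clear boundaries" can look like, the sketch below encodes the five booking states and a transition table in Python. The states come from the contract above. The allowed transitions, and the names `ALLOWED`, `transition`, `apply_payment_result`, and `InvalidTransition`, are assumptions made for this example, since the contract excerpt does not spell out the full lifecycle.

```python
from enum import Enum


class BookingState(Enum):
    DRAFT = "DRAFT"
    PENDING_PAYMENT = "PENDING_PAYMENT"
    CONFIRMED = "CONFIRMED"
    CANCELLED = "CANCELLED"
    FAILED = "FAILED"


# Assumed lifecycle: which target states are reachable from each state.
ALLOWED = {
    BookingState.DRAFT: {BookingState.PENDING_PAYMENT, BookingState.CANCELLED},
    BookingState.PENDING_PAYMENT: {
        BookingState.CONFIRMED,
        BookingState.FAILED,
        BookingState.CANCELLED,
    },
    BookingState.CONFIRMED: {BookingState.CANCELLED},
    BookingState.CANCELLED: set(),
    BookingState.FAILED: set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is outside the contract."""


def transition(current: BookingState, target: BookingState) -> BookingState:
    """Return the new state, or raise if the contract forbids the change."""
    if target is current:
        return current  # idempotent: a repeated event changes nothing
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


def apply_payment_result(state: BookingState, succeeded: bool) -> BookingState:
    """Apply a payment webhook outcome to a booking state."""
    target = BookingState.CONFIRMED if succeeded else BookingState.FAILED
    return transition(state, target)
```

With this shape, a duplicated "payment succeeded" webhook leaves a confirmed booking untouched, and a success event arriving for a failed booking is rejected instead of silently confirming it. Preventing double-booking of a slot and deduplicating records would still need a database-level constraint, which this sketch does not cover.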

Tests are part of the contract

In traditional development, tests are often written after implementation. In contract-driven engineering, test expectations are defined before or during implementation planning.

The tests are not just a safety net. They are executable proof that the contract has been respected.

If the contract says “users from one tenant must not access records from another tenant,” then there must be a test for that. If the contract says “reapplying the same coupon should not double-discount,” then there must be a test for idempotency. If the contract says “date range cannot exceed 90 days,” then there must be a test for that invalid input.

Testing principle

Happy-path tests are not enough. The contract must include success cases, validation failures, permission failures, empty states, boundary conditions, idempotency, cross-tenant access attempts, and backward compatibility checks where relevant.

This is especially important with AI-generated code because the code can look clean while missing negative cases.
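The sketch below shows what two of these negative cases might look like as executable tests, using Python's standard `unittest` module. The `Order` class and the `apply_coupon` and `orders_for_tenant` functions are tiny in-memory stand-ins invented for this example so that the tests can run on their own. In a real system the same assertions would target the actual service layer.

```python
import unittest
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Order:
    """Hypothetical in-memory order used only for this example."""

    tenant_id: str
    total_cents: int
    applied_coupons: Set[str] = field(default_factory=set)


def apply_coupon(order: Order, code: str, discount_cents: int) -> int:
    """Apply a coupon once; reapplying the same code is a no-op."""
    if code in order.applied_coupons:
        return order.total_cents
    order.applied_coupons.add(code)
    order.total_cents = max(0, order.total_cents - discount_cents)
    return order.total_cents


def orders_for_tenant(orders: List[Order], tenant_id: str) -> List[Order]:
    """Return only the orders that belong to the given tenant."""
    return [o for o in orders if o.tenant_id == tenant_id]


class ContractTests(unittest.TestCase):
    def test_reapplying_same_coupon_does_not_double_discount(self):
        order = Order(tenant_id="t1", total_cents=10_000)
        apply_coupon(order, "WELCOME10", 1_000)
        apply_coupon(order, "WELCOME10", 1_000)
        self.assertEqual(order.total_cents, 9_000)

    def test_cross_tenant_records_are_not_included(self):
        orders = [Order("t1", 500), Order("t2", 700)]
        visible = orders_for_tenant(orders, "t1")
        self.assertEqual([o.tenant_id for o in visible], ["t1"])
```

Saved as a module, these tests can be run with `python -m unittest`. Each test name restates one sentence of the contract, which makes a missing rule easy to spot in review.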

Review loops: why one AI pass is not enough

We do not treat the first agent output as final.

A strong agentic workflow needs multiple review passes. One pass may focus on implementation. Another may focus on simplification. Another may focus on security. Another may focus on API behavior. Another may focus on PR review comments.

This mirrors how good human teams work. A developer writes the code, then reviews it, then runs tests, then responds to review feedback, then verifies behavior in an environment closer to reality.

The difference is that agents can do many of these loops faster, as long as the workflow makes them explicit.

The important question is not “Did the agent write code?” The important question is “What evidence do we have that the code satisfies the contract?”

That evidence may include test results, API response examples, migration checks, review findings, fixed issues, known assumptions, out-of-scope notes, and frontend handoff details.

API contracts and frontend handoff

Backend changes do not end at the backend.

If an API endpoint is created or modified, the frontend team needs a clear contract. This includes endpoint paths, request parameters, response examples, error responses, loading states, empty states, status mappings, and interaction behavior.

A poor handoff says:

text

API is ready. Please integrate.

A useful handoff says:

text

Area          Example handoff detail
------------  ---------------------------------------------------------------------------------------------------------------
Endpoint      GET /api/reports/activity/
Query params  start_date, end_date, status
Response      { "count": 25, "results": [...] }
UI behavior   Show loading state while fetching, empty state when count is zero, and validation error for invalid date range.
Permissions   Disable export button for non-admin users.

This reduces rework and prevents frontend engineers from reverse-engineering backend behavior.

In a contract-driven workflow, the API is not “done” when the code is merged. It is done when the consuming team has enough information to integrate it correctly.
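One way to make a handoff like the one above harder to misread is to state the UI rules as a small, testable mapping. The sketch below is written in Python purely for illustration; a frontend team would express the same logic in its own stack. The function names, the state labels, and the `"admin"` role string are assumptions, and treating any other status code as a generic error state is also our assumption rather than part of the handoff.

```python
from typing import Any, Mapping


def view_state(status_code: int, payload: Mapping[str, Any]) -> str:
    """Map an API response to the UI state described in the handoff."""
    if status_code == 400:
        return "validation_error"  # for example, an invalid date range
    if status_code == 200:
        # Empty state when count is zero, results otherwise.
        return "empty" if payload.get("count", 0) == 0 else "results"
    return "error"  # assumption: everything else is a generic error


def export_button_enabled(role: str) -> bool:
    """The handoff says the export button is disabled for non-admin users."""
    return role == "admin"


print(view_state(200, {"count": 25, "results": []}))
print(view_state(200, {"count": 0, "results": []}))
```

The loading state is not modelled here because it exists before any response arrives.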

Where AI agents fit best

AI agents are excellent at execution when the boundaries are clear.

Agents are strong at

Reading existing patterns
Creating boilerplate
Implementing service logic
Writing tests
Fixing test failures
Preparing PR summaries
Refactoring duplicated code

Agents need guardrails for

Ambiguous business behavior
Unstated client expectations
Conflicting requirements
Product tradeoffs
Long-term architecture decisions
Scope control
Security-sensitive changes

So the human role changes. The human does not need to hand-write every line of code. But the human must define the contract, review assumptions, and decide tradeoffs.

This is why we believe the future of software teams is not “AI replaces engineers.” It is “engineers become better spec writers, system designers, reviewers, and orchestrators.”

Contract-driven does not mean heavy process

One concern with this approach is that it may sound slow. It does not have to be.

The contract should match the risk level.

text

Small internal tool:
- Goal
- Inputs
- Output
- Files affected
- Test command

Production API:
- Endpoint
- Auth
- Permissions
- Request schema
- Response schema
- Business rules
- Errors
- Tests
- Frontend handoff

Payments, data migrations, compliance, or multi-tenant systems:
- Idempotency
- Rollback
- Audit logs
- Security
- Data integrity
- Backward compatibility
- Failure recovery

The point is not to create the same amount of process for every task. The point is to make the implicit explicit wherever mistakes are expensive.
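The tiers above can be captured as a simple lookup so that a contract is checked against the sections its risk level calls for. This is a minimal sketch under assumptions: the tier keys (`internal_tool`, `production_api`, `high_risk`) and the function `missing_sections` are hypothetical names, and the tiers are kept independent exactly as listed, without assuming that a high-risk change also inherits the production API checklist.

```python
from typing import Dict, List, Set

# Section names follow the three lists above; tier keys are illustrative.
REQUIRED_SECTIONS: Dict[str, List[str]] = {
    "internal_tool": ["goal", "inputs", "output", "files_affected", "test_command"],
    "production_api": [
        "endpoint", "auth", "permissions", "request_schema", "response_schema",
        "business_rules", "errors", "tests", "frontend_handoff",
    ],
    "high_risk": [
        "idempotency", "rollback", "audit_logs", "security",
        "data_integrity", "backward_compatibility", "failure_recovery",
    ],
}


def missing_sections(tier: str, provided: Set[str]) -> List[str]:
    """List the sections a contract at this tier still lacks, in checklist order."""
    return [name for name in REQUIRED_SECTIONS[tier] if name not in provided]


print(missing_sections("internal_tool", {"goal", "inputs", "output"}))
# → ['files_affected', 'test_command']
```

A check like this could run before an agent is allowed to start, so that an incomplete contract is caught earlier than an incomplete implementation.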

What we deliberately do not automate blindly

There are parts of engineering where full autonomy is risky.

We do not want agents to silently change public API behavior, weaken authorization logic, create destructive migrations, change billing or payment rules, modify cross-tenant data access, refactor unrelated modules, upgrade dependencies without need, or expand scope beyond the ticket.

These require either explicit contract coverage or human approval.

Guardrail

A good agentic workflow should not only say what the agent should do. It should also say what the agent must not do. Constraints are not friction. Constraints are what make autonomy safe.
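A "must not do" list becomes enforceable once it is checked mechanically. The sketch below flags changed files that fall under protected areas so that a human is asked to approve them. The path patterns and the function name `needs_human_approval` are hypothetical examples, not CosX's actual rules, and a real setup would derive the changed paths from the version control system.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Illustrative patterns for areas an agent should not change silently.
PROTECTED_PATTERNS = [
    "*/migrations/*",    # schema changes, possibly destructive
    "billing/*",         # billing or payment rules
    "payments/*",
    "*/permissions.py",  # authorization logic
    "requirements*.txt", # dependency upgrades
]


def needs_human_approval(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that match a protected pattern, sorted."""
    return sorted(
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in PROTECTED_PATTERNS)
    )


flagged = needs_human_approval([
    "bookings/services.py",
    "bookings/migrations/0002_add_slot.py",
    "billing/rules.py",
])
print(flagged)
# → ['billing/rules.py', 'bookings/migrations/0002_add_slot.py']
```

An empty result means the change stays inside the area the agent may modify on its own. Anything else pauses the workflow until a person signs off.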

A simplified version of our workflow

Without exposing internal implementation details, our high-level workflow looks like this:

text

1. Understand the requirement
2. Convert it into an engineering contract
3. Ground the agent in the existing codebase
4. Create an implementation plan
5. Implement within scope
6. Run tests and checks
7. Simplify and review the change
8. Verify API behavior where applicable
9. Prepare a clear PR
10. Create frontend or QA handoff where needed

This gives us the speed of AI-assisted development without giving up engineering discipline.

The workflow is not perfect, and it keeps evolving. But it has already changed how we think about delivery. The biggest improvement is not just faster coding. It is fewer misunderstandings between requirement, implementation, review, and handoff.

The real output is not code. It is confidence.

AI can generate code. But production teams need confidence.

Confidence that the right problem was solved. Confidence that existing architecture was respected. Confidence that the API contract is clear. Confidence that edge cases were handled. Confidence that security boundaries were preserved. Confidence that tests prove the behavior. Confidence that the frontend team can integrate without guessing. Confidence that reviewers can understand what changed and why.

That confidence does not come from prompting harder. It comes from better contracts.

Conclusion

Spec-driven development is a strong starting point for AI-assisted software development. But for production engineering, especially with agentic workflows, we believe the better model is contract-driven.

A spec tells the agent what to build. A contract tells the entire team what must be true for the work to be accepted.

At CosX, this is how we are moving from vibe coding to reliable agentic engineering. We still use AI heavily, but we do not treat AI as a replacement for engineering judgment. We use it as an execution layer inside a disciplined system of contracts, tests, reviews, and handoffs.

The future of software development will not belong to teams that simply generate the most code. It will belong to teams that can define intent clearly, encode it into contracts, and use AI agents to execute safely at scale.

Written by

CosX AI

Published

June 19, 2026


From Spec-Driven Development to Contract-Driven Agentic Engineering
