5Pipeline Stages
PRDStarts Here
TDDCore Method
100%AI-Assisted
GitHubSource of Truth
Agentic development is a disciplined workflow where AI agents handle each phase of the software development lifecycle — from decomposing a GitHub issue into a Product Requirements Document, writing user stories, generating failing tests, and finally producing code that makes those tests pass. The result is traceable, reviewable, and safe — every line of code is justified by a test, every test by a user story, and every story by a real requirement.
Stage 1 — GitHub Issue → Product Requirements Document (PRD)
Every feature or bug fix starts with a GitHub Issue. The agent's first job is to transform that loose description into a structured PRD.
What the agent does:
- Reads the issue title, body, labels, and linked discussions
- Identifies the problem statement (what's broken or missing)
- Defines scope (what's in and out of this PR)
- Lists acceptance criteria in plain English
- Flags technical constraints (language, framework, API limits, backwards compatibility)
PRD Template the agent produces:```
Problem
<1-2 sentences from the issue>
Goal
<What success looks like>
Scope
Acceptance Criteria
1. Given X, when Y, then Z
2. ...
Constraints
- Must not break existing API contract
- Target: Python 3.11+
```
Why this matters: Without a PRD, agents hallucinate scope. The PRD is the contract everything else is checked against.
Stage 2 — PRD → User Stories
The agent breaks the PRD into atomic, testable user stories using the standard format. Each story maps to exactly one behavior — small enough to test, large enough to matter.
Format:
```
As a <persona>, I want <action> so that <outcome>.
Acceptance Tests:
- [ ] Scenario A: ...
- [ ] Scenario B (edge case): ...
- [ ] Scenario C (failure case): ...
```
Agent rules for good stories:- One story = one unit of observable behavior
- Each story must have at least one happy path and one failure/edge case
- Stories must be independently testable — no hidden dependencies
- No implementation details in the story ("uses Redis" is wrong; "caches results" is right)
Example (from a rate-limiting issue):```
As an API consumer, I want requests above my tier limit to be rejected
with a 429 status so that I receive clear feedback instead of silent failures.
Acceptance Tests:
- [ ] Returns 429 when limit exceeded
- [ ] Returns 200 when under limit
- [ ] Resets counter after window expires
- [ ] Returns correct Retry-After header
```
Typically
3–8 stories per issue. More than 8 means the issue should be split.
Typical number of user stories per issue type
Stage 3 — User Stories → Tests (TDD First)
This is the heart of the workflow. The agent writes failing tests before any implementation code exists. This is Test-Driven Development (TDD) enforced by the agent.
What the agent generates:
- Unit tests — one per acceptance criterion
- Edge case tests — boundary values, null inputs, empty states
- Integration tests — when the story touches multiple systems
- Contract tests — when the story changes an API interface
Agent prompt pattern:```
Given this user story and its acceptance criteria,
write pytest tests that:
1. Test ONLY the behavior described — not implementation
2. Are runnable immediately (will fail — that's expected)
3. Use descriptive names: test_<scenario>_<expected_outcome>
4. Include a docstring linking back to the user story ID
```
Example output:```python
def test_rate_limit_exceeded_returns_429():
"""Story: API consumer sees 429 when over limit."""
client = APIClient(tier="free") # 100 req/hr
for _ in range(100):
client.get("/data")
response = client.get("/data") # 101st request
assert response.status_code == 429
assert "Retry-After" in response.headers
def test_rate_limit_resets_after_window():
"""Story: Counter resets after time window."""
client = APIClient(tier="free")
exhaust_limit(client)
advance_time(hours=1)
response = client.get("/data")
assert response.status_code == 200
```
Run the tests. They should ALL fail. If any pass before implementation, the test is wrong or the feature already exists.
The Red-Green-Refactor Loop Red — Agent writes tests. All fail. ✅ Expected.
Green — Agent writes minimum code to make tests pass. No gold-plating.
Refactor — Agent cleans up the implementation without breaking tests.
This loop is the atomic unit of agentic development. Each story goes through its own loop.
Stage 4 — Tests → Implementation Code
Now the agent writes code with one constraint: make the tests pass, nothing more.
Agent instructions for this phase:
```
You have a set of failing tests. Write the minimum implementation
that makes all tests pass. Rules:
- Do not add functionality not tested
- Do not modify the tests
- Match existing code style and patterns in the repo
- Add inline comments only where logic is non-obvious
- If a test seems impossible to pass cleanly, flag it — don't hack it
```
The agent workflow:1. Reads the test file
2. Reads relevant existing source files (for context and style)
3. Writes or modifies implementation files
4. Runs `pytest` — iterates until green
5. Runs linter (`ruff`, `eslint`, etc.) — fixes issues
6. Checks test coverage — flags if < 80% on new code
What the agent does NOT do:- Rewrite unrelated code
- Add "nice to have" features
- Change test assertions to make them easier to pass
- Skip edge case tests because they're hard
Confidence score across the pipeline — clarity compounds at each stage
Stage 5 — Closing the GitHub Loop
Once tests are green, the agent prepares the PR and links everything back to the original issue.
Agent-generated PR description:
```markdown
Summary
Closes #<issue_number>
What changed
- Added rate limiting middleware to API gateway
- New RateLimitStore (Redis-backed) with configurable windows
User Stories addressed
- [US-1] API consumer sees 429 when over limit ✅
- [US-2] Counter resets after window ✅
- [US-3] Retry-After header returned ✅
Tests
- 6 new tests added
- All passing
- Coverage: 94% on new code
How to review
1. Start at `middleware/rate_limit.py`
2. See `tests/test_rate_limit.py` for behavior spec
```
CI gate: The repo's CI pipeline runs all tests. The PR cannot be merged until:
- ✅ All tests pass
- ✅ Coverage threshold met
- ✅ Linter clean
- ✅ PR description links to issue and stories
The original GitHub issue is auto-closed when the PR merges. The entire chain — issue → PRD → stories → tests → code → PR — is traceable in git history.
Tooling & Agent Setup
Recommended stack for this workflow:
| Phase | Tool |
|---|
| Issue intake | GitHub Issues / Linear |
| Agent orchestration | Claude Code, Cursor, Aider, or custom LangChain agent |
| PRD & story generation | Claude 3.5+ (strong reasoning) |
| Test generation | Claude or GPT-4o with repo context |
| Code generation | Claude Code, Aider, Cursor Composer |
| Test runner | pytest, jest, vitest (language-dependent) |
| CI/CD | GitHub Actions |
| Coverage | pytest-cov, Istanbul |
Context injection pattern — Feed the agent:
1. The GitHub issue (full body + comments)
2. The relevant source files (not the whole repo)
3. The existing test patterns (so it matches your style)
4. The PRD (carried forward through all stages)
Repo setup checklist:- `AGENTS.md` — instructions for AI agents working in this repo
- `.cursorrules` or `claude.md` — style guide and patterns
- Pre-commit hooks for linting
- CI requiring green tests before merge
Common Failure Modes Skipping the PRD — Agents drift without a written contract. Always generate the PRD first, even for small issues.
Tests that test implementation, not behavior — If tests break when you refactor internals without changing behavior, they're wrong. Test the contract, not the internals.
Agent modifies tests to pass — This is the cardinal sin. The tests are the spec. If they're hard to pass, fix the PRD — not the test.
Too-large issues — Issues that generate 10+ stories should be split into multiple issues. Agents lose context and coherence on large scopes.
No repo context — An agent writing code without seeing the existing patterns will produce inconsistent, unidiomatic code. Always inject relevant source files.
Approximate agent effort distribution per issue
What if the GitHub issue is vague or underspecified?
The agent should ask clarifying questions before generating the PRD — or flag specific ambiguities in the PRD itself. A 'Unknowns' section in the PRD is a healthy pattern.
Can this work for bug fixes, not just features?
Absolutely. For bugs: the first test the agent writes is a regression test that reproduces the bug. Once it fails, implementation makes it green. The bug can never silently return.
What about refactors with no new behavior?
Pure refactors should have a test suite that already passes before and after. The agent verifies no tests regress. No new tests needed unless coverage gaps are found.
How do I handle flaky tests the agent generates?
Flag them in code review. Flaky tests are usually a sign the agent is testing timing or external state. Push back to behavior-based assertions with proper mocking.
Should the agent write the PR description or a human?
Agent writes the first draft — it has all the context. Human edits for tone and adds any nuance the agent missed. Humans review the PR; agents draft it.
This report was created by Xavior
AI with the tools to actually act.
Sends emails. Manages your calendar. Writes the next one.