Agentic Development Workflow

From GitHub Issue to Shipped Code — Using AI Agents at Every Step

5 Pipeline Stages

PRD: Starts Here

TDD: Core Method

100% AI-Assisted

GitHub: Source of Truth

Agentic development is a disciplined workflow in which AI agents handle each phase of the software development lifecycle: decomposing a GitHub issue into a Product Requirements Document, writing user stories, generating failing tests, and finally producing code that makes those tests pass. The result is traceable, reviewable, and safe: every line of code is justified by a test, every test by a user story, and every story by a real requirement.

Stage 1 — GitHub Issue → Product Requirements Document (PRD)

Every feature or bug fix starts with a GitHub Issue. The agent's first job is to transform that loose description into a structured PRD.

What the agent does:

Reads the issue title, body, labels, and linked discussions
Identifies the problem statement (what's broken or missing)
Defines scope (what's in and out of this PR)
Lists acceptance criteria in plain English
Flags technical constraints (language, framework, API limits, backwards compatibility)

PRD Template the agent produces:
```
## Problem

<1-2 sentences from the issue>

## Goal

## Scope

- IN: ...
- OUT: ...

## Acceptance Criteria

1. Given X, when Y, then Z
2. ...

## Constraints

- Must not break existing API contract
- Target: Python 3.11+
```

Why this matters: Without a PRD, agents hallucinate scope. The PRD is the contract everything else is checked against.
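One way to make that contract checkable is to hold the PRD in a small data structure before rendering it. The sketch below is illustrative only: the `PRD` class, its field names, and `missing_sections` are hypothetical names chosen to mirror the template's sections, not part of any tool named in this report.

```python
from dataclasses import dataclass, field


@dataclass
class PRD:
    """Hypothetical container mirroring the sections of the PRD template."""

    problem: str
    goal: str
    in_scope: list[str]
    out_of_scope: list[str]
    acceptance_criteria: list[str]
    constraints: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of required sections that were left empty."""
        required = {
            "problem": self.problem,
            "goal": self.goal,
            "in_scope": self.in_scope,
            "acceptance_criteria": self.acceptance_criteria,
        }
        return [name for name, value in required.items() if not value]

    def render(self) -> str:
        """Render the PRD as plain text in the template's section order."""
        lines = ["Problem", self.problem, "", "Goal", self.goal, "", "Scope"]
        lines.append("IN: " + "; ".join(self.in_scope))
        lines.append("OUT: " + "; ".join(self.out_of_scope))
        lines += ["", "Acceptance Criteria"]
        lines += [f"{i}. {c}" for i, c in enumerate(self.acceptance_criteria, 1)]
        lines += ["", "Constraints"] + list(self.constraints)
        return "\n".join(lines)


# Example data invented for illustration.
prd = PRD(
    problem="Free-tier clients can exceed their request quota.",
    goal="",  # deliberately left empty to show the check
    in_scope=["429 response when over limit"],
    out_of_scope=["billing changes"],
    acceptance_criteria=[
        "Given a free-tier client, when it sends request 101 in an hour, then it gets 429"
    ],
    constraints=["Must not break existing API contract"],
)
print(prd.missing_sections())
```

A check like `missing_sections` lets the pipeline refuse to move to Stage 2 while a required section is still blank.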

Stage 2 — PRD → User Stories

The agent breaks the PRD into atomic, testable user stories using the standard format. Each story maps to exactly one behavior — small enough to test, large enough to matter.

Format:
```
As a <persona>, I want <action> so that <outcome>.

Acceptance Tests:

[ ] Scenario A: ...
[ ] Scenario B (edge case): ...
[ ] Scenario C (failure case): ...

```

Agent rules for good stories:

One story = one unit of observable behavior
Each story must have at least one happy path and one failure/edge case
Stories must be independently testable — no hidden dependencies
No implementation details in the story ("uses Redis" is wrong; "caches results" is right)

Example (from a rate-limiting issue):
```
As an API consumer, I want requests above my tier limit to be rejected
with a 429 status so that I receive clear feedback instead of silent failures.

Acceptance Tests:

[ ] Returns 429 when limit exceeded
[ ] Returns 200 when under limit
[ ] Resets counter after window expires
[ ] Returns correct Retry-After header

```

Typically 3–8 stories per issue. More than 8 means the issue should be split.
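The story rules above can be checked mechanically. The following is a minimal sketch, assuming each scenario is tagged with a `kind` of `"happy"`, `"edge"`, or `"failure"`; the class names, the tags, and the helper functions are hypothetical and only illustrate the rules.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    description: str
    kind: str  # assumed labels: "happy", "edge", or "failure"


@dataclass
class UserStory:
    story_id: str
    text: str
    scenarios: list[Scenario]


def story_problems(story: UserStory) -> list[str]:
    """Return rule violations: a story needs a happy path and a failure/edge case."""
    problems = []
    kinds = {s.kind for s in story.scenarios}
    if "happy" not in kinds:
        problems.append("no happy-path scenario")
    if not kinds & {"edge", "failure"}:
        problems.append("no failure or edge-case scenario")
    return problems


def issue_should_be_split(stories: list[UserStory], max_stories: int = 8) -> bool:
    """More than max_stories stories suggests the issue is too large."""
    return len(stories) > max_stories


story = UserStory(
    story_id="US-1",
    text="As an API consumer, I want requests above my tier limit to be rejected "
    "with a 429 status so that I receive clear feedback instead of silent failures.",
    scenarios=[Scenario("Returns 200 when under limit", "happy")],
)
print(story_problems(story))
```

Here the example story is flagged because it has no failure or edge-case scenario yet.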

[Figure: Typical number of user stories per issue type]

Stage 3 — User Stories → Tests (TDD First)

This is the heart of the workflow. The agent writes failing tests before any implementation code exists. This is Test-Driven Development (TDD) enforced by the agent.

What the agent generates:

Unit tests — one per acceptance criterion
Edge case tests — boundary values, null inputs, empty states
Integration tests — when the story touches multiple systems
Contract tests — when the story changes an API interface

Agent prompt pattern:
```
Given this user story and its acceptance criteria,
write pytest tests that:
1. Test ONLY the behavior described — not implementation
2. Are runnable immediately (will fail — that's expected)
3. Use descriptive names: test_<scenario>_<expected_outcome>
4. Include a docstring linking back to the user story ID
```

Example output:
```python
# APIClient, exhaust_limit, and advance_time are assumed to be helpers from
# the project's test suite; they are not defined in this snippet.


def test_rate_limit_exceeded_returns_429():
    """Story: API consumer sees 429 when over limit."""
    client = APIClient(tier="free")  # 100 req/hr
    for _ in range(100):
        client.get("/data")
    response = client.get("/data")  # 101st request
    assert response.status_code == 429
    assert "Retry-After" in response.headers


def test_rate_limit_resets_after_window():
    """Story: Counter resets after time window."""
    client = APIClient(tier="free")
    exhaust_limit(client)
    advance_time(hours=1)
    response = client.get("/data")
    assert response.status_code == 200
```

Run the tests. They should ALL fail. If any pass before implementation, the test is wrong or the feature already exists.
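The "all tests must fail first" check can be automated. The sketch below assumes the tests were run with pytest's `--junitxml` option and reads the resulting report; the function name `unexpectedly_passing` and the sample report are hypothetical.

```python
import xml.etree.ElementTree as ET


def unexpectedly_passing(junit_xml: str) -> list[str]:
    """Return names of test cases that passed during the red phase.

    In a JUnit-style report, a test case with no <failure>, <error>,
    or <skipped> child is treated here as having passed.
    """
    root = ET.fromstring(junit_xml)
    passing = []
    for case in root.iter("testcase"):
        child_tags = {child.tag for child in case}
        if not child_tags & {"failure", "error", "skipped"}:
            passing.append(case.get("name"))
    return passing


# Sample report invented for illustration.
SAMPLE_REPORT = """
<testsuites>
  <testsuite name="pytest" tests="2" failures="1">
    <testcase classname="tests.test_rate_limit" name="test_rate_limit_exceeded_returns_429">
      <failure message="assert 200 == 429">...</failure>
    </testcase>
    <testcase classname="tests.test_rate_limit" name="test_rate_limit_resets_after_window" />
  </testsuite>
</testsuites>
"""

suspicious = unexpectedly_passing(SAMPLE_REPORT)
print(suspicious)
```

Any name in the returned list deserves a look before implementation starts: either the test asserts nothing meaningful or the behavior already exists.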

The Red-Green-Refactor Loop

Red: Agent writes tests. All fail. ✅ Expected.

Green: Agent writes minimum code to make tests pass. No gold-plating.

Refactor: Agent cleans up the implementation without breaking tests.

This loop is the atomic unit of agentic development. Each story goes through its own loop.

Stage 4 — Tests → Implementation Code

Now the agent writes code with one constraint: make the tests pass, nothing more.

Agent instructions for this phase:
```
You have a set of failing tests. Write the minimum implementation
that makes all tests pass. Rules:

- Do not add functionality not tested
- Do not modify the tests
- Match existing code style and patterns in the repo
- Add inline comments only where logic is non-obvious
- If a test seems impossible to pass cleanly, flag it; don't hack it
```

The agent workflow:
1. Reads the test file
2. Reads relevant existing source files (for context and style)
3. Writes or modifies implementation files
4. Runs `pytest` — iterates until green
5. Runs linter (`ruff`, `eslint`, etc.) — fixes issues
6. Checks test coverage — flags if < 80% on new code
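Steps 4 to 6 can be sketched as a sequence of gates that stops at the first failure. This is a minimal sketch under stated assumptions: the gate list and the `first_failing_gate` helper are hypothetical, `ruff` stands in for whichever linter the repo uses, and `--cov-fail-under` (a pytest-cov option) checks total measured coverage rather than coverage on new code only.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]

# Gate order follows the workflow: tests, then linter, then coverage.
GATES: list[tuple[str, list[str]]] = [
    ("tests", ["pytest", "-q"]),
    ("lint", ["ruff", "check", "."]),
    ("coverage", ["pytest", "-q", "--cov=.", "--cov-fail-under=80"]),
]


def run_command(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def first_failing_gate(runner: Runner = run_command) -> Optional[str]:
    """Run gates in order; return the first failing gate's name, or None."""
    for name, cmd in GATES:
        if runner(cmd) != 0:
            return name
    return None


# Usage with a fake runner, so the sketch runs without pytest or ruff installed.
def fake_runner(cmd: Sequence[str]) -> int:
    return 1 if cmd[0] == "ruff" else 0


print(first_failing_gate(fake_runner))
```

Passing the runner in as a parameter keeps the loop itself testable; an agent harness would call `first_failing_gate()` with the default runner and feed the failing gate's output back to the model.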

What the agent does NOT do:

Rewrite unrelated code
Add "nice to have" features
Change test assertions to make them easier to pass
Skip edge case tests because they're hard
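To make "minimum implementation" concrete, here is a sketch of what a Green-phase result for the rate-limiting stories might look like. It assumes a fixed-window algorithm and an in-memory store, both chosen for brevity; the class name, method names, and the injectable clock are hypothetical and are not the Redis-backed store mentioned in this report's example PR.

```python
import math
import time
from typing import Callable


class FixedWindowRateLimiter:
    """In-memory fixed-window limiter (illustrative, not production code)."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # key -> (window start time, requests counted in that window)
        self._windows: dict[str, tuple[float, int]] = {}

    def check(self, key: str) -> tuple[int, int]:
        """Return (status_code, retry_after_seconds) for one request."""
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            self._windows[key] = (start, count)
            return 429, math.ceil(start + self.window_seconds - now)
        self._windows[key] = (start, count + 1)
        return 200, 0


class FakeClock:
    """Test double that lets a test advance time without sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
limiter = FixedWindowRateLimiter(limit=100, window_seconds=3600, clock=clock)
statuses = [limiter.check("free-client")[0] for _ in range(100)]
over_limit = limiter.check("free-client")  # 101st request in the window
clock.now += 3600  # advance one hour
after_reset = limiter.check("free-client")
print(over_limit, after_reset)
```

Injecting the clock is a design choice worth noting: it keeps the window-reset test deterministic instead of depending on real elapsed time.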

[Figure: Confidence score across the pipeline; clarity compounds at each stage]

Stage 5 — Closing the GitHub Loop

Once tests are green, the agent prepares the PR and links everything back to the original issue.

Agent-generated PR description:
```markdown
## Summary

Closes #<issue_number>

## What changed

- Added rate limiting middleware to API gateway
- New RateLimitStore (Redis-backed) with configurable windows

## User Stories addressed

- [US-1] API consumer sees 429 when over limit ✅
- [US-2] Counter resets after window ✅
- [US-3] Retry-After header returned ✅

## Tests

- 6 new tests added
- All passing
- Coverage: 94% on new code

## How to review

1. Start at `middleware/rate_limit.py`
2. See `tests/test_rate_limit.py` for behavior spec
```

CI gate: The repo's CI pipeline runs all tests. The PR cannot be merged until:

✅ All tests pass
✅ Coverage threshold met
✅ Linter clean
✅ PR description links to issue and stories

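Because the PR description contains `Closes #<issue_number>`, GitHub closes the original issue automatically when the PR merges into the default branch. The entire chain (issue → PRD → stories → tests → code → PR) stays traceable through the linked issue, the PR, and git history, provided the PRD and stories are committed or attached to the issue.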
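The last gate, that the PR description links to the issue and the stories, can be checked with a short script. This sketch assumes story IDs are written as `[US-<n>]`, as in the example PR description; the function name `pr_links` is hypothetical. The keyword pattern covers GitHub's documented closing keywords (close, fix, resolve and their -s/-d forms).

```python
import re

# GitHub closing keywords followed by an issue reference such as "#42".
CLOSING_KEYWORD = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)
# Story IDs in the form used by the example PR description: [US-1], [US-2], ...
STORY_ID = re.compile(r"\[US-(\d+)\]")


def pr_links(description: str) -> tuple[list[int], list[str]]:
    """Return (issue numbers closed by the PR, story IDs it mentions)."""
    issues = [int(n) for n in CLOSING_KEYWORD.findall(description)]
    stories = [f"US-{n}" for n in STORY_ID.findall(description)]
    return issues, stories


# Sample description invented for illustration.
description = """Summary

Closes #42

User Stories addressed

[US-1] API consumer sees 429 when over limit
[US-2] Counter resets after window
"""

issues, stories = pr_links(description)
print(issues, stories)
```

A CI job could fail the build when either list comes back empty.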

Tooling & Agent Setup

Recommended stack for this workflow:

| Phase | Tool |
|---|---|
| Issue intake | GitHub Issues / Linear |
| Agent orchestration | Claude Code, Cursor, Aider, or custom LangChain agent |
| PRD & story generation | Claude 3.5+ (strong reasoning) |
| Test generation | Claude or GPT-4o with repo context |
| Code generation | Claude Code, Aider, Cursor Composer |
| Test runner | pytest, jest, vitest (language-dependent) |
| CI/CD | GitHub Actions |
| Coverage | pytest-cov, Istanbul |

Context injection pattern — Feed the agent:
1. The GitHub issue (full body + comments)
2. The relevant source files (not the whole repo)
3. The existing test patterns (so it matches your style)
4. The PRD (carried forward through all stages)
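The four inputs above can be assembled into a single context string in a fixed order. This is a minimal sketch: the `build_context` function and the `=== ... ===` section markers are hypothetical conventions, not a format required by any of the tools listed.

```python
def build_context(
    issue: str,
    source_files: dict[str, str],
    test_examples: dict[str, str],
    prd: str,
) -> str:
    """Concatenate the issue, selected sources, test patterns, and PRD."""
    sections = [("GITHUB ISSUE", issue)]
    sections += [(f"SOURCE FILE: {path}", body) for path, body in source_files.items()]
    sections += [(f"EXISTING TEST: {path}", body) for path, body in test_examples.items()]
    sections.append(("PRD", prd))
    return "\n\n".join(f"=== {title} ===\n{body.strip()}" for title, body in sections)


# Example inputs invented for illustration.
context = build_context(
    issue="Rate limit free-tier API consumers (full body + comments)",
    source_files={"middleware/auth.py": "def authenticate(request): ...\n"},
    test_examples={"tests/test_auth.py": "def test_rejects_bad_token(): ...\n"},
    prd="Problem\nFree-tier clients can exceed their quota.",
)
print(context)
```

Only the files passed in are included, which keeps the context to the relevant sources rather than the whole repo.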

Repo setup checklist:

`AGENTS.md` — instructions for AI agents working in this repo
`.cursorrules` or `CLAUDE.md` — style guide and patterns
Pre-commit hooks for linting
CI requiring green tests before merge

Common Failure Modes

Skipping the PRD: Agents drift without a written contract. Always generate the PRD first, even for small issues.

Tests that test implementation, not behavior: If tests break when you refactor internals without changing behavior, they're wrong. Test the contract, not the internals.

Agent modifies tests to pass: This is the cardinal sin. The tests are the spec. If they're hard to pass, fix the PRD, not the test.

Too-large issues: Issues that generate more than 8 stories should be split into multiple issues. Agents lose context and coherence on large scopes.

No repo context: An agent writing code without seeing the existing patterns will produce inconsistent, unidiomatic code. Always inject relevant source files.

[Figure: Approximate agent effort distribution per issue]

What if the GitHub issue is vague or underspecified?

The agent should ask clarifying questions before generating the PRD, or flag specific ambiguities in the PRD itself. An 'Unknowns' section in the PRD is a healthy pattern.

Can this work for bug fixes, not just features?

Absolutely. For bugs, the first test the agent writes is a regression test that reproduces the bug. Once that test fails for the expected reason, the fix makes it green, and the test stays in the suite so the bug cannot silently return.

What about refactors with no new behavior?

Before a pure refactor starts, the existing test suite should already pass, and it must still pass afterwards. The agent verifies that no tests regress. No new tests are needed unless coverage gaps are found.

How do I handle flaky tests the agent generates?

Flag them in code review. Flaky tests are usually a sign the agent is testing timing or external state. Push back to behavior-based assertions with proper mocking.

Should the agent write the PR description or a human?

Agent writes the first draft — it has all the context. Human edits for tone and adds any nuance the agent missed. Humans review the PR; agents draft it.

Sources

Test-Driven Development by Example — Kent Beck — The foundational text on TDD — the methodology underlying this entire workflow.
Anthropic Claude Code Documentation — Agent-native coding tool that implements this workflow natively.
Aider — AI Pair Programming in the Terminal — Open-source CLI agent for agentic development with Git integration.
GitHub — Writing Good Issues — GitHub's own guide on structuring issues for traceability.
AGENTS.md Convention — Community standard for per-repo AI agent instructions files.

This report was created by Xavior
