Agentic Development Workflow

From GitHub Issue to Shipped Code — Using AI Agents at Every Step

5 Pipeline Stages
PRD Starts Here
TDD Core Method
100% AI-Assisted
GitHub Source of Truth

Agentic development is a disciplined workflow in which AI agents handle each phase of the software development lifecycle: decomposing a GitHub issue into a Product Requirements Document (PRD), writing user stories, generating failing tests, and finally producing code that makes those tests pass. The result is traceable, reviewable, and safe: every line of code is justified by a test, every test by a user story, and every story by a real requirement.

Stage 1 — GitHub Issue → Product Requirements Document (PRD)

Every feature or bug fix starts with a GitHub Issue. The agent's first job is to transform that loose description into a structured PRD.

What the agent does:

PRD Template the agent produces:
```

Problem

<1-2 sentences from the issue>

Goal

<What success looks like>

Scope

Acceptance Criteria

1. Given X, when Y, then Z
2. ...

Constraints

```

Why this matters: Without a PRD, agents hallucinate scope. The PRD is the contract everything else is checked against.
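To make the "contract" idea concrete, the template can be mirrored as a small data structure that later stages check against. This is a minimal sketch: the `PRD` class, its field names, and `missing_sections` are hypothetical, chosen only to match the template's section titles.

```python
from dataclasses import dataclass, field


@dataclass
class PRD:
    """Hypothetical container mirroring the PRD template's sections."""
    problem: str
    goal: str
    scope: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the titles of sections that are still empty."""
        sections = {
            "Problem": self.problem,
            "Goal": self.goal,
            "Scope": self.scope,
            "Acceptance Criteria": self.acceptance_criteria,
            "Constraints": self.constraints,
        }
        return [title for title, value in sections.items() if not value]


prd = PRD(
    problem="Free-tier clients can exceed 100 requests per hour.",
    goal="Requests above the tier limit are rejected with a 429.",
    acceptance_criteria=[
        "Given a free-tier client, when it sends a 101st request "
        "within an hour, then the response status is 429",
    ],
)
print(prd.missing_sections())  # sections the agent still has to fill in
```

A check like this could run before Stage 2 so that stories are never generated from an incomplete PRD.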

Stage 2 — PRD → User Stories

The agent breaks the PRD into atomic, testable user stories using the standard format. Each story maps to exactly one behavior — small enough to test, large enough to matter.

Format:
```
As a <persona>, I want <action> so that <outcome>.

Acceptance Tests:

```

Agent rules for good stories:

Example (from a rate-limiting issue):
```
As an API consumer, I want requests above my tier limit to be rejected
with a 429 status so that I receive clear feedback instead of silent failures.

Acceptance Tests:
```

Typically 3–8 stories per issue. More than 8 means the issue should be split.
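The story format and the split threshold can be expressed as a short sketch. The `UserStory` class and `should_split_issue` function are illustrative names, and the threshold of 8 is taken from the guideline above.

```python
from dataclasses import dataclass

MAX_STORIES_PER_ISSUE = 8  # from the guideline: more than 8 means split


@dataclass(frozen=True)
class UserStory:
    """Hypothetical representation of one story in the standard format."""
    persona: str  # includes the article, e.g. "an API consumer"
    action: str
    outcome: str

    def render(self) -> str:
        return f"As {self.persona}, I want {self.action} so that {self.outcome}."


def should_split_issue(stories: list[UserStory]) -> bool:
    """Flag issues that produced more stories than the guideline allows."""
    return len(stories) > MAX_STORIES_PER_ISSUE


story = UserStory(
    persona="an API consumer",
    action="requests above my tier limit to be rejected with a 429 status",
    outcome="I receive clear feedback instead of silent failures",
)
print(story.render())
```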

[Figure: Typical number of user stories per issue type]

Stage 3 — User Stories → Tests (TDD First)

This is the heart of the workflow. The agent writes failing tests before any implementation code exists. This is Test-Driven Development (TDD) enforced by the agent.

What the agent generates:

Agent prompt pattern:
```
Given this user story and its acceptance criteria,
write pytest tests that:
1. Test ONLY the behavior described — not implementation
2. Are runnable immediately (will fail — that's expected)
3. Use descriptive names: test_<scenario>_<expected_outcome>
4. Include a docstring linking back to the user story ID
```

Example output:
```python
# APIClient, exhaust_limit, and advance_time are assumed project test
# helpers (not shown here); the tests fail until the feature exists.
def test_rate_limit_exceeded_returns_429():
    """Story: API consumer sees 429 when over limit."""
    client = APIClient(tier="free")  # 100 req/hr
    for _ in range(100):
        client.get("/data")
    response = client.get("/data")  # 101st request
    assert response.status_code == 429
    assert "Retry-After" in response.headers


def test_rate_limit_resets_after_window():
    """Story: Counter resets after time window."""
    client = APIClient(tier="free")
    exhaust_limit(client)
    advance_time(hours=1)
    response = client.get("/data")
    assert response.status_code == 200
```

Run the tests. They should ALL fail. If any pass before implementation, the test is wrong or the feature already exists.
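One way to enforce the all-red check is to inspect per-test outcomes before implementation starts. The sketch below assumes the outcomes have already been collected into a dictionary (for example, parsed from the test runner's report); the function name and the outcome strings are illustrative, not part of any tool's API.

```python
def unexpected_passes(outcomes: dict[str, str]) -> list[str]:
    """Return the tests that passed before any implementation was written.

    `outcomes` maps a test name to "passed" or "failed" (assumed format).
    """
    return sorted(name for name, outcome in outcomes.items() if outcome == "passed")


red_phase = {
    "test_rate_limit_exceeded_returns_429": "failed",
    "test_rate_limit_resets_after_window": "passed",  # suspicious in the Red phase
}
print(unexpected_passes(red_phase))
```

Any name in the returned list deserves a closer look: either the test asserts nothing meaningful or the behavior already exists.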

The Red-Green-Refactor Loop

Red: Agent writes tests. All fail. ✅ Expected.

Green: Agent writes minimum code to make tests pass. No gold-plating.

Refactor: Agent cleans up the implementation without breaking tests.

This loop is the atomic unit of agentic development. Each story goes through its own loop.

Stage 4 — Tests → Implementation Code

Now the agent writes code with one constraint: make the tests pass, nothing more.

Agent instructions for this phase:
```
You have a set of failing tests. Write the minimum implementation
that makes all tests pass. Rules:

```
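As an illustration of what "minimum implementation" might look like for the rate-limiting tests in Stage 3, here is a sketch of a fixed-window limiter. The class name, the `allow` method, and the injectable `clock` parameter are assumptions made for this example; the original does not specify the algorithm, and a real project might use a sliding window or token bucket instead.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Hypothetical minimal limiter: at most `limit` requests per window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self) -> bool:
        """Return True if the request fits in the current window."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The window has elapsed: start a new one and reset the counter.
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            return False  # the caller would translate this into a 429
        self.count += 1
        return True
```

Nothing here goes beyond what the two tests demand: a counter, a window, and a reset. Per-tier configuration, persistence, and headers would each wait for a test that requires them.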

The agent workflow:
1. Reads the test file
2. Reads relevant existing source files (for context and style)
3. Writes or modifies implementation files
4. Runs `pytest` — iterates until green
5. Runs linter (`ruff`, `eslint`, etc.) — fixes issues
6. Checks test coverage — flags if < 80% on new code
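Steps 4 and 6 above can be sketched as a bounded retry loop plus a coverage flag. The callables `run_tests` and `revise` stand in for whatever the agent framework provides, and the `max_attempts` budget is an assumption added here so the loop cannot run forever; only the 80% threshold comes from the list above.

```python
from typing import Callable

COVERAGE_THRESHOLD = 80.0  # percent on new code, from step 6


def iterate_until_green(run_tests: Callable[[], bool],
                        revise: Callable[[], None],
                        max_attempts: int = 5) -> int:
    """Run the tests, revising the implementation after each failure.

    Returns the number of test runs needed; raises if the budget runs out.
    """
    for attempt in range(1, max_attempts + 1):
        if run_tests():
            return attempt
        revise()  # hypothetical hook: the agent edits implementation files
    raise RuntimeError(f"tests still failing after {max_attempts} attempts")


def coverage_flag(percent_new_code: float) -> bool:
    """Return True when coverage on new code is below the threshold."""
    return percent_new_code < COVERAGE_THRESHOLD
```

Note that `revise` only ever touches implementation code in this design; the tests stay fixed across iterations.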

What the agent does NOT do:

[Figure: Confidence score across the pipeline; clarity compounds at each stage]

Stage 5 — Closing the GitHub Loop

Once tests are green, the agent prepares the PR and links everything back to the original issue.

Agent-generated PR description:
```markdown

Summary

Closes #<issue_number>

What changed

User Stories addressed

Tests

How to review

1. Start at `middleware/rate_limit.py`
2. See `tests/test_rate_limit.py` for behavior spec
```

CI gate: The repo's CI pipeline runs all tests. The PR cannot be merged until:

Because the PR description contains `Closes #<issue_number>`, the original GitHub issue is auto-closed when the PR merges. The entire chain (issue → PRD → stories → tests → code → PR) is traceable in git history.
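A PR description with this shape can be assembled mechanically from artifacts the agent already holds. The sketch below is illustrative: the function name and parameters are hypothetical, the `##` heading level is an assumption (the template's heading markup is not shown), and the "How to review" section is left out for brevity.

```python
def render_pr_description(issue_number: int, changes: list[str],
                          stories: list[str], tests: list[str]) -> str:
    """Render a PR body with sections from the template above."""

    def bullets(items: list[str]) -> str:
        return "\n".join(f"- {item}" for item in items)

    sections = [
        ("Summary", f"Closes #{issue_number}"),
        ("What changed", bullets(changes)),
        ("User Stories addressed", bullets(stories)),
        ("Tests", bullets(tests)),
    ]
    # "##" headings are an assumption, not taken from the original template.
    return "\n\n".join(f"## {title}\n\n{body}" for title, body in sections)


body = render_pr_description(
    issue_number=42,  # placeholder issue number for illustration
    changes=["Add rate limiting in middleware/rate_limit.py"],
    stories=["API consumer sees 429 when over limit"],
    tests=["tests/test_rate_limit.py"],
)
print(body)
```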

Tooling & Agent Setup

Recommended stack for this workflow:

| Phase | Tool |
| --- | --- |
| Issue intake | GitHub Issues / Linear |
| Agent orchestration | Claude Code, Cursor, Aider, or custom LangChain agent |
| PRD & story generation | Claude 3.5+ (strong reasoning) |
| Test generation | Claude or GPT-4o with repo context |
| Code generation | Claude Code, Aider, Cursor Composer |
| Test runner | pytest, jest, vitest (language-dependent) |
| CI/CD | GitHub Actions |
| Coverage | pytest-cov, Istanbul |

Context injection pattern — Feed the agent:
1. The GitHub issue (full body + comments)
2. The relevant source files (not the whole repo)
3. The existing test patterns (so it matches your style)
4. The PRD (carried forward through all stages)

Repo setup checklist:

Common Failure Modes

Skipping the PRD: Agents drift without a written contract. Always generate the PRD first, even for small issues.

Tests that test implementation, not behavior: If tests break when you refactor internals without changing behavior, they're wrong. Test the contract, not the internals.

Agent modifies tests to pass: This is the cardinal sin. The tests are the spec. If they're hard to pass, fix the PRD, not the test.

Too-large issues: Issues that generate 10+ stories should be split into multiple issues. Agents lose context and coherence on large scopes.

No repo context: An agent writing code without seeing the existing patterns will produce inconsistent, unidiomatic code. Always inject relevant source files.
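The "agent modifies tests" failure mode can be caught mechanically during the implementation stage. The sketch below assumes the list of changed paths is already available (for example, from `git diff --name-only`) and that tests live under a top-level `tests` directory; both the function name and that layout are assumptions.

```python
from pathlib import PurePosixPath


def modified_test_files(changed_paths: list[str], tests_dir: str = "tests") -> list[str]:
    """Return the changed paths that sit under the tests directory."""
    return [
        path for path in changed_paths
        if PurePosixPath(path).parts[:1] == (tests_dir,)
    ]


changed = ["middleware/rate_limit.py", "tests/test_rate_limit.py"]
print(modified_test_files(changed))  # non-empty during Stage 4 is a red flag
```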

[Figure: Approximate agent effort distribution per issue]

What if the GitHub issue is vague or underspecified?

The agent should ask clarifying questions before generating the PRD, or flag specific ambiguities in the PRD itself. An 'Unknowns' section in the PRD is a healthy pattern.

Can this work for bug fixes, not just features?

Absolutely. For bugs, the first test the agent writes is a regression test that reproduces the bug. Once that test fails, the fix makes it green, and the bug can no longer return silently.

What about refactors with no new behavior?

A pure refactor should start from a passing test suite, and the same suite should still pass afterward. The agent verifies that no tests regress. No new tests are needed unless coverage gaps are found.

How do I handle flaky tests the agent generates?

Flag them in code review. Flaky tests are usually a sign the agent is testing timing or external state. Push back to behavior-based assertions with proper mocking.
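A common way to remove timing-related flakiness is to inject a controllable clock instead of reading real time. The sketch below is illustrative: `FakeClock` and `window_expired` are hypothetical names, and the one-hour window mirrors the rate-limiting example used earlier.

```python
class FakeClock:
    """Hypothetical test double returning a controllable time in seconds."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, hours: float = 0.0, seconds: float = 0.0) -> None:
        self.now += hours * 3600 + seconds


def window_expired(window_start: float, window_seconds: float, clock) -> bool:
    """Behavior under test: has the rate-limit window elapsed?"""
    return clock() - window_start >= window_seconds


def test_window_expires_after_one_hour():
    clock = FakeClock()
    assert not window_expired(0.0, 3600, clock)
    clock.advance(hours=1)  # no real waiting, so the result is deterministic
    assert window_expired(0.0, 3600, clock)
```

Because the test never sleeps or reads the system clock, it gives the same result on every run and on any CI machine.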

Should the agent write the PR description or a human?

Agent writes the first draft — it has all the context. Human edits for tone and adds any nuance the agent missed. Humans review the PR; agents draft it.

This report was created by Xavior