Part 12. Organization/Process: Operational Maturity Model and Roadmap
We present an organizational structure, role separation, decision-making system, and maturity roadmap to operate the LLM/Agent system sustainably.
Series: Why System Engineering Matters More Than Prompt Engineering
12 parts in total. You are currently viewing Part 12.
- Part 1. Prompt is an interface: Revisiting system boundaries and contracts
- Part 2. Quality comes from the evaluation loop, not from prompts
- Part 3. Reliability Design: Retry, Timeout, Fallback, Circuit Breaker
- Part 4. Cost Design: Cache, Batching, Routing, Token Budget
- Part 5. Security Design: Prompt Injection, Data Leak, Policy Guard
- Part 6. Observability Design: Trace, Span, Log Schema, Regression Detection
- Part 7. Context Engineering: RAG, Memory, Recency, Multi-Tenancy
- Part 8. Agent architecture: Planner/Executor, state machine, task queue
- Part 9. Productization: Failure UX, Human-in-the-loop, Operational Governance
- Part 10. Change Management: Prompt Changes vs System Changes, Experiments and Rollbacks
- Part 11. Reference Architecture: End-to-End Operational Design
- Part 12. Organization/Process: Operational Maturity Model and Roadmap (this part)
The conclusion of this series is simple: handling prompts well is not enough to make operation stable. System design, change control, observability, security, UX, and governance must come together, and all of this ultimately converges on organizational and process issues. Technology can be implemented once, but operation must be sustained as a habit.
Versions used
- Node.js 20 LTS
- TypeScript 5.8.x
- Next.js 16.1.x
- OpenAI API (Responses API, based on 2026-03 document)
- PostgreSQL 15
- Redis 7
Problem statement
As the team grows, the following problems repeat themselves in LLM operations.
- Model/prompt changes are fast, but policy/security review cannot follow.
- Even if a quality incident occurs, the response is delayed because the boundaries of responsibility are unclear.
- There are many experiments, but the learning is not accumulated, so the same failures are repeated.
- Operating costs are increasing, but there is no agreement on what indicators should be optimized.
Practical example A: Backlash against a culture of rapid experimentation
Prompts were changed more than 20 times a week, but change records and experiment results were not organized. Each team interpreted differently whether performance had improved or regressed, and in the end overall speed dropped as the operations team conservatively blocked changes.
Practical Example B: Accountability Gaps in Incident Response
A policy violation occurred, but the product team judged it to be a model problem, and the platform team judged it to be a policy problem. Because an incident commander was not designated, actual action was delayed for the first 40 minutes.
Key concepts
Operational maturity should be measured by “reproducible decisions,” not by the number of tools. The four-level model below is easy to apply in practice.
| Stage | Characteristics | Risk | Criteria for advancing |
|---|---|---|---|
| L1 Experimental | Ad-hoc tuning by individuals, one-off responses | Quality variance, non-reproducibility | Introduce change records and an evaluation set |
| L2 Managed | Basic metrics and deployment procedures exist | Goal conflicts between teams | Common gates and separation of responsibilities |
| L3 Operational | Integrated quality/security/cost control | Growing complexity | Automated rollback and governance |
| L4 Optimized | Automated learning loops, predictive operations | Over-optimization risk | Regular re-verification of standards |
Core principles:
1. Control the quality of change rather than the speed of change.
2. Fix responsibilities with roles, decisions with data (metrics), and execution with runbooks.
3. If improvements are not embedded in the structure after an incident, the maturity level will not rise.
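The advancement criteria in the maturity table can be made checkable rather than self-reported. Below is a minimal sketch; the capability flags and their names are illustrative assumptions, not a standard defined by this series.

```typescript
// Illustrative capability flags; the field names are assumptions.
type OpsCapabilities = {
  changeLogAndEvalSet: boolean;          // L1 -> L2 criterion
  commonGateAndRaci: boolean;            // L2 -> L3 criterion
  automatedRollbackGovernance: boolean;  // L3 -> L4 criterion
};

// Derive the maturity level from observed capabilities so the level is
// reproducible. Each level requires all lower-level criteria to hold.
function maturityLevel(c: OpsCapabilities): "L1" | "L2" | "L3" | "L4" {
  if (!c.changeLogAndEvalSet) return "L1";
  if (!c.commonGateAndRaci) return "L2";
  if (!c.automatedRollbackGovernance) return "L3";
  return "L4";
}
```

Deriving the level from evidence, instead of letting each team declare it, is what makes the roadmap a shared decision tool rather than a scorecard.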
Practical pattern
Pattern 1: RACI-based role separation
The biggest reason operations become unstable is that “who makes the decisions” is unclear. Separate responsibility for changes, incidents, and deployments with a RACI matrix.
| Task | Responsible | Accountable | Consulted | Informed |
|---|---|---|---|---|
| Prompt change | Product Eng | Product Lead | Platform | Support |
| Policy change | Security Eng | Security Lead | Product, Legal | Platform |
| Routing/Cost Change | Platform Eng | Platform Lead | Product | Finance |
| Incident Response | On-call IC | Incident Commander | All Owners | Stakeholders |
Operating points:
- Systemically block high-risk changes that lack approval.
- Designate an incident commander within 5 minutes of an incident starting.
- For role conflicts, agree in advance on priority rules (security > safety > quality > cost).
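The first and third operating points can be enforced in code rather than by convention. The sketch below mirrors the RACI table; the type shapes and function names are hypothetical, not from this series.

```typescript
// Hypothetical change-approval guard; names and shapes are illustrative.
type Risk = "low" | "high";
type Change = { kind: string; risk: Risk; approvedBy: string[] };

// Accountable role per change kind, mirroring the RACI table.
const ACCOUNTABLE: Record<string, string> = {
  prompt: "Product Lead",
  policy: "Security Lead",
  routing: "Platform Lead",
};

// Pre-agreed priority order for conflicting concerns.
const PRIORITY = ["security", "safety", "quality", "cost"];

// High-risk changes without the Accountable role's approval are blocked.
function canDeploy(change: Change): boolean {
  if (change.risk === "low") return true;
  const accountable = ACCOUNTABLE[change.kind];
  return accountable !== undefined && change.approvedBy.includes(accountable);
}

// When two concerns conflict, the one earlier in PRIORITY wins.
// Assumes both inputs are members of PRIORITY.
function resolveConflict(a: string, b: string): string {
  return PRIORITY.indexOf(a) <= PRIORITY.indexOf(b) ? a : b;
}
```

Encoding the rule means the “without approval” path simply does not exist in the deployment tool, instead of depending on reviewers remembering the policy.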
Pattern 2: Fixed Operating Cadence
Maturity comes from repeatable rhythms, not the number of meetings. Fixing the minimal routine below is what lets learning accumulate.
```json
{
  "cadence": {
    "daily": ["quality-cost anomaly check", "review queue triage"],
    "weekly": ["release gate review", "failed sample curation", "incident action tracking"],
    "monthly": ["policy audit", "maturity score review", "cost architecture adjustment"],
    "quarterly": ["chaos drill", "governance reset", "roadmap reprioritization"]
  }
}
```
```bash
# Example maturity score calculation script
./ops/maturity/score.sh \
  --quality-gate-coverage 0.85 \
  --rollback-mttr-min 12 \
  --policy-audit-pass-rate 0.97 \
  --incident-repeat-rate 0.08
```
Operating points:
- Do not review metrics separately from tracking the resulting actions.
- Classify recurring incident items as “operational debt,” not merely “technical debt.”
- The maturity score is used to determine investment priorities, not to evaluate the team.
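The score script above takes four inputs. As a hedged sketch of how such a score might be combined (the weights and the 60-minute MTTR budget are illustrative assumptions, not a formula published in this series):

```typescript
// Illustrative scoring; weights and normalization are assumptions.
type MaturityInputs = {
  qualityGateCoverage: number; // 0..1, higher is better
  rollbackMttrMin: number;     // minutes, lower is better
  policyAuditPassRate: number; // 0..1, higher is better
  incidentRepeatRate: number;  // 0..1, lower is better
};

function maturityScore(i: MaturityInputs): number {
  // Normalize MTTR against an assumed 60-minute budget, clamped to [0, 1].
  const mttrScore = Math.max(0, 1 - i.rollbackMttrMin / 60);
  const score =
    0.3 * i.qualityGateCoverage +
    0.2 * mttrScore +
    0.3 * i.policyAuditPassRate +
    0.2 * (1 - i.incidentRepeatRate);
  return Math.round(score * 100) / 100; // round to two decimals
}
```

Whatever weighting the team chooses, the point is that it is written down and versioned, so the score is comparable quarter over quarter.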
Pattern 3: Automating the learning loop of experimentation-deployment-retrospection
Even if there are many experiments, if learning is not accumulated, maturity stagnates. Failure samples and postmortem actions should be fed back to the evaluation set and release gate.
```typescript
// The dataset store, release-gate service, and backlog client are assumed
// to be provided elsewhere in the codebase; the import path is illustrative.
import { evalDataset, releaseGates, backlog } from "./ops-clients";

type IncidentLearning = {
  incidentId: string;
  rootCause: string;
  newEvalCases: string[]; // failure samples to add to the evaluation set
  gateUpdates: string[];  // release-gate rules derived from the postmortem
  owner: string;
};

// Feed postmortem learning back into the evaluation set, the release
// gates, and the action backlog in a single step.
export async function applyLearning(input: IncidentLearning) {
  await evalDataset.appendCases(input.newEvalCases);
  await releaseGates.update(input.gateUpdates);
  await backlog.create({
    title: `Postmortem actions for ${input.incidentId}`,
    owner: input.owner,
  });
}
```
Operating points:
- Manage the incident action completion rate as a quarterly KPI.
- Operate the evaluation set as a “failure-recurrence-prevention dataset,” not as a ground-truth answer set.
- Retrospectives should result in improvements in control, not in seeking responsibility.
Failure cases/anti-patterns
Failure scenario: “Team with tools but no structure”
Situation:
- Observation tools, prompt management tools, and policy engines were all introduced.
- However, due to the lack of role separation and an approval system, high-risk changes were frequently deployed directly.
- Similar incidents were repeated three times in two months, reducing customer trust.
Detection procedure:
- Check for repetition of the same root-cause pattern across incident postmortems
- Measure the rate of deployments missing approval by analyzing change logs
- Confirm upward trends in incident MTTR and repeat rate
Mitigation Procedures:
- Immediately enforce high-risk change approval workflows
- Add repeat incident item to weekly operation review
- Treat unfinished postmortem actions as release blockers
Recovery Procedure:
- Re-establish RACI ownership by role
- Specify step-by-step goals/indicators in the maturity roadmap
- Regular quarterly operational training (policy incidents, cost spikes, quality regressions)
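One concrete way to make “unfinished postmortem actions block the release” enforceable in a release pipeline; the types and function are a minimal hypothetical sketch, not part of this series' reference code.

```typescript
// Hypothetical release-blocker check; names and shapes are illustrative.
type PostmortemAction = { incidentId: string; description: string; done: boolean };

// Return the incident ids whose open actions block the next release.
// A release gate would call this and refuse to proceed if the list is non-empty.
function releaseBlockers(actions: PostmortemAction[]): string[] {
  return actions.filter((a) => !a.done).map((a) => a.incidentId);
}
```

Wiring this into the deployment pipeline turns the recovery procedure from a written policy into a structural control.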
Representative antipatterns
- Maturity is judged by the number of tools introduced
- Closing incidents with a written report while skipping follow-through on actions
- Running experiments quickly while deferring investment in rollback capability
- Leaving conflicting organizational KPIs without an integrated prioritization
Checklist
- Has the current team’s maturity level (L1 to L4) been explicitly defined?
- Are there approval systems and exception procedures for high-risk changes?
- Is RACI documented and working in actual incidents?
- Is the operating rhythm (day/week/month/quarter) fixed and executed?
- Is incident learning automatically fed back to evaluation set/gate/backlog?
- Do you consider repeat incident rate and rollback MTTR as key indicators?
- Are there decision-making rules when security/quality/cost priorities conflict?
Summary
The final competitiveness of LLM/Agent operation lies not in model selection but in the operating system. Systems evolve reliably when organizations control change, learn from failure, and separate responsibilities clearly. Prompts may be the starting point, but long-term performance is determined by systems engineering and operational processes.
Next episode preview
This episode is the last of the series. In a follow-up article, we plan to consolidate these 12 parts into a separate guide: a “90-day implementation plan” (a simultaneous technical and organizational transition) for teams adopting these practices, focused on what to fix first at each stage and which indicators verify progress.
Reference link
- OpenAI Developer Docs - Responses API
- OpenTelemetry official documentation
- Martin Fowler - Circuit Breaker
- Blog: LLM Agent Tool Guardrail design
Series navigation
- Previous post: Part 11. Reference Architecture: End-to-End Operational Design
- Next post: None (last part of this series)