
Part 12. Organization/Process: Operational Maturity Model and Roadmap

We present an organizational structure, role separation, decision-making system, and maturity roadmap to operate the LLM/Agent system sustainably.

Series: Why System Engineering Matters More Than Prompt Engineering

A 12-part series. You are viewing part 12.


The conclusion of this series is simple: handling prompts well is not enough to make operations stable. System design, change control, observability, security, UX, and governance must come together, and all of it ultimately converges on organization and process. Technology is something you implement; operation is something you make a habit.

Based on versions

  • Node.js 20 LTS
  • TypeScript 5.8.x
  • Next.js 16.1.x
  • OpenAI API (Responses API, based on 2026-03 document)
  • PostgreSQL 15
  • Redis 7

Problem statement

As the team grows, the following problems recur in LLM operations.

  • Model/prompt changes move fast, but policy and security review cannot keep up.
  • When a quality incident occurs, response is delayed because responsibility boundaries are unclear.
  • Experiments are plentiful, but learning is not accumulated, so the same failures repeat.
  • Operating costs keep rising, but there is no agreement on which metrics to optimize.

Practical example A: Backlash against a rapid-experimentation culture

Prompts were changed more than 20 times a week, but change records and experiment results were not kept organized. Each team interpreted differently whether performance had improved or regressed, and in the end overall velocity dropped as the operations team conservatively blocked changes.

Practical Example B: Accountability Gaps in Incident Response

A policy violation occurred, but the product team judged it a model problem while the platform team judged it a policy problem. Because no incident commander had been designated, concrete action was delayed for the first 40 minutes.

Key concepts

Operational maturity should be measured by “reproducible decisions,” not by the number of tools. The four-level model below is easy to apply in practice.

| Level | Features | Risk | Condition to advance |
| --- | --- | --- | --- |
| L1 Experimental | Individual tuning, ad-hoc responses | Quality variance, non-reproducibility | Introduce change records and an evaluation set |
| L2 Managed | Basic metrics and deployment procedures exist | Goal conflicts between teams | Common gate and separation of responsibilities |
| L3 Operational | Integrated quality/security/cost control | Increased complexity | Automated rollback and governance |
| L4 Optimized | Automated learning loop, predictive operations | Over-optimization risk | Regular re-verification of standards |

Core principles:

  1. Control the quality of change rather than the speed of change.
  2. Fix responsibilities with roles, fix decisions with data (metrics), and fix execution with runbooks.
  3. If improvements are not left behind in the structure after an incident, the maturity level will not rise.
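As one illustration, the maturity level can be derived from the advancement conditions in the table rather than from tool counts. The signal names below are assumptions for the sketch; only the level semantics come from the model itself.

```typescript
// Hypothetical signals mapping to the table's "condition to advance" column.
// Names are illustrative assumptions, not part of the model.
type MaturitySignals = {
  changesAreRecorded: boolean;  // L1 -> L2: change records and evaluation set exist
  sharedGateAndRaci: boolean;   // L2 -> L3: common gate and separated responsibilities
  automatedRollback: boolean;   // L3 -> L4: automated rollback and governance
};

function maturityLevel(s: MaturitySignals): "L1" | "L2" | "L3" | "L4" {
  // Each level requires every condition of the levels below it.
  if (!s.changesAreRecorded) return "L1";
  if (!s.sharedGateAndRaci) return "L2";
  if (!s.automatedRollback) return "L3";
  return "L4";
}
```

The point of encoding it this way is that the levels are strictly ordered: a team with automated rollback but no change records is still L1, because later capabilities do not compensate for missing foundations.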


Practical pattern

Pattern 1: RACI-based role separation

The biggest reason operations become unstable is that “who decides” is unclear. Separate responsibility for changes, incidents, and deployments with a RACI matrix.

| Work | Responsible | Accountable | Consulted | Informed |
| --- | --- | --- | --- | --- |
| Prompt change | Product Eng | Product Lead | Platform | Support |
| Policy change | Security Eng | Security Lead | Product, Legal | Platform |
| Routing/cost change | Platform Eng | Platform Lead | Product | Finance |
| Incident response | On-call IC | Incident Commander | All Owners | Stakeholders |

Operating points:

  • Block high-risk changes that lack approval at the system level.
  • Designate the incident commander within 5 minutes of incident start.
  • For role conflicts, agree in advance on priority rules (security > safety > quality > cost).
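The operating points above can be sketched as a pre-deployment gate. The change categories, the Accountable roles, and the priority ordering follow the RACI table and the text; the function and field names are illustrative assumptions.

```typescript
// Illustrative gate; field and function names are assumptions.
type Change = {
  kind: "prompt" | "policy" | "routing-cost";
  risk: "low" | "high";
  approvedBy?: string; // the Accountable role that signed off, if any
};

// Accountable role per change kind, mirroring the RACI table.
const accountable: Record<Change["kind"], string> = {
  prompt: "Product Lead",
  policy: "Security Lead",
  "routing-cost": "Platform Lead",
};

// Pre-agreed priority order for resolving conflicts between concerns.
const priority = ["security", "safety", "quality", "cost"] as const;
type Concern = (typeof priority)[number];

function canDeploy(change: Change): { ok: boolean; reason: string } {
  // High-risk changes without the Accountable's approval are blocked systemically.
  if (change.risk === "high" && change.approvedBy !== accountable[change.kind]) {
    return { ok: false, reason: `requires approval by ${accountable[change.kind]}` };
  }
  return { ok: true, reason: "low risk or approved" };
}

function higherPriority(a: Concern, b: Concern): Concern {
  // Lower index wins: security outranks safety, then quality, then cost.
  return priority.indexOf(a) <= priority.indexOf(b) ? a : b;
}
```

A gate like this makes the "no unapproved high-risk change" rule a property of the pipeline rather than a convention teams must remember.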

Pattern 2: Fixed Operating Cadence

Maturity comes from repeatable rhythms, not from the number of meetings. Fixing the minimal routine below lets learning accumulate.

{
  "cadence": {
    "daily": ["quality-cost anomaly check", "review queue triage"],
    "weekly": ["release gate review", "failed sample curation", "incident action tracking"],
    "monthly": ["policy audit", "maturity score review", "cost architecture adjustment"],
    "quarterly": ["chaos drill", "governance reset", "roadmap reprioritization"]
  }
}
# Example script for computing the maturity score
./ops/maturity/score.sh \
  --quality-gate-coverage 0.85 \
  --rollback-mttr-min 12 \
  --policy-audit-pass-rate 0.97 \
  --incident-repeat-rate 0.08
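A minimal sketch of how the four inputs to score.sh above might be combined into a single score. Only the four metric names come from the script; the equal weights and the 60-minute MTTR budget are assumptions a team would replace with its own priorities.

```typescript
// Weights and normalization are illustrative assumptions.
type MaturityInputs = {
  qualityGateCoverage: number; // 0..1, higher is better
  rollbackMttrMin: number;     // minutes, lower is better
  policyAuditPassRate: number; // 0..1, higher is better
  incidentRepeatRate: number;  // 0..1, lower is better
};

function maturityScore(m: MaturityInputs): number {
  // Normalize MTTR against an assumed 60-minute budget, clamped to [0, 1].
  const mttrScore = Math.max(0, 1 - m.rollbackMttrMin / 60);
  // Invert repeat rate so that higher always means better.
  const repeatScore = 1 - m.incidentRepeatRate;
  const score =
    0.25 * m.qualityGateCoverage +
    0.25 * mttrScore +
    0.25 * m.policyAuditPassRate +
    0.25 * repeatScore;
  return Math.round(score * 100); // 0..100 scale
}
```

With the sample flags from the script (coverage 0.85, MTTR 12 min, audit pass 0.97, repeat 0.08), this lands in the high-80s, which a team might read as "solid L3, not yet L4."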

Operating points:

  • Do not review indicators separately from tracking actions; keep them in the same review.
  • Classify recurring incident items as “operational debt,” not “technical debt.”
  • Use the maturity score to set investment priorities, not to evaluate teams.

Pattern 3: Automating the learning loop of experimentation-deployment-retrospection

No matter how many experiments run, maturity stagnates if learning does not accumulate. Failure samples and postmortem actions should feed back into the evaluation set and the release gate.

type IncidentLearning = {
  incidentId: string;
  rootCause: string;
  newEvalCases: string[]; // failure samples to add to the evaluation set
  gateUpdates: string[];  // release-gate rules derived from the postmortem
  owner: string;
};

// evalDataset, releaseGates, and backlog stand for the team's own services.
export async function applyLearning(input: IncidentLearning) {
  await evalDataset.appendCases(input.newEvalCases);
  await releaseGates.update(input.gateUpdates);
  await backlog.create({
    title: `Postmortem actions for ${input.incidentId}`,
    owner: input.owner,
  });
}

Operating points:

  • Manage incident action completion rate as a quarterly KPI.

  • Operate the evaluation set as a “failure recurrence prevention dataset,” not as a ground-truth dataset.
  • Retrospectives should end in control improvements, not in assigning blame.
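The quarterly KPI mentioned above is a simple ratio; the record shape below is an assumption for the sketch.

```typescript
// Illustrative shape; field names are assumptions.
type PostmortemAction = { incidentId: string; done: boolean };

function actionCompletionRate(actions: PostmortemAction[]): number {
  // Completion rate over all postmortem actions in the quarter; 1 when empty.
  if (actions.length === 0) return 1;
  const done = actions.filter((a) => a.done).length;
  return done / actions.length;
}
```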

Failure cases/anti-patterns

Failure scenario: “Team with tools but no structure”

Situation:

  • Observability tools, prompt management tools, and policy engines had all been introduced.
  • However, lacking role separation and an approval system, high-risk changes were frequently deployed directly.
  • Similar incidents repeated three times in two months, eroding customer trust.

Detection procedure:

  1. Check for repetition of the same root-cause pattern in incident postmortems
  2. Measure the rate of deployments missing approval by analyzing change logs
  3. Confirm the upward trend in incident MTTR and repeat rate
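Step 1 and the repeat-rate signal in step 3 can be sketched as grouping incidents by root cause; the incident record shape is an assumption.

```typescript
// Illustrative; the incident record shape is an assumption.
type Incident = { id: string; rootCause: string };

function incidentRepeatRate(incidents: Incident[]): number {
  // Fraction of incidents whose root cause has been seen before.
  const seen = new Set<string>();
  let repeats = 0;
  for (const incident of incidents) {
    if (seen.has(incident.rootCause)) repeats++;
    else seen.add(incident.rootCause);
  }
  return incidents.length === 0 ? 0 : repeats / incidents.length;
}
```

A rising value of this metric over successive months is exactly the "same root cause repeating" pattern the detection procedure looks for.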

Mitigation Procedures:

  1. Immediately enforce an approval workflow for high-risk changes
  2. Add repeat-incident items to the weekly operations review
  3. Treat unfinished postmortem actions as release blockers

Recovery Procedure:

  1. Re-establish role-based RACI
  2. Specify per-level goals and metrics in the maturity roadmap
  3. Run quarterly operational drills (policy incidents, cost spikes, quality regressions)

Representative antipatterns

  • Judging maturity by the number of tools introduced
  • Closing incidents with a written report while skipping action execution tracking
  • Running experiments fast while deferring investment in rollback capability
  • Leaving conflicting organizational KPIs without an integrated priority

Checklist

  • Has the current team’s maturity level (L1 to L4) been explicitly defined?
  • Are there approval systems and exception procedures for high-risk changes?
  • Is RACI documented and working in actual incidents?
  • Is the operating rhythm (day/week/month/quarter) fixed and executed?
  • Is incident learning automatically fed back to evaluation set/gate/backlog?
  • Do you consider repeat incident rate and rollback MTTR as key indicators?
  • Are there decision-making rules when security/quality/cost priorities conflict?

Summary

The final competitive advantage in LLM/Agent operation is not model selection but the operating system around it. Systems evolve reliably when the organization controls change, learns from failure, and separates responsibilities clearly. Prompts may be the starting point, but long-term performance is determined by systems engineering and operational process.

Next episode preview

This episode is the last of the series. In the next article, based on these 12 articles, we plan to organize the “90-day implementation plan” (simultaneous technology/organizational transition) for actual introduction teams into a separate guide. At each stage, what to fix first and what indicators to verify performance are focused on execution.
