
Part 11. Reference Architecture: End-to-End Operational Design

We present an LLM/agent reference architecture that combines prompting, evaluation, reliability, cost, security, and observability into a single operating framework.

Series: Why System Engineering Matters More Than Prompt Engineering

A 12-part series. You are reading Part 11.

In the previous parts, we covered quality, reliability, cost, security, observability, context, and agent operations individually. In a real service, these concerns move simultaneously, so operational stability requires an integrated architecture, not isolated patterns. This part presents a reference architecture for operating LLM features as a product.

Baseline versions

  • Node.js 20 LTS
  • TypeScript 5.8.x
  • Next.js 16.1.x
  • OpenAI API (Responses API, based on 2026-03 document)
  • PostgreSQL 15
  • Redis 7

Problem statement

The following symptoms frequently appear in systems where each component is individually optimized but the whole remains unstable.

  • As the quality team improves prompts, cost team metrics worsen.
  • When the security team strengthens its policies, the product team conversion rate decreases.
  • Observability data is siloed per team, making it hard to share the cause of a change across teams.
  • During incident response, control-plane and data-plane concerns are entangled, delaying rollback.

Practical Example A: A quality improvement that backfired on cost

A prompt improvement raised the resolution rate, but longer outputs made costs soar. Cost alarms fired late, and the quality team could not tell whether their change was responsible, because the metrics were not connected to the same operational loop.

Practical Example B: User abandonment after a security enhancement

Raising the policy-engine threshold strengthened injection defenses, but false refusals increased. The product team read it as a UX issue while the security team judged it a success, producing a decision-making conflict.

Key concepts

An integrated architecture must separate the control plane from the data plane.

  • Control plane: versioning, policy, routing, experimentation, deployment, and rollback decisions
  • Data plane: request processing, retrieval, inference, tool calls, and response generation

| Plane | Main components | Core responsibilities | Failure strategy |
| --- | --- | --- | --- |
| Control plane | Prompt Registry, Policy Engine, Router Config, Release Manager | Change control, version deployment, gate evaluation | Fast partial rollback |
| Data plane | API Gateway, Orchestrator, Retriever, Model Runtime, Tool Runner | User request processing, state transitions | Degraded mode, queue backpressure |
| Observability plane | Trace, Logs, Quality Metrics, Cost Metrics | Cause analysis, regression detection | Automatic alerts + deployment blocking |
| Governance plane | Approval Workflow, Audit Store, Runbook | Accountability tracking, incident response | Escalation + postmortem |
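The plane separation can be sketched as the data plane resolving one immutable control-plane snapshot per request, so prompt, policy, and route versions can never diverge mid-request. This is a minimal sketch under that assumption; names like `ControlSnapshot`, `promoteRelease`, and `resolveSnapshot` are illustrative, not from the article.

```typescript
// Illustrative sketch: the data plane reads ONE immutable control-plane
// snapshot per request, so the versions it stamps on the envelope are
// guaranteed to be mutually consistent. In a real system `current` would
// live in a versioned config store (e.g. Redis), not module state.
type ControlSnapshot = {
  releaseId: string;
  promptVersion: string;
  policyVersion: string;
  routeVersion: string;
};

let current: ControlSnapshot = {
  releaseId: "rel_20260303_2200",
  promptVersion: "v9",
  policyVersion: "v9",
  routeVersion: "v9",
};

// Atomic promotion: the control plane swaps the whole snapshot at once,
// never one field at a time (which is what causes drift).
export function promoteRelease(next: ControlSnapshot): void {
  current = next;
}

export function resolveSnapshot(): ControlSnapshot {
  // Frozen copy: the data plane must never mutate control state.
  return Object.freeze({ ...current });
}
```

The point of the design is that a request observes either the old release or the new one, never a mix.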

Practical pattern

Pattern 1: Request Lifecycle Contract Standardization

Fixing the request-processing steps and contracts on the data plane reduces implementation variance across teams.

type RequestEnvelope = {
  requestId: string;
  tenantId: string;
  userId: string;
  input: string;
  contextVersion: string;
  promptVersion: string;
  policyVersion: string;
  routeVersion: string;
  deadlineMs: number;
};

type ResponseEnvelope = {
  requestId: string;
  status: "ok" | "degraded" | "needs_review" | "blocked";
  answer: string;
  citations: string[];
  qualitySignals: {
    confidence: number;
    resolved: boolean;
    followUpRisk: "low" | "medium" | "high";
  };
  costSignals: {
    inputTokens: number;
    outputTokens: number;
    usd: number;
  };
};

export async function handle(envelope: RequestEnvelope): Promise<ResponseEnvelope> {
  const context = await retrieveContext(envelope);   // Retriever
  const plan = await orchestrate(envelope, context); // Orchestrator
  const guarded = await applyPolicies(plan, envelope.policyVersion); // Policy Engine
  const result = await execute(guarded);             // Model Runtime / Tool Runner
  return buildResponse(envelope.requestId, result);
}

Operating points:

  • Version fields in the request/response envelopes simplify cause tracking.
  • Quality and cost signals travel with the result, making downstream analysis easier.
  • Envelope schema changes must follow a backward-compatibility policy.
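As a minimal sketch of the first operating point, a gateway could reject envelopes that are missing the version fields cause tracking depends on. `missingVersionFields` and `REQUIRED_VERSIONS` are hypothetical helper names, not part of the article's contract.

```typescript
// Illustrative sketch: reject request envelopes that are missing the
// version fields cause tracking relies on.
const REQUIRED_VERSIONS = [
  "contextVersion",
  "promptVersion",
  "policyVersion",
  "routeVersion",
] as const;

// Returns the list of missing or empty version fields; an empty array means
// the envelope is acceptable.
export function missingVersionFields(envelope: Record<string, unknown>): string[] {
  return REQUIRED_VERSIONS.filter(
    (field) => typeof envelope[field] !== "string" || envelope[field] === ""
  );
}
```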

Pattern 2: Control-plane-driven deployment gates

Deployments should be operated as control plane configuration promotions, not code merges.

{
  "release_gate": {
    "quality": {
      "resolved_rate_min": 0.78,
      "follow_up_ratio_max": 0.18
    },
    "security": {
      "policy_violation_rate_max": 0.005,
      "forbidden_tool_call": 0
    },
    "cost": {
      "usd_per_resolved_request_max_increase": 0.1
    },
    "reliability": {
      "p95_latency_max_increase": 0.15,
      "retry_amplification_max": 1.25
    }
  }
}
# Promote or block after integrated gate evaluation
./ops/release/evaluate-gates.sh --release rel_20260303_2200
./ops/release/promote-if-pass.sh --release rel_20260303_2200 --canary 25
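The article does not show what check `evaluate-gates.sh` performs internally; the following is a minimal sketch of one plausible evaluator, assuming the naming convention visible in the config above (`_min` keys are lower bounds, `_max` keys are upper bounds, plain numeric keys like `forbidden_tool_call` must match exactly).

```typescript
// Illustrative sketch of a gate evaluator for a release_gate section.
// Assumed convention: "_min" = lower bound, "_max" = upper bound,
// plain numeric keys must match exactly.
type Thresholds = Record<string, number>;
type Metrics = Record<string, number>;

// Returns the names of failed checks; an empty array means the gate passes.
export function evaluateGate(thresholds: Thresholds, metrics: Metrics): string[] {
  const failures: string[] = [];
  for (const [key, limit] of Object.entries(thresholds)) {
    // "resolved_rate_min" -> metric "resolved_rate", etc.
    const metricKey = key.replace(/_(min|max.*)$/, "");
    const value = metrics[metricKey];
    if (value === undefined) {
      failures.push(`${key}: metric missing`);
      continue;
    }
    const ok = key.endsWith("_min")
      ? value >= limit
      : key.includes("_max")
        ? value <= limit
        : value === limit;
    if (!ok) failures.push(key);
  }
  return failures;
}
```

Note that the `*_max_increase` keys in the config are relative bounds; this sketch treats the supplied metric as the already-computed increase over the baseline.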

Operating points:

  • When a gate fails, prefer component-level rollback over a full rollback.
  • Gate criteria are reevaluated quarterly but should not change frequently.
  • An “emergency mode” that lets the operations team bypass gates must produce audit logs.

Pattern 3: Document team interfaces and responsibility boundaries

Just as important as the technical architecture is the organizational interface.

| Team | Owned components | Primary KPI | Secondary KPI |
| --- | --- | --- | --- |
| Platform | Orchestrator, Router, State Store | SLA, p95 latency | Retry amplification |
| Search/Data | Ingestion, Index, Retriever | Retrieval hit ratio | Index lag |
| Security | Policy Engine, Audit | Policy violation rate | False refusal rate |
| Product | UX, HITL Flow | Safe completion rate | Review SLA |

Operating points:

  • Specify tie-break criteria for conflicting KPIs (e.g. security > safety > quality > cost).
  • Designate a single incident commander for each incident response.
  • Feed postmortem actions into team-level backlogs.
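The tie-break rule can be encoded so that conflicts resolve deterministically instead of by debate. This sketch assumes the example priority order given above; `resolveConflict` and the `Recommendation` shape are hypothetical, not from the article.

```typescript
// Illustrative sketch: encode the documented KPI priority order so that
// conflicting team recommendations resolve deterministically.
const PRIORITY = ["security", "safety", "quality", "cost"];

type Recommendation = { dimension: string; action: string };

// Unknown dimensions rank last rather than first.
const rank = (d: string): number => {
  const i = PRIORITY.indexOf(d);
  return i === -1 ? PRIORITY.length : i;
};

export function resolveConflict(recs: Recommendation[]): Recommendation {
  // Return the recommendation whose dimension ranks highest.
  return [...recs].sort((a, b) => rank(a.dimension) - rank(b.dimension))[0];
}
```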

Failure cases/anti-patterns

Failure Scenario: “Control Plane Drift”

Situation:

  • The Prompt Registry pointed to v9 while the routing settings still used the v8 token budget.
  • On the data plane, longer outputs exceeded the budget and fallbacks surged.
  • Quality, cost, and latency degraded simultaneously, but the initial response was slow because each team watched a separate dashboard.

Detection procedure:

  1. A version_mismatch_count alarm fires.
  2. Compare the release manifest against the versions recorded in execution logs.
  3. Correlate the fallback-rate surge with budget-overflow events.
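The manifest comparison in step 2 can be sketched as a batch job that derives `version_mismatch_count` from execution logs; the type and function names below are illustrative.

```typescript
// Illustrative sketch: derive the version_mismatch_count metric by comparing
// the release manifest against the versions stamped on execution logs.
type Manifest = { promptVersion: string; routeVersion: string };
type ExecutionLog = { requestId: string; promptVersion: string; routeVersion: string };

export function versionMismatchCount(manifest: Manifest, logs: ExecutionLog[]): number {
  // A log drifts if ANY of its versions disagrees with the manifest.
  return logs.filter(
    (log) =>
      log.promptVersion !== manifest.promptVersion ||
      log.routeVersion !== manifest.routeVersion
  ).length;
}
```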

Mitigation Procedures:

  1. Force control-plane version synchronization.
  2. Partially roll back the routing policy to the previous version.
  3. Temporarily throttle traffic from high-cost domains.

Recovery Procedure:

  1. Introduce atomic config releases.
  2. Add drift verification as a required pre-deployment gate.
  3. Standardize a cross-plane dashboard integrating quality, cost, and reliability.

Representative antipatterns

  • Embedding control-plane settings directly in application code with no plane separation
  • Team-local optimization that sacrifices overall metrics
  • Gating deployment promotion on a single metric
  • Post-incident action items that never produce structural change

Checklist

  • Are the responsibilities of the control plane and data plane separated?
  • Are version/quality/cost signals included in the request envelope?
  • Do integrated gates (quality/security/cost/reliability) control deployment promotion?
  • Is component-level rollback automated?
  • Are team ownership boundaries and KPI priorities documented?
  • Do version-drift detection metrics feed operational alerts?
  • Are postmortem results reflected in the architecture-improvement backlog?

Summary

The success of an LLM operation depends less on individual technology choices than on how the system is composed. The key to the integrated reference architecture is to separate the control plane from the data plane and to connect observability signals into the same decision loop. Only then can quality, cost, security, and reliability be managed without working against each other.

Next episode preview

The last part presents the LLM operational maturity model from an organization/process perspective. It organizes team roles, decision-making structure, experimentation culture, and operating routines step by step and concludes with a practical roadmap that allows you to clearly determine “how far we have come and what is next.”
