Part 11. Reference Architecture: End-to-End Operational Design
We present an LLM/Agent reference architecture that combines prompting, evaluation, reliability, cost, security, and observability into one operating system.
Series: Why System Engineering Matters More Than Prompt Engineering
12 parts in total. You are reading Part 11.
- Part 1. Prompt is an interface: Revisiting system boundaries and contracts
- Part 2. Quality comes from the evaluation loop, not from prompts
- Part 3. Reliability Design: Retry, Timeout, Fallback, Circuit Breaker
- Part 4. Cost Design: Cache, Batching, Routing, Token Budget
- Part 5. Security Design: Prompt Injection, Data Leak, Policy Guard
- Part 6. Observability Design: Trace, Span, Log Schema, Regression Detection
- Part 7. Context Engineering: RAG, Memory, Recency, Multi-Tenancy
- Part 8. Agent architecture: Planner/Executor, state machine, task queue
- Part 9. Productization: Failure UX, Human-in-the-loop, Operational Governance
- Part 10. Change Management: Prompt Changes vs System Changes, Experiments and Rollbacks
- Part 11. Reference Architecture: End-to-End Operational Design (current)
- Part 12. Organization/Process: Operational Maturity Model and Roadmap
In the previous parts, we covered quality, reliability, cost, security, observability, context, and agent operation individually. In a real service, these elements move at the same time, so operational stability requires an integrated architecture rather than isolated patterns. This part presents a reference architecture for operating LLM features as a product.
Version baseline
- Node.js 20 LTS
- TypeScript 5.8.x
- Next.js 16.1.x
- OpenAI API (Responses API, as of the 2026-03 documentation)
- PostgreSQL 15
- Redis 7
Problem statement
The following symptoms frequently appear in systems where each component is optimized in isolation but the whole remains unstable.
- As the quality team improves prompts, cost team metrics worsen.
- When the security team strengthens its policies, the product team conversion rate decreases.
- Observability data is siloed per team, so the root cause of a change is hard to share across teams.
- During incident response, control-plane and data-plane concerns are entangled, delaying rollback.
Practical Example A: Adverse Cost Effects of Quality Improvement Deployment
A prompt improvement raised the resolution rate, but costs soared because output length grew. Cost alarms fired late, and the quality team had no way to tell whether their change was responsible, because the metrics were not wired into the same operational loop.
Practical example B: User abandonment after security enhancement
Injection defense was strengthened by raising the policy engine threshold, but false refusals increased. The product team treated it as a UI issue while the security team judged it a success, producing a decision-making conflict.
Key concepts
An integrated architecture must separate the “control plane” and “data plane.”
- Control plane: versioning, policy, routing, experimentation, deployment, and rollback decisions
- Data plane: request processing, retrieval, inference, tool calls, response generation

| Plane | Main components | Core responsibilities | Failure strategy |
| --- | --- | --- | --- |
| Control Plane | Prompt Registry, Policy Engine, Router Config, Release Manager | Change control, version deployment, gate evaluation | Fast partial rollback |
| Data Plane | API Gateway, Orchestrator, Retriever, Model Runtime, Tool Runner | User request processing, state transitions | Degraded mode, queue backpressure |
| Observability Plane | Trace, Logs, Quality Metrics, Cost Metrics | Cause analysis, regression detection | Automatic alerts + deployment blocking |
| Governance Plane | Approval Workflow, Audit Store, Runbook | Accountability tracking, incident response | Escalation + postmortem |
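The plane separation above can be sketched in a few lines: the control plane is the only writer of a versioned release manifest, and the data plane consumes it read-only, stamping its versions onto each request so the observability plane can correlate behavior with exact versions. The type and function names here are illustrative, not from a specific library.

```typescript
// Sketch (illustrative names): control plane writes, data plane reads.
type ReleaseManifest = {
  releaseId: string;
  promptVersion: string;
  policyVersion: string;
  routeVersion: string;
};

// Control plane: the only writer. A promotion produces a new immutable manifest
// rather than mutating the current one, so rollback is just re-pointing.
function promote(
  current: ReleaseManifest,
  patch: Partial<Omit<ReleaseManifest, "releaseId">>
): ReleaseManifest {
  return { ...current, ...patch, releaseId: `rel_${Date.now()}` };
}

// Data plane: read-only consumer. Stamping the manifest versions onto each
// request is what makes cross-plane cause analysis possible later.
function stampRequest(manifest: ReleaseManifest, requestId: string) {
  return {
    requestId,
    promptVersion: manifest.promptVersion,
    policyVersion: manifest.policyVersion,
    routeVersion: manifest.routeVersion,
  };
}
```

Because manifests are immutable, a partial rollback is simply a `promote` call that patches one version field back, leaving the rest untouched.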
Practical pattern
Pattern 1: Request Lifecycle Contract Standardization
Fixing the request-processing steps and contracts on the data plane reduces implementation variance across teams.
```typescript
type RequestEnvelope = {
  requestId: string;
  tenantId: string;
  userId: string;
  input: string;
  contextVersion: string;
  promptVersion: string;
  policyVersion: string;
  routeVersion: string;
  deadlineMs: number;
};

type ResponseEnvelope = {
  requestId: string;
  status: "ok" | "degraded" | "needs_review" | "blocked";
  answer: string;
  citations: string[];
  qualitySignals: {
    confidence: number;
    resolved: boolean;
    followUpRisk: "low" | "medium" | "high";
  };
  costSignals: {
    inputTokens: number;
    outputTokens: number;
    usd: number;
  };
};

export async function handle(envelope: RequestEnvelope): Promise<ResponseEnvelope> {
  // Fixed lifecycle: retrieval -> planning -> policy guard -> execution.
  const context = await retrieveContext(envelope);
  const plan = await orchestrate(envelope, context);
  const guarded = await applyPolicies(plan, envelope.policyVersion);
  const result = await execute(guarded);
  return buildResponse(envelope.requestId, result);
}
```
Operating points:
- Version fields in the request/response envelope make cause tracking straightforward.
- Quality/cost signals travel with the result, simplifying downstream analysis.
- Envelope changes must follow a backward-compatibility policy.
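The backward-compatibility rule for envelope changes can be sketched as follows. The `RequestEnvelopeV1`/`V2` names and the 30-second default are illustrative assumptions, not part of the article's contract: the point is that new fields are optional with a default applied in one place, so producers built against the old contract keep working.

```typescript
// v1 of a (simplified) envelope, as older producers emit it.
type RequestEnvelopeV1 = {
  requestId: string;
  input: string;
  promptVersion: string;
};

// v2 adds deadlineMs as an *optional* field; making it required
// would break every v1 producer at once.
type RequestEnvelopeV2 = RequestEnvelopeV1 & {
  deadlineMs?: number;
};

// Apply the default once at the boundary (30s here is an assumed default),
// so no handler downstream needs to know which version produced the envelope.
function normalize(envelope: RequestEnvelopeV2): Required<RequestEnvelopeV2> {
  return { ...envelope, deadlineMs: envelope.deadlineMs ?? 30_000 };
}
```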
Pattern 2: Control-plane-centric deployment gates
Deployments should be promoted as control-plane configuration changes, not code merges.
```json
{
  "release_gate": {
    "quality": {
      "resolved_rate_min": 0.78,
      "follow_up_ratio_max": 0.18
    },
    "security": {
      "policy_violation_rate_max": 0.005,
      "forbidden_tool_call": 0
    },
    "cost": {
      "usd_per_resolved_request_max_increase": 0.1
    },
    "reliability": {
      "p95_latency_max_increase": 0.15,
      "retry_amplification_max": 1.25
    }
  }
}
```
```shell
# Evaluate the integrated gates, then promote or block
./ops/release/evaluate-gates.sh --release rel_20260303_2200
./ops/release/promote-if-pass.sh --release rel_20260303_2200 --canary 25
```
Operating points:
- When a gate fails, prefer component-level rollback over a full rollback.
- Gate criteria are reevaluated quarterly but should not change frequently.
- An "emergency mode" that lets operators bypass gates must enforce audit logging.
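The gate evaluation that the scripts above run can be sketched as a pure function over release metrics. The thresholds mirror the JSON config above; the function and field names are illustrative assumptions.

```typescript
// Metrics collected for a candidate release (illustrative shape).
type GateMetrics = {
  resolvedRate: number;        // share of requests resolved
  followUpRatio: number;       // share needing a follow-up
  policyViolationRate: number;
  forbiddenToolCalls: number;
  costIncrease: number;        // relative increase in USD per resolved request
  p95LatencyIncrease: number;  // relative increase vs. baseline
  retryAmplification: number;  // total attempts / user requests
};

// Returns every failed gate, not just the first, so the release report
// can show all blocking reasons at once.
function evaluateGates(m: GateMetrics): { pass: boolean; failures: string[] } {
  const failures: string[] = [];
  if (m.resolvedRate < 0.78) failures.push("quality.resolved_rate_min");
  if (m.followUpRatio > 0.18) failures.push("quality.follow_up_ratio_max");
  if (m.policyViolationRate > 0.005) failures.push("security.policy_violation_rate_max");
  if (m.forbiddenToolCalls > 0) failures.push("security.forbidden_tool_call");
  if (m.costIncrease > 0.1) failures.push("cost.usd_per_resolved_request_max_increase");
  if (m.p95LatencyIncrease > 0.15) failures.push("reliability.p95_latency_max_increase");
  if (m.retryAmplification > 1.25) failures.push("reliability.retry_amplification_max");
  return { pass: failures.length === 0, failures };
}
```

Keeping the thresholds in control-plane config (the JSON above) and the evaluation logic this small makes the gate auditable: a blocked promotion lists exactly which dimensions failed.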
Pattern 3: Document team interfaces and responsibility boundaries
The organizational interface matters as much as the technical architecture.

| Team | Owned components | Primary KPI | Secondary KPI |
| --- | --- | --- | --- |
| Platform | Orchestrator, Router, State Store | SLA, p95 latency | Retry amplification |
| Search/Data | Ingestion, Index, Retriever | Retrieval hit ratio | Index lag |
| Security | Policy Engine, Audit | Policy violation rate | False refusal rate |
| Product | UX, HITL Flow | Safe completion rate | Review SLA |
Operating points:
- Specify priority criteria for when KPIs conflict (e.g. security > safety > quality > cost).
- Designate a single incident commander for each incident.
- Feed postmortem actions into team-level backlogs.
Failure cases/anti-patterns
Failure Scenario: “Control Plane Drift”
Situation:
- The Prompt Registry pointed to v9 while the routing config still used the v8 token budget.
- On the data plane, longer outputs exceeded the budget and fallbacks spiked.
- Quality, cost, and latency degraded simultaneously, but separate per-team dashboards delayed the initial response.
Detection procedure:
- Alert on the `version_mismatch_count` metric
- Compare release-manifest versions against the versions recorded in execution logs
- Correlate the fallback-rate spike with budget-overflow events
Mitigation Procedures:
- Force Control Plane version synchronization
- Partial rollback of routing policy to previous version
- Temporarily reduce traffic from high-cost domains
Recovery procedure:
- Introduce atomic config releases
- Add drift verification as a required pre-deployment gate
- Standardize a cross-plane dashboard (quality/cost/reliability integrated)
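The drift check behind `version_mismatch_count` can be sketched as a comparison between the versions the control plane expects and the versions each component actually reports. The function name and component keys are illustrative.

```typescript
// component name -> version string, e.g. { promptRegistry: "v9" }
type PlaneVersions = Record<string, string>;

// Returns the components whose observed version differs from the release
// manifest; the length of this list is the version_mismatch_count metric.
function detectDrift(expected: PlaneVersions, observed: PlaneVersions): string[] {
  const drifted: string[] = [];
  for (const [component, version] of Object.entries(expected)) {
    if (observed[component] !== version) drifted.push(component);
  }
  return drifted;
}
```

Run on every deployment and on a periodic schedule, a non-empty result both fires the alert and blocks further promotion until the planes are re-synchronized.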
Representative antipatterns
- Mixing control-plane settings into application code instead of separating the planes
- Team-local optimization that sacrifices global metrics
- Gating deployment promotion on a single metric
- Post-incident action items that never lead to structural change
Checklist
- [ ] Are the responsibilities of the Control Plane and Data Plane separated?
- [ ] Does the request envelope carry version, quality, and cost signals?
- [ ] Do integrated gates (quality/security/cost/reliability) control deployment promotion?
- [ ] Is component-level rollback automated?
- [ ] Are team ownership boundaries and KPI priorities documented?
- [ ] Do version-drift detection metrics feed operational alerts?
- [ ] Are postmortem results reflected in the architecture-improvement backlog?
Summary
The success of LLM operations depends less on individual technology choices than on how the system is composed. The core of the integrated reference architecture is to separate the control plane from the data plane and to connect observability signals into a single decision loop. Only then can quality, cost, security, and reliability be managed without working against each other.
Next episode preview
The final part presents an LLM operational maturity model from an organization/process perspective. It lays out team roles, decision-making structure, experimentation culture, and operating routines step by step, and concludes with a practical roadmap for answering "how far have we come, and what comes next?"
Reference links
- OpenAI Developer Docs - Responses API
- OpenTelemetry official documentation
- Martin Fowler - Circuit Breaker
- Blog: LLM Agent Tool Guardrail design
Series navigation
- Previous post: Part 10. Change Management: Prompt Changes vs System Changes, Experiments and Rollbacks
- Next post: Part 12. Organization/Process: Operational Maturity Model and Roadmap