#system-design

12 posts found.

llm
4 min read
From a system perspective, the essence of an LLM feature is not the prompt text itself but its boundaries, contracts, state, and failure handling.
llm
5 min read
LLM quality stabilizes when it is managed through datasets, evaluation criteria, online feedback, and regression-detection loops, not sentence-level prompt tweaking.
llm
4 min read
We summarize why retries, timeouts, fallbacks, and circuit breakers in LLM systems must be designed differently from those for ordinary APIs, and the operating patterns that follow.
llm
4 min read
LLM cost is determined by how the system controls usage, not by the model's unit price. Caching, batching, routing, and token budgets are organized from an operational perspective.
llm
4 min read
LLM security is not solved by prompt defenses alone. Covers system design that combines permission policies, data boundaries, and tool sandboxing.
llm
4 min read
To catch quality degradation that never surfaces as an outright failure, traces, logs, and quality metrics must be designed as a single observability system.
llm
4 min read
LLM quality is more sensitive to the context pipeline than to the model itself. We summarize how to design RAG, memory, freshness, and tenant boundaries from a system perspective.
llm
4 min read
Operable agents are stateful systems, not chains. Planner/executor separation, queues, guardrails, and recovery strategies are organized from a practical perspective.
llm
5 min read
Covers failure UX, human-in-the-loop intervention, and operational governance design that turn a technically working LLM feature into a product users can trust.
llm
4 min read
We summarize why LLM operations need a different deployment gate for each type of change, along with experiment, canary, and rollback strategies.
llm
4 min read
We present an LLM/Agent reference architecture that combines prompting, evaluation, reliability, cost, security, and observability into a single operating framework.
llm
5 min read
We present an organizational structure, role separation, decision-making process, and maturity roadmap for operating LLM/Agent systems sustainably.