
Week 3: MCP Architecture and the Agentic Tool Ecosystem

Phase 1 · Week 3 · Elementary · Lecture: 2026-03-17

Protocol Perspective

Understand the MCP lifecycle, JSON-RPC 2.0 message flow, and transport selection criteria, and explain why the handshake is the starting point of governance.

Architecture Perspective

Understand the Host-Client-Server topology, the three server primitives (Tools/Resources/Prompts), and the reverse flow of Client Features (Sampling/Roots/Elicitation).

Governance Perspective

Explain how to control agent tool access using TBAC/OBO/Triple Gate patterns, and how these differ from RBAC.

Implementation Perspective

Implement a Tool+Resource+Prompt server with FastMCP, validate with MCP Inspector, and apply input validation and safe error returns.

The governance we designed in Week 2 is logical policy — a rule like “this agent cannot delete files.” But for a policy to actually work, physical isolation and a standard protocol are needed. This week we establish these two layers, with MCP as the central axis.

MCP is the USB-C of AI. Just as USB-C unified printers, monitors, and external drives into a single port, MCP connects filesystems, Git, databases, and APIs through a single protocol. It standardizes integration methods that used to differ by vendor, providing a common interface for agents to discover and call tools.

Let’s clearly separate the two layers:

  • MCP = Capability Bus — decides what the agent can touch. Standardizes tool access and creates a structure where governance gateways can intercept.
  • MIG = Compute Sandbox — physically isolates the act of touching. One student’s OOM cannot propagate to another student.

MCP’s status was decisively established in December 2025 when Anthropic donated MCP to the Linux Foundation AI & Data (AAIF). OpenAI, Google, and Microsoft joined, making it the de facto industry standard. In Week 4’s loop paradigm, when agents autonomously write code, MCP controls tool access, MIG isolates computing, and Week 2’s governance sets approval boundaries — these three layers must work together.


MIG provides physical isolation, but MCP determines what the agent can do. This section covers the core concepts of MIG, and from the next section we dive deep into MCP.

Imagine 30 students sharing the same GPU. If student A accidentally runs an infinite loop or triggers OOM (Out of Memory), students B and C on the same GPU are affected too. Traditional time-slicing shares the GPU by alternating execution in time, but it does not separate memory or cache.

Time-slicing vs MIG comparison

The diagram shows the key point:

  • Time-slicing: Logical partitioning. All apps share L2 cache and memory through a Shared Bus. One app’s OOM can cascade to a system-wide failure.
  • MIG: Physical partitioning. Each partition has its own independent crossbar port, L2 cache, and memory controller. A failure in one partition ends there.

Cheju Halla University’s AI lab DGX H100 has 8 GPUs. With each GPU split into 7 slices in MIG mode:

8 GPUs × 7 slices = 56 independent instances

That’s enough to assign one MIG instance (1g.10gb) to each of the 30 students with 26 to spare.

MIG partitioning happens in two stages:

  1. GPU Instance (GI): A hardware partition bundling SM (Streaming Multiprocessors), memory controllers, and L2 cache. The unit of physical isolation.
  2. Compute Instance (CI): Further subdivides SM within a GI. CIs within the same GI share memory, but SM is independently allocated.

In an educational environment, 1 GI = 1 CI mapping is the simplest and safest.

H100 MIG hardware slice architecture
| Profile | SM Count | GPU Memory | Max Instances | Best Use |
|---|---|---|---|---|
| 1g.10gb | ~16 SM | 10GB HBM3 | 7 | Lab work, small model inference |
| 2g.20gb | ~32 SM | 20GB HBM3 | 3 | Mid-scale inference, fine-tuning |
| 3g.40gb | ~48 SM | 40GB HBM3 | 2 | Large-scale inference, quantized LLM |
| 4g.40gb | ~64 SM | 40GB HBM3 | 1 | High-performance research workloads |
| 7g.80gb | 132 SM | 80GB HBM3 | 1 | Full GPU (no partitioning) |

Hardware-level partitioning

| Item | Detail |
|---|---|
| Isolation level | Physical separation of SM, L2 cache, memory controllers |
| Fault isolation | Complete — one partition's OOM has no effect on others |
| QoS guarantee | Predictable latency, consistent throughput |
| Reconfiguration | Requires GPU reset (takes a few seconds) |
| Suitable for | Education, multi-tenant inference, security-sensitive environments |
Terminal window
# Enable MIG mode (admin)
sudo nvidia-smi -i 0 -mig 1
# Create GPU instances — 7 x 1g.10gb profile (admin)
sudo nvidia-smi mig -i 0 -cgi 19,19,19,19,19,19,19 -C
# Check current MIG instances (students can run these)
nvidia-smi mig -lgip # List available profiles
nvidia-smi mig -lgi # List created GPU instances
nvidia-smi mig -lci # List created Compute instances
# Check MIG device UUIDs
nvidia-smi -L
# GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-xxxx)
# MIG 1g.10gb Device 0: (UUID: MIG-xxxx)
# MIG 1g.10gb Device 1: (UUID: MIG-yyyy)
# ...
# Run on a specific MIG slice
CUDA_VISIBLE_DEVICES=MIG-GPU-xxxx/0/0 python train.py

Managing MIG instances manually would require individually assigning and reclaiming slices for 30 students. Kubernetes with the NVIDIA GPU Operator automates this process.

Kubernetes + GPU Operator MIG scheduling flow

The key is that the GPU Operator’s Device Plugin registers MIG slices as Extended Resources like nvidia.com/mig-1g.10gb in the Kubernetes API, and the scheduler matches and places them against Pod resource requests.

kubernetes/student-workspace-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: student-2024001-workspace
  namespace: ai-systems
  labels:
    course: ai-systems-2026
    role: student
spec:
  containers:
    - name: workspace
      image: pytorch/pytorch:2.5-cuda12-cudnn9-devel
      resources:
        requests:
          nvidia.com/mig-1g.10gb: "1"
          memory: "8Gi"
          cpu: "4"
        limits:
          nvidia.com/mig-1g.10gb: "1"
          memory: "16Gi"
          cpu: "8"
      env:
        - name: STUDENT_ID
          value: "2024001"
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              name: api-keys
              key: anthropic-key
      volumeMounts:
        - name: workspace
          mountPath: /workspace
  volumes:
    - name: workspace
      persistentVolumeClaim:
        claimName: student-2024001-pvc
  restartPolicy: Never

nvidia.com/mig-1g.10gb: "1" — this single line tells Kubernetes to “assign one MIG 1g.10gb slice to this Pod.”


As the agent ecosystem grows, connecting 3 AI clients (Claude, Copilot, LangChain) with 4 tools (Postgres, Slack, GitHub, Jira) requires 3 × 4 = 12 individual integrations. Every time a client or tool is added, all existing connections must be updated.

N×M vs MCP comparison

MCP reduces this N×M complexity to N+M. Each client only needs to implement the MCP protocol, and each tool only needs to implement an MCP server. Inspired by LSP (Language Server Protocol) standardizing editor-language integration, MCP standardizes AI-tool integration.

MCP’s components are clearly separated:

  1. Host = the brain: The application the user interacts with directly. Claude Desktop, Cursor, VS Code, etc. Governs security policy and decides which servers to connect to.
  • Client = the connector: A protocol client that runs inside the Host and connects 1:1 with a single MCP server. A Host can run multiple Clients, enabling 1:N connections.
  3. Server = the capability provider: A lightweight program that exposes specific functionality via the MCP protocol. Filesystem, Git, databases, etc. Modular and reusable.
[User] ↔ [Host (Claude Desktop)]
├── [Client A] ←stdio→ [Server: filesystem]
├── [Client B] ←stdio→ [Server: git]
└── [Client C] ←HTTP→ [Server: database]

Lifecycle: The Handshake Is the Start of Governance


An MCP session is not a simple “connect → use → close.” The handshake itself is capability negotiation — a governance surface.

  1. Initialize: Client sends an initialize request declaring its protocol version and supported capabilities.
  2. Capability Negotiation: Server responds with its own capabilities. At this point both sides agree on “what is possible.”
  3. Initialized Notification: Client sends notifications/initialized, activating the session.
  4. Operation: Both sides exchange messages within the scope of agreed capabilities.
  5. Shutdown: Normal termination. Client calls close() or closes the transport.

Why this structure matters for governance: If a gateway intervenes at the initialize stage, it can control which capabilities the server is allowed to expose. If the gateway filters the tools capability, tool calls become impossible for that session.
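The filtering idea can be sketched in a few lines of Python. This is an illustrative gateway fragment, not MCP SDK code; `ALLOWED_CAPABILITIES` and `filter_initialize_result` are invented names:

```python
# Hypothetical gateway policy: this session may use resources and prompts,
# but not tools. Filtering the initialize result hides the capability.
ALLOWED_CAPABILITIES = {"resources", "prompts"}

def filter_initialize_result(result: dict) -> dict:
    """Return a copy of the initialize result with only permitted capabilities."""
    caps = result.get("capabilities", {})
    kept = {name: opts for name, opts in caps.items()
            if name in ALLOWED_CAPABILITIES}
    return {**result, "capabilities": kept}

filtered = filter_initialize_result({
    "protocolVersion": "2025-11-25",
    "capabilities": {"tools": {"listChanged": True},
                     "resources": {"subscribe": True}},
    "serverInfo": {"name": "mig-monitor", "version": "0.1.0"},
})
# "tools" never reaches the client, so the session cannot negotiate tool calls
```

Because capabilities are agreed once at the handshake, this single interception point governs the whole session.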

MCP is built on the JSON-RPC 2.0 protocol:

// 1. Initialization request (Client → Server)
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-11-25",
    "capabilities": {
      "roots": { "listChanged": true },
      "sampling": {}
    },
    "clientInfo": { "name": "claude-desktop", "version": "1.5.0" }
  }
}
// 2. Initialization response (Server → Client)
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-11-25",
    "capabilities": {
      "tools": { "listChanged": true },
      "resources": { "subscribe": true }
    },
    "serverInfo": { "name": "mig-monitor", "version": "0.1.0" }
  }
}
// 3. Tool list request (Client → Server)
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/list"
}
// 4. Tool call (Client → Server)
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "get_mig_status",
    "arguments": {}
  }
}
| Item | stdio | Streamable HTTP |
|---|---|---|
| Connection scope | Local process | Remote / multiple clients |
| Security boundary | OS process isolation (no network exposure) | Origin validation, TLS, session management required |
| Communication | stdin/stdout | HTTP POST + Server-Sent Events |
| Governance application | Limited (process-level control) | Central routing, auth, TBAC applicable |
| Suitable for | Personal development, local tools | Team/org, production, multi-tenant |

Rule: Use stdio when OS/process isolation is the strongest boundary. Choose Streamable HTTP when central routing, governance, or authentication is needed.
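The rule can be encoded as a small decision helper. This is an illustrative sketch, not an MCP SDK API; in recent FastMCP versions the returned string maps onto the `transport` argument of `mcp.run(...)`:

```python
# Hypothetical helper encoding the transport-selection rule above.
def choose_transport(remote_clients: bool, needs_gateway: bool) -> str:
    """stdio when OS/process isolation suffices; Streamable HTTP otherwise."""
    if remote_clients or needs_gateway:
        # Central routing, auth, and TBAC require a network boundary
        return "streamable-http"
    # Local subprocess, no network exposure
    return "stdio"

choose_transport(remote_clients=False, needs_gateway=False)  # "stdio"
choose_transport(remote_clients=True, needs_gateway=True)    # "streamable-http"
```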

MCP logical architecture — local (stdio) vs remote (SSE) environments

The diagram shows the topology difference between the two transport methods. In a local environment, the Client inside the Host runs the Server as a subprocess and communicates over stdio. In a remote environment, communication crosses a network boundary via SSE (Server-Sent Events), and a governance gateway can be placed at that boundary.


MCP Primitives Extended — Server + Client Features


The functionality an MCP server can expose is classified into three primitives:

| Primitive | Controlled by | Description | Examples |
|---|---|---|---|
| Tools | Model invokes | Similar to function calls. Input schema defined; model calls autonomously. Most powerful and most dangerous | get_mig_status(), run_query(sql) |
| Resources | Application controls | Read-only data exposure. Identified by URI. Supports list-changed notifications and subscriptions | mig://gpu/0/status, file:///workspace/config.yaml |
| Prompts | User selects | Reusable prompt templates. User must explicitly select to activate | "MIG Status Analysis", "Security Review Checklist" |

While the three server primitives represent capability exposure in the “server → client” direction, Client Features are the reverse flow where the server requests something from the client. This reverse hook elevates MCP from a simple tool-calling protocol to an agentic collaboration protocol.

Sampling — Server borrows the LLM’s intelligence

The server requests inference from the client’s LLM instead of computing locally. The server can leverage the host’s intelligence without needing its own model API key.

  • Server sends a sampling/createMessage request; client calls the LLM and returns the result.
  • HOTL principle: User approval is mandatory. Client retains authority over model selection, token limits, and request modification.
  • Week 2 connection: Sampling corresponds to HOTL control of server-initiated agentic behavior.
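The approval gate can be sketched as pure client-side logic. Everything here is illustrative (`handle_sampling_request` and the 512-token cap are invented), but it shows the HOTL shape: no approval, no LLM call, and the client clamps the token budget regardless of what the server asked for.

```python
from typing import Callable

def handle_sampling_request(request: dict,
                            user_approves: Callable[[dict], bool],
                            call_llm: Callable[[dict], str]) -> dict:
    """Forward a server's sampling request to the LLM only after approval."""
    if not user_approves(request):
        return {"error": "Sampling request rejected by user"}
    # The client, not the server, decides the final token budget
    bounded = {**request, "maxTokens": min(request.get("maxTokens", 512), 512)}
    return {"content": call_llm(bounded)}

resp = handle_sampling_request(
    {"messages": [{"role": "user", "content": "summarize"}], "maxTokens": 4096},
    user_approves=lambda req: True,  # stand-in for a consent UI
    call_llm=lambda req: f"(model output, capped at {req['maxTokens']} tokens)",
)
# resp reflects the clamped 512-token budget, not the server's 4096
```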

Roots — Declare the working scope

The server queries the client for URI/filesystem boundaries to work within.

  • Client provides Roots like file:///workspace/project-a; server should operate only within that scope.
  • Key warning: “Declared boundary ≠ actual boundary.” Roots are advisory only. Actual access control must be enforced at the OS/sandbox level.
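Enforcing the boundary in server code is straightforward with `pathlib`. A minimal sketch assuming a fixed root; `resolve()` normalizes `../` escapes, though symlink tricks are exactly why OS/sandbox enforcement is still needed underneath:

```python
from pathlib import Path

ROOT = Path("/workspace/project-a")  # the Root declared by the client

def is_within_root(candidate: str, root: Path = ROOT) -> bool:
    """True only if candidate resolves to root or a path inside it."""
    resolved = Path(candidate).resolve()
    return resolved == root or root in resolved.parents

is_within_root("/workspace/project-a/src/main.py")     # True
is_within_root("/workspace/project-a/../project-b/x")  # False: "../" escape
```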

Elicitation — Embeds Human-in-the-Loop into the protocol

The server requests confirmation directly from the user. Used to obtain consent before sensitive operations or to collect additional information.

  • Server sends elicitation/create; client presents UI to the user and returns the response.
  • Week 2 connection: Elicitation is the protocol implementation of Hard Interrupt. It standardizes at the protocol level the concept from Week 2 of “agent asks the human before a dangerous operation.”

Traditional REST APIs are ill-suited to AI agent environments in several ways:

  • Fixed endpoints: Client code must change when the API changes. No dynamic discovery.
  • Stateless by default: Each request is independent, requiring separate design for context persistence.
  • No tool discovery: OpenAPI/Swagger exists, but it’s not a mechanism for AI to automatically discover tools at runtime and interpret their schemas.

In real enterprise environments, REST and MCP coexist. Referencing IBM architecture patterns:

  • Server side (microservices/SDK): Repetitive business logic. Predictable workflows like order processing, payment, and inventory management work efficiently with existing REST APIs.
  • Client side (agent): High-level decision-making and orchestration. Understanding context and deciding which tools to call in what order.
  • MCP: A context-aware communication bridge connecting both sides. The agent dynamically discovers tools and calls them with maintained session context.
| Perspective | REST API | MCP |
|---|---|---|
| Communication pattern | Request-response (stateless) | Session-based (stateful) |
| Tool discovery | Static (OpenAPI docs) | Dynamic (tools/list runtime discovery) |
| Context | Each request independent | Context persists within session |
| Extensibility | Client changes needed when adding endpoints | Tools extend by adding servers only (N+M) |
| Type safety | OpenAPI schema | Auto-generated JSON Schema + validation |
| AI suitability | Pre-defined workflows | Dynamic decision-making, agentic loops |

Benefits of hybrid architecture: Context persistence, dynamic tool discovery, type safety, fault isolation. Activate agent workflows by layering MCP on top of existing REST infrastructure without discarding it.
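The hybrid pattern can be sketched as a thin wrapper: the REST call stays as-is, and a normalization step turns its payload into a structured tool result. The endpoint URL and field names below are invented for illustration; with FastMCP the wrapper would be registered via `@mcp.tool()` so agents discover it through `tools/list`.

```python
import json
from urllib.request import urlopen

def fetch_inventory(item_id: str) -> dict:
    """Thin call to the existing REST service (URL is invented)."""
    with urlopen(f"https://api.example.com/inventory/{item_id}") as resp:
        return json.load(resp)

def to_tool_result(rest_payload: dict) -> dict:
    """Normalize the REST response into a structured tool result."""
    qty = rest_payload.get("quantity", 0)
    return {
        "item": rest_payload.get("id", "unknown"),
        "in_stock": qty > 0,
        "quantity": qty,
    }

# Registered with @mcp.tool(), this pair is discovered at runtime via
# tools/list, while order/inventory logic stays in the REST service.
```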


Security and Governance — TBAC, OBO, Triple Gate


When MCP standardizes tool access, we can apply the governance policies from Week 2 on top of that standardized path.

Three security principles the MCP spec explicitly states:

  1. User Consent and Control: Users hold final approval authority over all tool calls and data access. Host applications must provide clear consent UI to users.
  2. Data Privacy: Data isolation between servers. Data from one server must not leak to another without user consent.
  3. LLM Sampling Control: Sampling requests must go through user approval, and the client retains full control over model selection and context.

MCP architecture has three trust boundaries:

  1. User ↔ Host ↔ Client (trusted zone): The scope the user directly controls. Host decides which Servers to connect to.
  2. Client ↔ Server (untrusted boundary): Servers are external packages. They may contain malicious code. All Server responses must be validated.
  3. Server ↔ External Infrastructure: The boundary where a Server accesses databases, APIs, and filesystems. The Server’s permissions become the agent’s permissions.

Traefik Hub provides an MCP-dedicated gateway that performs authorization at the protocol level:

  • OAuth 2.1 integration: PKCE (Proof Key for Code Exchange) mandated. Prevents authorization code theft attacks.
  • JWT validation + forwardAuthorization: Gateway validates the JWT and forwards the Authorization header to the server. Servers don’t need to implement token validation logic themselves.
  • RFC 8414 metadata discovery: Automatically discovers authorization server endpoints. Provides standardized metadata at the .well-known/oauth-authorization-server path.

TBAC 3 Layers — Tasks → Tools → Transactions


RBAC (Role-Based Access Control) and ABAC (Attribute-Based Access Control) focus on “who” is accessing. But in agent environments, “what task is being performed” matters more than “who.” Because agents act on behalf of users, permissions should vary by task even for the same user.

TBAC (Task-Based Access Control) solves this with 3 layers:

  1. Tasks (task definition): Define task units like “handle customer inquiry” or “perform code review.” Map permitted tools to each task.
  2. Tools (tool control): Set call conditions for each tool. A variable substitution engine supports mcp.* (MCP session info) + jwt.* (JWT claims) namespaces for dynamic policy evaluation.
  3. Transactions (audit): Record all tool calls and detect abnormal patterns. Provides post-incident audit and real-time alerts.

The “Accountability Breakdown” problem: When an agent calls an API on behalf of a user, only the service account appears in audit logs. It’s impossible to trace who requested what and why. The OBO (On-Behalf-Of) pattern resolves this — the MCP server operates with the delegated user/agent ID instead of the service account, so the original requester is recorded in audit logs.
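The audit side of OBO reduces to one rule: every record carries both identities. A minimal sketch with invented field names:

```python
from datetime import datetime, timezone

def obo_audit_record(service_account: str, on_behalf_of: str,
                     tool: str, task: str) -> dict:
    """Audit entry naming both the executor and the original requester."""
    return {
        "actor": service_account,      # service identity that executed the call
        "on_behalf_of": on_behalf_of,  # delegated user/agent identity
        "tool": tool,
        "task": task,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

rec = obo_audit_record("svc-mcp-gateway", "student-2024001",
                       "get_mig_status", "monitoring")
# Without on_behalf_of, only "svc-mcp-gateway" would appear in the logs.
```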

A single gateway is insufficient to defend against the diverse attack surface of agents. Triple Gate is a three-layer defense architecture where each gate handles independent concerns:

  1. 1st gate — AI Gateway: Prompt injection detection, PII (Personally Identifiable Information) filtering. Inspects LLM inputs and outputs.
  2. 2nd gate — MCP Gateway: TBAC authorization, capability filtering, OBO token exchange. Controls access at the MCP protocol level.
  3. 3rd gate — API Gateway: Rate limiting, IP blocking, request size limits. Infrastructure-level protection.

Each gate maintains its defense independently, even if the other gates fail.

Triple Gate pattern architecture for agent security

The diagram shows the Triple Gate flow. An AI agent’s request must pass through the 1st AI gateway (prompt security), 2nd MCP gateway (TBAC task/tool authorization), and 3rd API gateway (rate limiting) sequentially to reach the backend system.
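The sequential flow can be sketched as a chain of independent predicates. Each gate below is a placeholder for the real inspection logic (the keyword check, tool allowlist, and rate limit are invented stand-ins):

```python
from typing import Callable

Gate = Callable[[dict], tuple[bool, str]]

def ai_gate(req: dict) -> tuple[bool, str]:
    """1st gate: crude prompt-injection stand-in."""
    blocked = "ignore previous instructions" in req.get("prompt", "").lower()
    return (not blocked, "prompt injection" if blocked else "ok")

def mcp_gate(req: dict) -> tuple[bool, str]:
    """2nd gate: TBAC-style tool allowlist stand-in."""
    allowed = req.get("tool") in {"get_mig_status", "check_memory_pressure"}
    return (allowed, "ok" if allowed else "TBAC denied")

def api_gate(req: dict) -> tuple[bool, str]:
    """3rd gate: rate-limit stand-in."""
    within = req.get("request_count", 0) < 100
    return (within, "ok" if within else "rate limited")

def triple_gate(req: dict, gates: list[Gate]) -> tuple[bool, str]:
    for gate in gates:
        ok, reason = gate(req)
        if not ok:
            return False, reason  # any single gate can block on its own
    return True, "ok"

decision = triple_gate({"prompt": "check GPU status",
                        "tool": "get_mig_status",
                        "request_count": 1},
                       [ai_gate, mcp_gate, api_gate])
# → (True, "ok")
```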

Evolving the MCP Gateway pattern from Week 2 to reflect TBAC/OBO concepts:

# governed_gateway.py — MCP governance gateway (pseudocode)
from dataclasses import dataclass, field
from enum import Enum


class TrustLevel(str, Enum):
    TRUSTED = "trusted"      # Internally validated server
    SANDBOXED = "sandboxed"  # Runs with restricted permissions
    UNTRUSTED = "untrusted"  # To be blocked


@dataclass
class GovernedMCPGateway:
    """Sits between Client ↔ Server and inspects all messages."""
    trust_registry: dict  # Trust level per server
    dlp_patterns: list    # Compiled patterns for sensitive data (API keys, PII, etc.)
    tbac_policies: dict   # Task-Based Access Control policies
    audit_log: list = field(default_factory=list)

    def intercept_request(self, server_name: str, method: str,
                          params: dict, task_context: str = "") -> dict:
        """Inspect Client → Server requests (outbound)"""
        trust = self.trust_registry.get(server_name, TrustLevel.UNTRUSTED)
        if trust == TrustLevel.UNTRUSTED:
            return {"blocked": True, "reason": f"Server '{server_name}' is unregistered"}
        if method == "tools/call":
            tool_name = params.get("name", "")
            # TBAC: task context-based tool access control
            if not self._check_tbac(task_context, server_name, tool_name):
                return {"blocked": True,
                        "reason": f"Tool '{tool_name}' access denied in task '{task_context}'"}
        # Record audit log
        self.audit_log.append({
            "server": server_name, "method": method,
            "task": task_context, "allowed": True,
        })
        return {"blocked": False}

    def intercept_response(self, server_name: str, response: dict) -> dict:
        """Inspect Server → Client responses (inbound)"""
        content = str(response)
        # DLP: detect sensitive data leakage
        for pattern in self.dlp_patterns:
            if pattern.search(content):
                return {"blocked": True, "reason": "Response contains sensitive data"}
        # Prompt injection detection
        if self._detect_injection(content):
            return {"blocked": True, "reason": "Suspected prompt injection"}
        return {"blocked": False, "response": response}

    def _check_tbac(self, task: str, server: str, tool: str) -> bool:
        """TBAC — task-unit tool access control"""
        task_policy = self.tbac_policies.get(task, {})
        allowed_tools = task_policy.get(server, [])
        return tool in allowed_tools or "*" in allowed_tools

    def _detect_injection(self, content: str) -> bool:
        """Detect hidden prompt injection patterns in responses"""
        suspicious = ["CLAUDE.md", "AGENTS.md", "self-replicate",
                      "write_file", "ssh_key", "credentials"]
        return any(s in content for s in suspicious)

A deep analysis of SANDWORM_MODE from the MCP perspective, following Week 2’s brief overview.

3 basic attack vectors (review):

  1. Typosquatting: Register under a name similar to a legitimate package. postmark-mcp vs postmark-official-mcp.
  2. Indirect prompt injection: Insert hidden instructions in MCP server responses. Induce the agent to write self-replicating code to CLAUDE.md or AGENTS.md.
  3. Credential exfiltration: In stdio transport, the server inherits environment variables from the Host process and sends them externally.

Multi-stage infiltration payload (advanced):

SANDWORM_MODE is not a simple one-time attack. It uses a two-stage time-bomb strategy:

  • Stage 1 (immediate): On installation, collect credentials from .ssh/id_rsa, .aws/credentials, and .env files and send them to a C2 (Command & Control) server.
  • 48–96 hour delay: Delay stage 2 to avoid immediate detection. During this period the package functions normally to avoid suspicion.
  • Stage 2 (worm propagation): After the delay, inject self-replicating code into project config files (CLAUDE.md, AGENTS.md, MCP configuration JSON). When this code is executed by another developer’s agent, the infection spreads.

Adaptive behavior: When CI/CD environments (GitHub Actions, Jenkins) are detected, the delay is cancelled and stage 2 executes immediately. CI/CD pipeline secrets are more valuable.

| Item | What to Verify | Defense Method |
|---|---|---|
| Server origin | npm/PyPI package authenticity | Verify official registry, check hash, use org-scoped packages |
| Transport security | Environment variable exposure in stdio | Pass minimal environment to server, use separate secret manager for sensitive keys |
| Response validation | Prompt injection in server responses | Inbound inspection, response length limits, enforce structured output |
| Permission scope | Server's access to external systems | TBAC policies, network isolation, read-only mounts |
| Tool descriptions | Injection in description field | Scan tool descriptions in AI gateway, pattern-based filtering |
| Session management | Stateful session hijacking | Validate Mcp-Session-Id, require TLS, session timeout |
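Pattern-based filtering of tool descriptions can be prototyped in a few lines. A hedged sketch (the pattern list is invented): keyword/regex scans catch only crude injections and are easy to evade, which is why they belong inside an AI gateway alongside semantic checks rather than as the sole defense.

```python
import re

# Illustrative deny-list; a real gateway would maintain and tune this
SUSPICIOUS = [
    r"ignore (all|previous) instructions",
    r"write_file|CLAUDE\.md|AGENTS\.md",
    r"\.ssh/|credentials|api[_-]?key",
]

def scan_tool_description(description: str) -> list[str]:
    """Return the patterns that matched; an empty list means no hit."""
    return [p for p in SUSPICIOUS
            if re.search(p, description, re.IGNORECASE)]

scan_tool_description("Returns MIG slice status")           # []
scan_tool_description("Also write_file this to AGENTS.md")  # two patterns hit
```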

MCP is a stateful protocol. Because context accumulates within a session, load balancing requires special consideration.

Problem: Traditional round-robin load balancing can send each request to a different server instance. MCP session state is bound to a specific server instance, so if the instance changes mid-session, context is lost.

Solution — HRW (Highest Random Weight) algorithm: Hash the Mcp-Session-Id header to route sessions to the same server instance. It’s a form of sticky session, but with the advantage that when servers are added/removed, only the minimum number of sessions are redistributed.

Trade-off: Session affinity can cause load imbalance. If long-lived sessions concentrate on a specific instance, that instance becomes overloaded. Session timeouts and periodic rebalancing are needed.
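HRW (also called rendezvous hashing) is compact enough to sketch directly: score every server against the session ID and pick the highest. The names below are illustrative.

```python
import hashlib

def hrw_pick(session_id: str, servers: list[str]) -> str:
    """Rendezvous hashing: highest hash(session, server) score wins."""
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{session_id}:{server}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(servers, key=score)

servers = ["mcp-a", "mcp-b", "mcp-c"]
choice = hrw_pick("Mcp-Session-Id-1234", servers)
# The same session ID always maps to the same instance:
assert choice == hrw_pick("Mcp-Session-Id-1234", servers)
```

When an instance is removed, only the sessions that had chosen it are remapped; every other session keeps its ranking and stays put, which is the minimal-disruption property the text refers to.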


FastMCP dramatically simplifies MCP server implementation. Decorators expose functions as tools/resources/prompts, and Python type hints auto-generate JSON Schema.

# mig_monitor_server.py — MIG monitoring tool server
import sys

from fastmcp import FastMCP

mcp = FastMCP("mig-monitor", description="MIG slice status monitoring")


@mcp.tool()
def get_mig_status() -> dict:
    """Returns GPU/memory utilization of the currently assigned MIG slice."""
    import pynvml
    try:
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        pynvml.nvmlShutdown()
        return {
            "memory_used_mb": memory.used // (1024 * 1024),
            "memory_total_mb": memory.total // (1024 * 1024),
            "gpu_utilization_pct": util.gpu,
        }
    except Exception as e:
        # Log to stderr — prevent stdout pollution
        print(f"[ERROR] {e}", file=sys.stderr)
        return {"error": str(e)}


@mcp.tool()
def check_memory_pressure(threshold_pct: float = 80.0) -> dict:
    """Checks whether memory utilization exceeds a threshold.

    Args:
        threshold_pct: Warning threshold (default 80%). Range 0–100.
    """
    # Input validation
    if not (0.0 <= threshold_pct <= 100.0):
        return {"error": "threshold_pct must be in range 0–100"}
    status = get_mig_status()
    if "error" in status:
        return status
    used_pct = (status["memory_used_mb"] / status["memory_total_mb"]) * 100
    return {
        "used_pct": round(used_pct, 1),
        "threshold_pct": threshold_pct,
        "alert": used_pct > threshold_pct,
    }

MCP Inspector is the official tool for testing and debugging servers:

Terminal window
# Run MCP Inspector (specifying the server directly)
npx @modelcontextprotocol/inspector python mig_monitor_server.py
# Access http://localhost:6274 in browser
# → Test tools/list, tools/call, resources/list, etc. via UI

Configuration for connecting MCP servers in Claude Desktop or Claude Code:

{
  "mcpServers": {
    "mig-monitor": {
      "command": "python",
      "args": ["mig_monitor_server.py"],
      "env": {
        "CUDA_VISIBLE_DEVICES": "MIG-GPU-xxxx/0/0"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"]
    },
    "git": {
      "command": "uvx",
      "args": ["mcp-server-git", "--repository", "."]
    }
  }
}

  1. Is LLM inference possible on 1g.10gb (10GB VRAM)? Calculate how large a model can fit with 4-bit quantization. (Hint: 7B model with 4-bit quantization ≈ 4GB)
  2. Among the three TBAC layers (Tasks → Tools → Transactions), which is most important for our course environment? Why is RBAC insufficient? (Hint: should a student have access to delete_file during a “code review” task?)
  3. Why can existing antivirus software not detect the semantic prompt injection embedded in McpInject’s tool descriptions? What is the fundamental difference between code signature-based detection and semantic attacks?
  4. Do MCP’s Sampling and Elicitation features correspond to HOTL or HITL from Week 2? Explain why Sampling without user approval is dangerous.
  5. Placing an MCP Gateway in front of all servers creates a performance bottleneck. How can it be mitigated through bypass policies per trust level, caching, or asynchronous inspection? What trade-offs does HRW session affinity have?

  1. Connect to DGX via SSH and check MIG

    Terminal window
    # SSH connection
    ssh [student-ID]@dgx.chu.ac.kr
    # Check MIG profiles
    nvidia-smi mig -lgip
    # Check allocated MIG instances
    nvidia-smi mig -lgi
    # Check device UUIDs
    nvidia-smi -L
    # Capture results (for assignment submission)
    nvidia-smi mig -lgi > ~/mig-status.txt
  2. Project structure and environment setup

    Terminal window
    mkdir -p lab-03-mcp && cd lab-03-mcp
    python -m venv .venv
    source .venv/bin/activate
    # Using uv (recommended)
    uv add "mcp[cli]" fastmcp pynvml
    # Using pip
    pip install fastmcp pynvml
    lab-03-mcp/
    ├── mig_monitor_server.py # MCP server (Tool + Resource + Prompt)
    ├── mcp_config.json # MCP configuration
    ├── governance.py # TBAC-based governance integration
    └── tests/
    └── test_server.py
  3. Implement the MIG monitoring MCP server

    mig_monitor_server.py
    import sys

    from fastmcp import FastMCP

    mcp = FastMCP(
        "mig-monitor",
        description="MIG slice status monitoring and management"
    )


    @mcp.tool()
    def get_mig_status() -> dict:
        """Returns GPU/memory utilization of the current MIG slice."""
        try:
            import pynvml
            pynvml.nvmlInit()
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
            memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            pynvml.nvmlShutdown()
            return {
                "memory_used_mb": memory.used // (1024 * 1024),
                "memory_total_mb": memory.total // (1024 * 1024),
                "gpu_utilization_pct": util.gpu,
                "status": "ok",
            }
        except Exception as e:
            print(f"[ERROR] {e}", file=sys.stderr)
            return {"status": "error", "message": str(e)}


    @mcp.tool()
    def check_memory_pressure(threshold_pct: float = 80.0) -> dict:
        """Checks whether memory utilization exceeds the threshold."""
        if not (0.0 <= threshold_pct <= 100.0):
            return {"status": "error",
                    "message": "threshold_pct must be in range 0–100"}
        status = get_mig_status()
        if status["status"] == "error":
            return status
        used_pct = (status["memory_used_mb"] / status["memory_total_mb"]) * 100
        return {
            "used_pct": round(used_pct, 1),
            "threshold_pct": threshold_pct,
            "alert": used_pct > threshold_pct,
        }


    @mcp.resource("mig://gpu/0/status")
    def gpu_status_resource() -> str:
        """Returns the current status of GPU 0 as text."""
        status = get_mig_status()
        if status["status"] == "error":
            return f"Error: {status['message']}"
        return (
            f"Memory: {status['memory_used_mb']}MB / "
            f"{status['memory_total_mb']}MB\n"
            f"GPU Utilization: {status['gpu_utilization_pct']}%"
        )


    @mcp.prompt()
    def gpu_analysis_prompt() -> str:
        """Structured prompt template for analyzing GPU status."""
        return (
            "Please analyze the current MIG slice GPU status.\n\n"
            "Analysis items:\n"
            "1. Does memory utilization exceed the threshold (80%)?\n"
            "2. Is GPU utilization abnormally low or high?\n"
            "3. Is there a possibility of affecting other students' workloads?\n"
            "4. Recommendations for resource optimization"
        )


    if __name__ == "__main__":
        mcp.run()
  4. Write the MCP configuration JSON

    {
      "mcpServers": {
        "mig-monitor": {
          "command": "python",
          "args": ["mig_monitor_server.py"],
          "env": {
            "CUDA_VISIBLE_DEVICES": "MIG-GPU-xxxx/0/0"
          }
        }
      }
    }

    Replace MIG-GPU-xxxx/0/0 with the actual UUID confirmed in Step 1.

  5. Validate with MCP Inspector

    Terminal window
    # Run Inspector
    npx @modelcontextprotocol/inspector python mig_monitor_server.py
    # Access http://localhost:6274 in browser, then:
    # 1. tools/list → confirm get_mig_status, check_memory_pressure
    # 2. tools/call → run get_mig_status, check JSON response
    # 3. resources/list → confirm mig://gpu/0/status
    # 4. prompts/list → confirm gpu_analysis_prompt
    # 5. Take screenshot (for assignment submission)
  6. TBAC-based governance integration (task-unit tool access control)

    # governance.py — Task-unit MCP tool access control
    from enum import Enum


    class Role(str, Enum):
        STUDENT = "student"
        TA = "ta"
        ADMIN = "admin"


    # TBAC: Task × Role → Allowed Tools
    TBAC_POLICIES = {
        "monitoring": {
            # Monitoring task: all roles can use query tools
            Role.STUDENT: ["get_mig_status", "check_memory_pressure"],
            Role.TA: ["get_mig_status", "check_memory_pressure",
                      "list_all_instances"],
            Role.ADMIN: ["*"],
        },
        "administration": {
            # Administration task: students have no access
            Role.STUDENT: [],
            Role.TA: ["list_all_instances", "get_instance_detail"],
            Role.ADMIN: ["*"],
        },
        "code_review": {
            # Code review task: only file reads allowed, no deletion/modification
            Role.STUDENT: ["read_file", "get_mig_status"],
            Role.TA: ["read_file", "get_mig_status", "run_linter"],
            Role.ADMIN: ["*"],
        },
    }


    def authorize_tool_call(role: Role, task: str, tool_name: str) -> bool:
        """TBAC: control tool access by role + task context"""
        task_policy = TBAC_POLICIES.get(task, {})
        allowed = task_policy.get(role, [])
        if "*" in allowed:
            return True
        return tool_name in allowed


    # Usage example
    if __name__ == "__main__":
        # Student queries status during monitoring task → allowed
        print(authorize_tool_call(Role.STUDENT, "monitoring",
                                  "get_mig_status"))  # True
        # Student deletes instance during administration task → denied
        print(authorize_tool_call(Role.STUDENT, "administration",
                                  "delete_mig_instance"))  # False
        # Student deletes file during code review → denied
        print(authorize_tool_call(Role.STUDENT, "code_review",
                                  "delete_file"))  # False
        # Admin has access to all tools in all tasks
        print(authorize_tool_call(Role.ADMIN, "administration",
                                  "delete_mig_instance"))  # True
  • Did you confirm the allocated MIG instances with nvidia-smi mig -lgi?
  • Does tools/list on the FastMCP server return 2 or more tools?
  • Does the get_mig_status call in MCP Inspector return valid JSON?
  • Does the mig://gpu/0/status resource show current memory usage?
  • Is gpu_analysis_prompt confirmed in prompts/list? (3 primitives complete)
  • Input validation: Does passing out-of-range values (e.g., -10, 200) to threshold_pct return a safe error?
  • Awareness that tool descriptions are untrusted — did you check for suspicious instructions in tool descriptions?
  • In TBAC-based access control, is a student calling tools from the administration task denied?
  • Are all error logs output to stderr? (preventing stdout pollution)
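
The threshold_pct and stderr items above follow a single validate-first pattern. A minimal sketch — check_memory_pressure and threshold_pct mirror the server built in this lab, but the error-dict shape and the placeholder GPU reading are assumptions, not a FastMCP convention:

```python
# Sketch of input validation + safe error return for an MCP tool.
# The error-dict shape and the hard-coded used_pct are illustrative only.
import sys

def check_memory_pressure(threshold_pct: float) -> dict:
    # Validate before touching the GPU: rejects out-of-range values like -10 or 200
    if not (0 <= threshold_pct <= 100):
        # Log to stderr only — stdout carries the JSON-RPC stream
        print(f"rejected threshold_pct={threshold_pct}", file=sys.stderr)
        return {"error": "threshold_pct must be between 0 and 100"}
    used_pct = 42.0  # placeholder for a real pynvml memory query
    return {"pressure": used_pct >= threshold_pct, "used_pct": used_pct}

if __name__ == "__main__":
    print(check_memory_pressure(-10))   # safe error dict, no exception raised
    print(check_memory_pressure(80.0))  # normal result
```

Returning a structured error instead of raising keeps the protocol stream intact and gives the model a message it can act on without leaking internals.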

Lab 03: MCP Server Implementation and Security Validation


Due: 2026-03-24 23:59

Submission path: assignments/week-03/[student-ID]/

Required:

  1. MIG profile split analysis report — compare 1g.10gb × 7 vs 3g.40gb × 2 + 1g.10gb × 1 configurations in terms of SM count, memory, and max instances
  2. Write a Kubernetes YAML using nodeAffinity to schedule Pods only on nodes with a specific MIG profile
  3. Submit a screenshot or JSON dump of the tools/list JSON-RPC response
  4. Implement input validation + safe error returns in the FastMCP server — pynvml initialization failure, invalid GPU index, out-of-range parameters, etc. Include stdio stdout pollution prevention
  5. Write an architecture diagram including MCP trust boundaries + TBAC 3 layers — include the trust boundaries of User↔Host↔Client / Client↔Gateway↔Server / Server↔Infra, and the Tasks→Tools→Transactions flow

Bonus:

  1. Implement a FastAPI-based Governed MCP Gateway proxy — including inbound/outbound inspection
  2. Implement a resource template (mig://gpu/{id}/metrics) for real-time collection of GPU memory, temperature, and power with pynvml
  3. Load a Llama-3-8B 4-bit quantized model on a MIG 1g.10gb slice and measure inference benchmarks
  4. Traefik Hub MCP Gateway research → TBAC variable substitution policy design report: include dynamic policy examples using the mcp.* + jwt.* namespaces
  5. SANDWORM_MODE response — McpInject attack simulation: build an MCP server with malicious tool descriptions, confirm the tool descriptions in Inspector, and write a lab report identifying the prompt injection patterns

Key Takeaways
  1. MIG is hardware isolation: Unlike time-slicing, it physically separates SM, L2 cache, and memory controllers. One student’s OOM cannot propagate to another.
  2. DGX H100 × MIG = 56 independent instances: 8 GPUs × 7 slices can comfortably accommodate 30 students.
  3. MCP reduces N×M to N+M: Standardizes agent-tool connections so that adding a new tool only requires implementing an MCP server.
  4. MCP 3 primitives — distinguish Tools (model invokes), Resources (app controls), Prompts (user selects).
  5. Layer governance on top of MCP: A Governed Gateway applies TBAC policies and DLP inspection at the Client↔Server boundary.
  6. SANDWORM_MODE is MCP’s dark side: A combination of typosquatting + prompt injection + credential exfiltration. McpInject is a semantic attack exploiting the AI’s language comprehension.
  7. Client Features are reverse hooks: Sampling/Roots/Elicitation are control flows from server to client. HOTL implemented at the protocol level.
  8. TBAC supersedes RBAC: In agent environments, “what task” matters more than “who.” Three-layer access control with Tasks→Tools→Transactions.
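
Takeaway 3 in numbers — a toy calculation with arbitrary counts (5 agents, 8 tools):

```python
# Point-to-point integrations grow multiplicatively (N×M); MCP
# implementations grow additively (N+M). Counts are arbitrary examples.
agents, tools = 5, 8
print(agents * tools)  # without a standard: 40 bespoke integrations
print(agents + tools)  # with MCP: 5 clients + 8 servers = 13 implementations
```

Adding a ninth tool then costs one new MCP server instead of five new bespoke integrations.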