Permissions: what the pod can do

Clu runs under two stacked identity systems. Both matter. Neither replaces the other.

Layer	Identity	Grants access to
AWS IAM	IRSA role (via OIDC federation) or EKS Pod Identity	AWS APIs — Bedrock, IAM, RDS, S3, CloudWatch, Secrets Manager, EC2, Cost Explorer
Kubernetes RBAC	ServiceAccount `clu-ops:clu-ops-agent`	K8s API — Pods, Deployments, Services, ConfigMaps, etc.

A bedrock:InvokeModel call inside the pod uses the IAM identity. A kubectl get pods-equivalent call uses the K8s identity. They're independent — denying one doesn't deny the other.

This doc is the full map of what each layer grants.

Layer 1 — AWS IAM

The pod gets AWS credentials through one of two mechanisms:

IRSA (IAM Roles for Service Accounts)

Default on EKS 1.19+. The pod's ServiceAccount carries an eks.amazonaws.com/role-arn annotation; the Amazon EKS Pod Identity webhook mutates the pod spec at admission time to:

Project an OIDC token into the pod at /var/run/secrets/eks.amazonaws.com/serviceaccount/token.
Set AWS_WEB_IDENTITY_TOKEN_FILE + AWS_ROLE_ARN env vars.

When the pod's boto3 client makes its first AWS call, it reads the env vars + token, calls sts:AssumeRoleWithWebIdentity, and gets temporary credentials scoped to the named role. Credentials auto-rotate on a ~1-hour cycle.
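When debugging a pod that fails its first AWS call, it can help to confirm that the webhook actually injected both pieces. The following is a minimal sketch, not something Clu ships: irsa_status is a hypothetical helper, and it only checks that the two env vars named above are set and that the token file exists.

```python
import os

# The two env vars the webhook sets, per the list above.
IRSA_ENV_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")

def irsa_status(env=None):
    """Report which IRSA env vars are missing and whether the token file exists.

    Hypothetical helper for illustration; pass a dict to test without
    touching the real environment.
    """
    env = os.environ if env is None else env
    missing = [name for name in IRSA_ENV_VARS if not env.get(name)]
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    token_present = bool(token_file) and os.path.isfile(token_file)
    return {"missing_env": missing, "token_file_present": token_present}

if __name__ == "__main__":
    print(irsa_status())
```

Run inside the pod, an empty missing_env list and token_file_present set to True suggest the admission-time mutation worked and any remaining failure is on the IAM side (trust policy or permissions).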

Trust-policy condition: the role's trust policy must match system:serviceaccount:<namespace>:<sa-name>. For Clu that's system:serviceaccount:clu-ops:clu-ops-agent. The trust-policy template is in IAM setup.
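To make the condition concrete, here is a sketch of the general shape of an IRSA trust policy built in Python. The canonical template remains the one in IAM setup; the account ID and OIDC issuer below are placeholders, irsa_trust_policy is a hypothetical helper, and the aud check is the common IRSA convention rather than something this page specifies.

```python
import json

def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build an IRSA-style trust policy for a single ServiceAccount.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    """
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Only this namespace + ServiceAccount may assume the role.
                    f"{oidc_provider}:sub": subject,
                    # Assumption: the usual audience check for IRSA tokens.
                    f"{oidc_provider}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }

# Placeholder account ID and issuer; substitute your cluster's values.
policy = irsa_trust_policy(
    "111122223333",
    "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
    "clu-ops",
    "clu-ops-agent",
)
print(json.dumps(policy, indent=2))
```

If the sub value in the role's real trust policy differs from system:serviceaccount:clu-ops:clu-ops-agent by even one character (for example, a different namespace), AssumeRoleWithWebIdentity is denied.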

EKS Pod Identity (EKS 1.27+)

The newer alternative. Requires the eks-pod-identity-agent addon installed on the cluster. Replaces OIDC federation with a direct EKS → IAM binding managed by EKS itself — no OIDC provider to configure, no AssumeRoleWithWebIdentity round-trip.

Clu works with either. Pod Identity is simpler to set up if you're starting fresh; IRSA is the de-facto standard if you already have other workloads using it.

What the IRSA role grants (per capability)

The full inline JSON for each policy lives in IAM setup, with one paste-ready document per capability tier you've enabled in Helm. This page summarizes the intent; the actual policy text is canonical there.

Core (always attached):

Service	Actions	Why
Bedrock	`InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, `ConverseStream`	LLM inference for chat + scheduled reports
CloudWatch	`GetMetricData`, `GetMetricStatistics`, `ListMetrics`, `DescribeAlarms`, `logs:StartQuery`, `logs:GetQueryResults`, `logs:DescribeLog*`	Metrics + logs context on health-rule findings
AWS Marketplace	`RegisterUsage`	Entitlement metering at startup (bypassed with JWT license)

Resources are scoped: Bedrock is narrowed to specific model-family ARN patterns (Anthropic Claude, Llama, gpt-oss, Mistral); CloudWatch Logs are cluster-wide (read-only); the Marketplace action takes no resource scoping.

Cloud (added when modules.cloud.enabled=true):

Service	Actions	Tool using it
IAM	`ListRoles`, `GetRole`, `ListAttachedRolePolicies`	`aws_iam_roles`, `aws_irsa_mapping`
RDS	`DescribeDBInstances`, `DescribeDBClusters`	`aws_rds`
ElastiCache	`DescribeCacheClusters`, `DescribeReplicationGroups`	`aws_elasticache`
S3	`ListAllMyBuckets`, `GetBucketLocation`, `GetBucketTagging`, `GetBucketPolicy`, `GetBucketEncryption`	`aws_s3` (never reads bucket content)
Secrets Manager	`ListSecrets`, `DescribeSecret`	`aws_secrets` (never reads secret values)
ECR	`DescribeRepositories`, `DescribeImages`, `ListImages`	`aws_ecr`
EC2	`DescribeVpcs`, `DescribeSubnets`, `DescribeSecurityGroups`, `DescribeRouteTables`, `DescribeNatGateways`, `DescribeInternetGateways`, `DescribeInstances`, `DescribeVolumes`	`aws_vpc_networking`, savings detectors
ELB	`DescribeLoadBalancers`, `DescribeTargetGroups`, `DescribeTargetHealth`	savings detectors (idle ALBs/NLBs)
Cost Explorer	`GetCostAndUsage`, `GetCostForecast`, `GetTags`, `GetDimensionValues`	`aws_cost_summary`

Explicit omissions that matter:

No GetSecretValue — Clu never reads secret content. The aws_secrets tool surfaces names + ARNs so the agent can reason about secret existence, not contents.
No S3 GetObject — Clu lists buckets but doesn't read objects.
No EC2 write actions — describe-only, never Run*, Terminate*, Create*.
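One way to keep these omissions honest after someone edits the policy is to scan the JSON for the actions listed above. This is a rough sketch under stated assumptions: forbidden_grants is a hypothetical helper, the service prefixes (secretsmanager, s3, ec2) are the standard IAM ones, and the wildcard handling is deliberately simple, so it will not catch every possible overlap between two wildcard patterns.

```python
from fnmatch import fnmatchcase

# Actions this page says the role never grants.
FORBIDDEN = (
    "secretsmanager:GetSecretValue",
    "s3:GetObject",
    "ec2:Run*",
    "ec2:Terminate*",
    "ec2:Create*",
)

def _as_list(value):
    # IAM allows a single string/object where a list is also accepted.
    return [value] if isinstance(value, (str, dict)) else list(value)

def forbidden_grants(policy):
    """Return Allow-ed actions that match, or are matched by, a forbidden pattern."""
    hits = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            granted = action.lower()  # IAM action names are case-insensitive
            for pattern in FORBIDDEN:
                banned = pattern.lower()
                # Check both directions: a literal grant hit by a banned
                # pattern, or a wildcard grant that covers a banned action.
                if fnmatchcase(granted, banned) or fnmatchcase(banned, granted):
                    hits.append(action)
                    break
    return hits

if __name__ == "__main__":
    too_broad = {"Statement": [{"Effect": "Allow", "Action": "s3:Get*", "Resource": "*"}]}
    print(forbidden_grants(too_broad))
```

A policy limited to the actions in the Cloud table yields an empty list; a grant such as s3:Get* is flagged because it would cover GetObject.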

Core Plus (added when modules.corePlus.enabled=true): the granted actions are identical to Cloud. Writes happen K8s-side (through the chart's writer ClusterRole below), not cloud-side. The IDP policy is attached separately so operators can toggle Core Plus independently without re-attaching the Cloud JSON.

Layer 2 — Kubernetes RBAC

The pod's ServiceAccount is clu-ops:clu-ops-agent. Every K8s API call the pod makes authenticates as this SA. Three RBAC bindings govern what it can do.

Cluster-wide reader (always installed)

Granted by the chart at install time. Lets the agent observe every resource it needs to reason about cluster shape, without granting any write or admin privilege.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: clu-ops-agent-reader
rules:
  # Core resources
  - apiGroups: [""]
    resources:
      - pods
      - pods/log
      - services
      - endpoints
      - configmaps
      - namespaces
      - nodes
      - persistentvolumeclaims
      - persistentvolumes
      - replicationcontrollers
      - serviceaccounts
      - resourcequotas
      - limitranges
      - events
    verbs: ["get", "list", "watch"]
  # Secrets — narrower scope than the other core resources because
  # content reads reveal application data. We list them so Helm 3
  # release discovery works (Helm stores each release-revision as a
  # Secret labelled ``owner=helm``), and we grant ``get`` so
  # ``helm_values`` can decode the release payload. The writer
  # ClusterRole explicitly does NOT grant any verb on Secrets; write
  # paths never touch them.
  - apiGroups: [""]
    resources: [secrets]
    verbs: ["get", "list", "watch"]
  # Workloads
  - apiGroups: ["apps"]
    resources:
      - deployments
      - replicasets
      - statefulsets
      - daemonsets
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources: [jobs, cronjobs]
    verbs: ["get", "list", "watch"]
  # Networking
  - apiGroups: ["networking.k8s.io"]
    resources: [networkpolicies, ingresses, ingressclasses]
    verbs: ["get", "list", "watch"]
  # RBAC (read-only — needed for IAM/RBAC analysis)
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: [roles, rolebindings, clusterroles, clusterrolebindings]
    verbs: ["get", "list", "watch"]
  # Policy
  - apiGroups: ["policy"]
    resources: [poddisruptionbudgets]
    verbs: ["get", "list", "watch"]
  # Storage
  - apiGroups: ["storage.k8s.io"]
    resources: [storageclasses, volumeattachments, csinodes, csidrivers]
    verbs: ["get", "list", "watch"]
  # Autoscaling
  - apiGroups: ["autoscaling"]
    resources: [horizontalpodautoscalers]
    verbs: ["get", "list", "watch"]
  # Metrics (when metrics-server present)
  - apiGroups: ["metrics.k8s.io"]
    resources: [pods, nodes]
    verbs: ["get", "list"]
  # API discovery
  - apiGroups: ["apiextensions.k8s.io"]
    resources: [customresourcedefinitions]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["admissionregistration.k8s.io"]
    resources: [validatingwebhookconfigurations, mutatingwebhookconfigurations]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apiregistration.k8s.io"]
    resources: [apiservices]
    verbs: ["get", "list", "watch"]

Notable inclusions:

Secrets get/list/watch is required because Helm 3 stores release manifests as labeled Secrets. Without it helm_list returns silently empty. Note that get does let the reader see Secret content (helm_values relies on it to decode the release payload); what the chart withholds is any Secret verb on the write side. See the explicit non-grants below.

Notable omissions (security-relevant):

No create, update, patch, delete on anything. Writes live exclusively in the writer ClusterRole.
No impersonation (no impersonate verb on users, groups, or serviceaccounts).
No tokenreview / subjectaccessreview — the pod can't introspect arbitrary identities.
No read access to arbitrary custom resources. The apiextensions.k8s.io rule covers get/list/watch on customresourcedefinitions only, so the scanner sees the registered CRD shapes but not the custom resource instances themselves.
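The read-only claim can be checked mechanically against the live object. A minimal sketch, assuming the rules come from kubectl get clusterrole clu-ops-agent-reader -o json (the rules field of that output); non_read_grants is a hypothetical helper, not part of Clu.

```python
READ_VERBS = {"get", "list", "watch"}

def non_read_grants(rules):
    """Return (apiGroup, resource, verb) triples that go beyond read access."""
    extra = []
    for rule in rules:
        for verb in rule.get("verbs", []):
            if verb in READ_VERBS:
                continue
            # Anything else (create, patch, delete, impersonate, *) is reported.
            for group in rule.get("apiGroups", [""]):
                for resource in rule.get("resources", []):
                    extra.append((group, resource, verb))
    return extra

# Excerpt of the reader rules above, written as Python dicts.
reader_excerpt = [
    {"apiGroups": [""], "resources": ["secrets"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["metrics.k8s.io"], "resources": ["pods", "nodes"], "verbs": ["get", "list"]},
]
print(non_read_grants(reader_excerpt))
```

An empty result means every rule is limited to get, list, and watch; any triple in the output is a grant that contradicts the omissions listed above.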

Cluster-wide writer (installed when the Core Plus is active)

Granted by the chart only when modules.corePlus.writeOperations.enabled=true in Helm values. Every action under this role flows through the in-product approval gate before a real apply.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: clu-ops-agent-writer
rules:
  # Workload writes — no delete, no cluster-admin.
  - apiGroups: ["apps"]
    resources: [deployments, statefulsets, daemonsets]
    verbs: ["create", "update", "patch"]
  - apiGroups: ["batch"]
    resources: [jobs, cronjobs]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: [services, configmaps, serviceaccounts, namespaces]
    verbs: ["create", "update", "patch"]
  - apiGroups: ["networking.k8s.io"]
    resources: [ingresses, networkpolicies]
    verbs: ["create", "update", "patch"]
  - apiGroups: ["autoscaling"]
    resources: [horizontalpodautoscalers]
    verbs: ["create", "update", "patch"]
  # Subresources for scale + restart (rollouts).
  - apiGroups: ["apps"]
    resources: [deployments/scale, statefulsets/scale]
    verbs: ["update", "patch"]
  # Cordon (a node patch) — narrowly scoped.
  - apiGroups: [""]
    resources: [nodes]
    verbs: ["patch"]

Critical non-grants (invariants, not configurable):

No delete verb anywhere. The Core Plus doesn't delete. Cleanup is an operator responsibility via kubectl, helm, or GitOps.
No cluster-admin. No wildcard * rules, no */scale subresource on unexpected kinds, no secrets/* beyond the reader grant.
No Secret access from the writer. k8s_apply on a manifest that contains a Secret would require create/update/patch on secrets, which the writer does NOT include. Customers who need to ship Secret content must do so via ExternalSecrets or another out-of-band path.
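These invariants lend themselves to a drift check in CI or a periodic audit. The sketch below is an illustration only: writer_violations is a hypothetical helper, and it assumes the rules are taken from the rules field of kubectl get clusterrole clu-ops-agent-writer -o json.

```python
def writer_violations(rules):
    """Return human-readable problems for rules that break the writer invariants."""
    problems = []
    for i, rule in enumerate(rules):
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        groups = set(rule.get("apiGroups", []))
        # Invariant: no delete verb anywhere.
        if verbs & {"delete", "deletecollection"}:
            problems.append(f"rule {i}: grants delete")
        # Invariant: no wildcard rules.
        if "*" in verbs or "*" in resources or "*" in groups:
            problems.append(f"rule {i}: uses a wildcard")
        # Invariant: no verb on Secrets (core API group) or their subresources.
        if "" in groups and any(
            r == "secrets" or r.startswith("secrets/") for r in resources
        ):
            problems.append(f"rule {i}: touches secrets")
    return problems

# Excerpt of the writer rules above, written as Python dicts.
writer_excerpt = [
    {"apiGroups": ["apps"], "resources": ["deployments", "statefulsets", "daemonsets"],
     "verbs": ["create", "update", "patch"]},
    {"apiGroups": [""], "resources": ["nodes"], "verbs": ["patch"]},
]
print(writer_violations(writer_excerpt))
```

An empty list means the role still matches the non-grants described here; any entry points at the rule index that drifted.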

Namespaced state Role (always installed)

Scoped to the pod's own namespace. Governs Clu's self-state — the four ConfigMaps that persist operator-visible state (reports, approvals, snoozes, audit) plus the scan cache.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: clu-ops-agent-state
  namespace: clu-ops
rules:
  # ``create`` cannot be scoped by ``resourceNames`` — applies to
  # any ConfigMap in the namespace, but the agent only creates the
  # five named below.
  - apiGroups: [""]
    resources: [configmaps]
    verbs: [create]
  # Read + update on the specific store ConfigMaps the agent owns.
  - apiGroups: [""]
    resources: [configmaps]
    resourceNames:
      - clu-ops-reports
      - clu-ops-approvals
      - clu-ops-snoozes
      - clu-ops-audit
      - clu-ops-scan
    verbs: [get, update, patch]
  # No ``delete`` — store-side retention caps handle cleanup, not
  # K8s-side TTL.

This is a Role (namespaced), not a ClusterRole. Blast radius is the pod's own namespace only. If this Role is missing, the factory falls back to InMemory for those stores + logs a WARN on startup — the agent works but state vanishes on restart.

Protected namespaces

Hard-coded in the backend (not Helm-configurable):

kube-system
kube-public
kube-node-lease
clu-ops

Every write tool calls the backend's WriteSafetyGate, which rejects any write targeting these namespaces, regardless of RBAC. Even if an operator bound cluster-admin to the agent's ServiceAccount, writes to these namespaces would still fail at the gate.
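The gate's internals are not shown on this page, so the following is only a sketch of the idea: a check that runs before any write and ignores what RBAC would allow. The names check_write_allowed and ProtectedNamespaceError are hypothetical, and treating a missing namespace as a cluster-scoped write that passes this particular check is an assumption.

```python
from typing import Optional

# The four namespaces listed above.
PROTECTED_NAMESPACES = frozenset(
    {"kube-system", "kube-public", "kube-node-lease", "clu-ops"}
)

class ProtectedNamespaceError(Exception):
    """Raised when a write targets a protected namespace."""

def check_write_allowed(namespace: Optional[str]) -> None:
    """Raise if the target namespace is protected; otherwise return None.

    Assumption: namespace=None means a cluster-scoped object (for example a
    node patch), which this namespace check does not cover.
    """
    if namespace is not None and namespace in PROTECTED_NAMESPACES:
        raise ProtectedNamespaceError(
            f"writes to namespace {namespace!r} are blocked"
        )

if __name__ == "__main__":
    check_write_allowed("default")  # passes silently
    try:
        check_write_allowed("kube-system")
    except ProtectedNamespaceError as exc:
        print(exc)
```

Because the check sits in application code rather than in RBAC, widening the ServiceAccount's Kubernetes permissions does not change its outcome.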

Layer 3 (optional) — the human operator

When you kubectl get pods against an EKS cluster, the cluster needs to authorize your human IAM identity, not the pod's. EKS resolves this via either:

Access entries (modern, recommended) — IAM role ARN → K8s group or access-policy mapping, stored in EKS control plane.
aws-auth ConfigMap (legacy) — same mapping via a ConfigMap in kube-system that EKS watches.

For first install, the AmazonEKSClusterAdminPolicy access policy is the simplest mapping:

your IAM role  ─►  access entry  ─►  AmazonEKSClusterAdminPolicy  ─►  cluster-admin

cluster-admin is the built-in ClusterRole with * on *. You need that to bootstrap the cluster + install Clu; day-to-day access should be narrower (namespace admin, dev-namespace viewer, etc.).

Your human IAM permissions (what you can do in the AWS console + CLI) are independent from both the pod's IRSA role AND the access entry. AmazonEKSClusterAdminPolicy is a K8s-side policy mapped through EKS; your IAM policy is whatever your SSO role grants you at the AWS API layer.

How the three layers compose in practice

You ask "why can't Clu list my Secrets?" and the answer depends on which layer is denying:

Clu's SA missing RBAC → chat shows error_kind: permission_denied ... list on Secret ... forbidden by RBAC, paste-ready fix targets the reader ClusterRole above.
Clu's IRSA role missing IAM → same shape, error_kind: permission_denied ... aws:secretsmanager, paste-ready fix targets the Cloud JSON in IAM setup.
Your kubectl user missing access → EKS returns 401/403 on your kubectl call; Clu never sees the request. Check your access entry: aws eks list-access-entries --cluster-name <name>.

The three are distinguishable because Clu's chat surfaces layer 1 and layer 2 via the structured error taxonomy, and layer 3 denies show up in your terminal (not in Clu) before the request even reaches the pod.

How to audit what's granted right now

# Layer 1 — IAM role attached policies
role_arn=$(kubectl get sa -n clu-ops clu-ops-agent \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}')
role_name=${role_arn##*/}
aws iam list-role-policies --role-name "$role_name"   # inline policies
aws iam list-attached-role-policies --role-name "$role_name"   # managed

# Layer 2 — RBAC on the pod's SA
kubectl auth can-i --as=system:serviceaccount:clu-ops:clu-ops-agent --list
# Or for a specific verb:
kubectl auth can-i --as=system:serviceaccount:clu-ops:clu-ops-agent \
  list secrets -n default

# Layer 3 — your own access
kubectl auth can-i --list          # what your current kubeconfig user can do
aws eks list-access-entries --cluster-name <name> --region <region>
aws eks list-associated-access-policies --cluster-name <name> \
  --principal-arn <your-role-arn> --region <region>

Clu does not run the Layer 1 and Layer 2 checks itself. Self-introspection of RBAC is not currently a tool, so the permission errors Clu reports come from the actual calls failing.