Blogs | KubeArmor

Network Segmentation of Linux VMs using KubeArmor

April 14, 2026 · 9 min read

DevRel @ KubeArmor | Accuknox


KubeArmor now enforces layer 3/4 network rules on Linux VMs via a new CRD: KubeArmorNetworkPolicy. Policies support CIDR ranges, port ranges, interface scoping, and both ingress and egress control. Enforcement runs at the kernel via nftables and is stateful by default.
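The following is a minimal Python sketch, built on our own assumptions, that makes those rule semantics concrete. It shows how a layer 3/4 rule with a CIDR range, an inclusive port range, and an optional interface could match traffic, and it adds a small connection table to model "stateful by default". The `Rule` and `StatefulFilter` names and fields are hypothetical and are not the KubeArmorNetworkPolicy schema. The real enforcement is done by nftables and connection tracking in the kernel, not in userspace.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical L3/L4 allow rule; field names are illustrative only."""
    direction: str                   # "ingress" or "egress"
    cidr: str                        # peer address range, e.g. "10.0.0.0/8"
    port_min: int                    # inclusive port range
    port_max: int
    interface: Optional[str] = None  # None means "any interface"

    def matches(self, direction: str, peer_ip: str, port: int, iface: str) -> bool:
        if self.direction != direction:
            return False
        if self.interface is not None and self.interface != iface:
            return False
        if ipaddress.ip_address(peer_ip) not in ipaddress.ip_network(self.cidr):
            return False
        return self.port_min <= port <= self.port_max


class StatefulFilter:
    """Default-deny filter that also accepts replies on connections it allowed."""

    def __init__(self, rules: Iterable[Rule]) -> None:
        self.rules = list(rules)
        self.established = set()  # (peer_ip, peer_port, local_port)

    def egress(self, peer_ip: str, peer_port: int, local_port: int, iface: str) -> bool:
        # Egress rules are matched on the remote (destination) port.
        if any(r.matches("egress", peer_ip, peer_port, iface) for r in self.rules):
            self.established.add((peer_ip, peer_port, local_port))
            return True
        return False

    def ingress(self, peer_ip: str, peer_port: int, local_port: int, iface: str) -> bool:
        # Replies on a connection opened by an allowed egress packet need no rule.
        if (peer_ip, peer_port, local_port) in self.established:
            return True
        # New inbound connections are matched on the local (service) port.
        return any(r.matches("ingress", peer_ip, local_port, iface) for r in self.rules)
```

With a single egress rule for 10.0.0.0/8 on port 443, the reply to an allowed outbound connection passes without any ingress rule. An unsolicited packet to a different local port is still dropped.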

Introducing USB Device Enforcement for Host Security

October 23, 2025 · 6 min read

Atharva Shah

DevRel @ KubeArmor | Accuknox

We're excited to announce a powerful new security capability in KubeArmor: USB Device Audit and Enforcement, now available in KubeArmor Host Policies. This feature, introduced in PR #2194 and tracking Issue #2165, significantly extends KubeArmor's runtime security scope, moving beyond processes, files, and networks to secure the physical hardware layer of your nodes and VMs.

Want to see it in action? Check out this demo video:

How USBs are a Physical Attack Vector

In any secure environment, physical access is a critical threat vector. USB devices, while ubiquitous, introduce substantial risks:

Data Exfiltration: Unauthorized USB mass storage devices can be used to easily copy and steal sensitive data.
Malicious Peripherals: Devices posing as keyboards (like "BadUSBs") can execute key-logging attacks or inject malicious commands.
Firmware-Based Attacks: Sophisticated attacks can target the device firmware itself.

Controlling which devices can be attached to a host is a critical part of a defense-in-depth strategy and a common requirement for regulatory compliance.

The Solution - The KubeArmor USB Device Handler

The new USB Device Handler gives you granular, policy-based control over USB devices at the host level. You can now create KubeArmor Host Policies to audit (log) or block (deauthorize) specific devices or entire classes of devices based on their attributes.


How It Works Under the Hood

To provide robust, low-level control, the USB Device Handler operates by directly interacting with the Linux kernel:

Monitoring Kernel Uevents: The handler listens for kernel uevents on a netlink socket. The kernel emits these events whenever a USB device is attached or removed.
Device Enumeration: When a device is attached, the kernel enumerates it, reading its descriptors to identify its class, subClass, protocol, and other attributes. This information is included in the uevent.
Policy Matching: The USB Device Handler receives this uevent and matches the device's attributes against all applied KubeArmorHostPolicy resources.
sysfs Enforcement: Based on the policy's action (Audit or Block), the handler enforces control using sysfs-based authorization.
- It writes a 1 (authorize) or 0 (deauthorize) to the device's authorized file within the sysfs pseudo-file system (e.g., /sys/bus/usb/devices/.../authorized).
- Writing 0 instantly deauthorizes the device, unbinding its drivers and making it unusable by the system, effectively blocking it.

This mechanism ensures that even if a device is physically plugged in, it cannot be accessed or used by the host operating system if a "Block" policy is in place.
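The uevent-to-sysfs flow above can be sketched in Python. This is a minimal illustration under stated assumptions, not KubeArmor's handler. It parses a uevent datagram (a header followed by NUL-separated KEY=VALUE pairs) and reads the class/subclass/protocol triplet from the INTERFACE key that the kernel sets for USB interfaces. It then computes which value a handler would write to the parent device's authorized file. The helper names and the simple "block by class" rule are hypothetical, and the sketch never writes to sysfs.

```python
import os
from typing import Dict, Optional, Set, Tuple

# USB class codes: 8 (MASS-STORAGE) appears in the policy comments below;
# 3 is the standard class code for HID (Human Interface Device).
USB_CLASS_CODES = {"HID": 3, "MASS-STORAGE": 8}


def parse_uevent(raw: bytes) -> Dict[str, str]:
    """Parse a uevent datagram: 'action@devpath', then NUL-separated KEY=VALUE pairs."""
    header, *pairs = raw.split(b"\0")
    env = {"HEADER": header.decode()}
    for item in pairs:
        key, sep, value = item.decode().partition("=")
        if sep:  # skips the empty trailing field
            env[key] = value
    return env


def interface_triplet(env: Dict[str, str]) -> Optional[Tuple[int, int, int]]:
    """USB interface events carry INTERFACE=class/subclass/protocol in decimal."""
    if "INTERFACE" not in env:
        return None
    cls, sub, proto = (int(x) for x in env["INTERFACE"].split("/"))
    return cls, sub, proto


def authorized_write(env: Dict[str, str], blocked: Set[int]) -> Optional[Tuple[str, str]]:
    """Return the (sysfs path, value) a handler would write, or None to do nothing."""
    if env.get("ACTION") != "add" or env.get("SUBSYSTEM") != "usb":
        return None
    triplet = interface_triplet(env)
    if triplet is None:
        return None  # device-level event; this sketch only inspects interfaces
    value = "0" if triplet[0] in blocked else "1"
    # An interface directory (e.g. .../1-2/1-2:1.0) sits inside its device directory.
    device_dir = os.path.dirname(env["DEVPATH"])
    return "/sys" + device_dir + "/authorized", value
```

A real handler would receive these datagrams from a netlink socket of the NETLINK_KOBJECT_UEVENT family and would need root privileges to write the authorized file.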

Configuration and Setting Up the USB Device Handler

To enable this feature, you need to update your KubeArmor configuration.

Enable the Handler: You must set two flags to true:
- enableKubeArmorHostPolicy: true
- enableUSBDeviceHandler: true
If you are running KubeArmor in Kubernetes, you will need to patch the DaemonSet. For systemd (non-Kubernetes) mode, you will add these flags to your configuration file.
Set the Default Posture: A new flag, hostDefaultDevicePosture, is also available. This flag (which defaults to audit) determines the action KubeArmor will take on devices that do not match any policy when running in allow-list mode (when there is at least one allow-based policy applied).
- audit (Default): Unmatched devices are audited.
- block: Unmatched devices are automatically blocked.

Monitoring Device Events

You can easily monitor USB device alerts and logs using the karmor CLI:

# Listen for device-specific operations (both logs and alerts)
karmor log --operation device --log-filter all


Policy in Action - Use Cases

A new device block is now available in the KubeArmorHostPolicy spec. You can match devices based on class, subClass, protocol, and level (attachment level).

Example 1: Audit All Mass Storage Devices

This policy creates an Audit alert every time a USB mass storage device is attached. This is perfect for gaining visibility and meeting compliance requirements without being disruptive.

apiVersion: security.kubearmor.com/v1
kind: KubeArmorHostPolicy
metadata:
  name: hsp-device-mass-storage-audit
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: your-node-name # Target a specific node
  severity: 5
  device:
    matchDevice:
    # Class can be a string (e.g., "MASS-STORAGE")
    # or its numeric ID (e.g., 8)
    - class: MASS-STORAGE
  action: Audit # Logs the event

Example 2: Block All Mass Storage Devices

To prevent data exfiltration, you can simply change the action to Block.

apiVersion: security.kubearmor.com/v1
kind: KubeArmorHostPolicy
metadata:
  name: hsp-device-mass-storage-block
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: your-node-name
  severity: 8 # Higher severity for a block
  device:
    matchDevice:
    - class: MASS-STORAGE
  action: Block # Blocks the device

Example 3: Block Specific Malicious-Type Devices (e.g., Keyboards)

This policy demonstrates how to use more granular fields to block devices that identify as a keyboard (a common vector for BadUSB attacks).

apiVersion: security.kubearmor.com/v1
kind: KubeArmorHostPolicy
metadata:
  name: hsp-device-hid-keyboard-block
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: your-node-name
  severity: 10
  device:
    matchDevice:
    - class: HID      # Human Interface Device
      subClass: 1     # Boot Interface Sub-class
      protocol: 1     # Keyboard
  action: Block

Policy Specificity Matters

The USB Device Handler respects policy priority: the most specifically defined policy wins.

For example, if you have two policies:

Block class: MASS-STORAGE
Allow class: MASS-STORAGE, subClass: 6, protocol: 80

The handler will allow a device that matches the second, more specific policy (a mass storage device with subclass 6 and protocol 80), while still blocking all other mass storage devices. This allows you to create fine-grained allow-lists for specific, approved corporate devices.
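The following Python sketch models the specificity rule together with the hostDefaultDevicePosture fallback described earlier. It follows the selection logic as this post describes it and is not the handler's source. `DevicePolicy` and `decide` are hypothetical names. Specificity is approximated by counting the pinned fields (class, subClass, protocol), the level field is ignored, and the tie-breaking rule is our own choice.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MASS_STORAGE = 8  # USB class code, as in the policy comments above


@dataclass(frozen=True)
class DevicePolicy:
    """Hypothetical in-memory form of one matchDevice entry plus its action."""
    action: str                      # "Audit", "Block" or "Allow"
    cls: Optional[int] = None        # None stands in for class: ALL
    sub_class: Optional[int] = None  # None = any subClass
    protocol: Optional[int] = None   # None = any protocol

    def matches(self, cls: int, sub_class: int, protocol: int) -> bool:
        return (self.cls in (None, cls)
                and self.sub_class in (None, sub_class)
                and self.protocol in (None, protocol))

    def specificity(self) -> int:
        # More pinned-down fields means a more specific policy.
        return sum(f is not None for f in (self.cls, self.sub_class, self.protocol))


def decide(policies: Iterable[DevicePolicy], cls: int, sub_class: int, protocol: int,
           default_posture: str = "audit") -> Optional[str]:
    policies = list(policies)
    matching = [p for p in policies if p.matches(cls, sub_class, protocol)]
    if matching:
        # The most specific match wins; ties go to the first policy listed.
        return max(matching, key=DevicePolicy.specificity).action
    if any(p.action == "Allow" for p in policies):
        # Allow-list mode: unmatched devices get hostDefaultDevicePosture.
        return "Block" if default_posture == "block" else "Audit"
    return None  # Assumption: outside allow-list mode, unmatched devices are left alone.
```

With the two policies from the example (Block MASS-STORAGE, Allow MASS-STORAGE/6/80), a 8/6/80 drive is allowed, any other mass storage device is blocked, and an unrelated keyboard falls through to the default posture because an Allow policy is present.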

Policy In Action - Exact Match, Match All, Allow Based Examples

You can create policies using exact matches, match-all conditions, and allow-based approaches.

Exact Match Example - Audit a Specific USB Device:

apiVersion: security.kubearmor.com/v1
kind: KubeArmorHostPolicy
metadata:
  name: hsp-kubearmor-dev-dvc-audit
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: aryan
  severity: 5
  device:
    matchDevice:
    - class: MASS-STORAGE
  action: Audit

Logs (after attaching a USB drive and 2 other USB devices):

Match All Example - Audit All USB Devices:

apiVersion: security.kubearmor.com/v1
kind: KubeArmorHostPolicy
metadata:
  name: hsp-kubearmor-dev-dvc-audit
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: aryan
  severity: 5
  device:
    matchDevice:
    - class: ALL
  action: Audit

Logs (after attaching 3 different USB devices):

Allow Based Example

apiVersion: security.kubearmor.com/v1
kind: KubeArmorHostPolicy
metadata:
  name: hsp-kubearmor-dev-dvc-audit
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: aryan
  severity: 5
  device:
    matchDevice:
    - class: ALL
  action: Allow


When No Policy is Set:


For non-k8s mode:

apiVersion: security.kubearmor.com/v1
kind: KubeArmorHostPolicy
metadata:
  name: hsp-kubearmor-dev-dvc-audit
spec:
  nodeSelector:
    matchLabels:
      kubearmor.io/hostname: aryan
  severity: 5
  device:
    matchDevice:
    - class: ALL
  action: Audit


Conclusion

The new USB Device Enforcement feature brings critical, hardware-level runtime security to your KubeArmor-protected hosts. You can now gain full visibility into device events, prevent unauthorized USB access, and build a more resilient security posture against physical threats.

A special thanks to AryanBakliwal for contributing this major feature.

KubeArmor and Confidential Containers for zero-trust security in Kubernetes environments

September 2, 2025 · 6 min read

Rishabh Soni

Maintainer @KubeArmor | Software Engineer @ Accuknox

Enforcing security policies in a zero-trust environment presents a fundamental challenge: controls enforced on the host can be compromised or create unintended side effects for other workloads.

A more robust approach sandboxes the security enforcement mechanism within the workload runtime environment itself, ensuring policies are isolated and specific to the workload they protect.

The integration of KubeArmor’s eBPF-based security with Confidential Containers (CoCo) achieves this isolated enforcement model. This post details the proof-of-concept (PoC) architecture, policy enforcement mechanisms, and key security considerations for creating a zero-trust solution for sensitive workloads.

Security Challenges in Confidential Environments

Vault Security

Securing secrets within a Kubernetes (K8s) environment is critical, and using a tool like HashiCorp Vault is a common best practice.

Vault stores highly sensitive data such as passwords, API tokens, access keys, and connection strings. A compromised vault can lead to severe consequences including ransomware attacks, organizational downtime, and reputational damage.

Key Threat Models and Risks

Threat Model	Threat Vector(s)	Remediation
👤 User Access Threats	An attacker compromises a legitimate user’s endpoint or credentials to impersonate them and access the Vault.	• Implement MFA • Enforce least privilege
🖥️ Server Threats	• Lateral movement from another compromised service • Exploiting Vault vulnerability (e.g., RCE) to inspect memory or access secret volumes	• Use network segmentation • Disallow `kubectl exec` on Vault pods • Apply runtime security
💻 Client-Side Threats	Authorized app retrieves a secret but stores it insecurely (e.g., plaintext, memory, disk, env var).	• Ensure secure memory practices • Never persist secrets to disk/env

The Need for Multi-Layered Security


Relying solely on RBAC is insufficient. A defense-in-depth strategy combining strong authentication, network segmentation, least-privilege access, and runtime protection is essential.

However, all protections assume the control plane, worker nodes, and cluster admin are trusted. If the cluster-admin itself is compromised, RBAC and network policies fail.

This is where CoCo + KubeArmor integration provides immutable policies, runtime protection, and data-in-use protection inside hardware-backed enclaves.

Integration Architecture

KubeArmor runs in systemd mode inside a Kata VM, ensuring policies are enforced directly within the confidential environment.

Key Components

Systemd Mode – Runs as a service in the Kata VM with immutable policies.
Embedded Policies – Bundled in the VM image, loaded at boot.
OCI Prestart Hook – Initializes KubeArmor before workload execution.
VM Image Preparation – KubeArmor binaries, configs, and units added to Kata VM image.

VM Image Preparation

Configuration Examples

`kubearmor.path`

[Unit]
Description=Monitor for /run/output.json and start kubearmor

[Path]
PathExists=/run/output.json
Unit=kubearmor.service

[Install]
WantedBy=multi-user.target

`kubearmor.service`

[Unit]
Description=KubeArmor

[Service]
User=root
KillMode=process
WorkingDirectory=/opt/kubearmor/
ExecStart=/opt/kubearmor/kubearmor /opt/kubearmor/kubearmor.yaml
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
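PathExists= fires as soon as /run/output.json appears. For that reason, whatever process writes that file should create it atomically so KubeArmor never starts against a partially written file. The Python sketch below shows the usual write-to-temp-then-rename pattern. The payload fields are hypothetical, because the post does not document the format of output.json.

```python
import json
import os
import tempfile


def write_atomically(path: str, payload: dict) -> None:
    """Create `path` only once its full contents are on disk."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        # rename() within one filesystem is atomic, so a PathExists= watcher
        # never observes a half-written file.
        os.replace(tmp, path)
    finally:
        if os.path.exists(tmp):  # only true if something failed before the rename
            os.unlink(tmp)
```

The temporary file is created in the same directory as the target because an atomic rename is only guaranteed within a single filesystem.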

Policy Enforcement Mechanisms

In the PoC, KubeArmor enforced fine-grained policies within the confidential container VM.

PoC Demo Summary

Blocked: raw sockets, writing to /bin
Allowed: unrestricted pod performed these actions successfully

✅ Validated runtime policy enforcement inside a confidential container.

Key Capabilities

Restrict execution of raw sockets
Block write operations to /bin, /usr/bin, /boot
Wildcard-based selectors for all containers in Kata VM

Policy Examples

Network Policy

apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: net-raw-block
spec:
  selector:
    matchLabels:
      kubearmor.io/container.name: ".*"
  network:
    matchProtocols:
      - protocol: raw
  action: Block

File Integrity Policy

apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: file-integrity-monitoring
spec:
  action: Block
  file:
    matchDirectories:
      - dir: /bin/
        readOnly: true
      - dir: /sbin/
        readOnly: true
  selector:
    matchLabels:
      kubearmor.io/container.name: ".*"
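The following Python sketch gives a rough mental model of the two directory options used above and in the MySQL example later, under action: Block. It treats readOnly as "reads allowed, writes blocked" and recursive as "extend coverage to subdirectories". This is our simplified reading of those fields, not KubeArmor's matcher, which runs in the kernel through its LSM backends. `DirRule` and `is_blocked` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable


@dataclass(frozen=True)
class DirRule:
    """Hypothetical model of one matchDirectories entry."""
    dir: str
    read_only: bool = False
    recursive: bool = False

    def covers(self, path: str) -> bool:
        base, p = PurePosixPath(self.dir), PurePosixPath(path)
        if self.recursive:
            return base in p.parents  # anywhere below the directory
        return p.parent == base       # direct children only


def is_blocked(rules: Iterable[DirRule], path: str, op: str) -> bool:
    """op is "read" or "write"; models a policy with action: Block."""
    for rule in rules:
        if rule.covers(path):
            if rule.read_only and op == "read":
                continue  # readOnly keeps reads allowed
            return True
    return False
```

Under this model, the file-integrity policy above blocks writes to /bin/ls while still allowing reads. The MySQL policy blocks every access below /var/lib/mysql/ because it has recursive: true and no readOnly.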

Security Considerations

SPOF: If kubearmor.service stops, enforcement halts.
API Path Restrictions: Cannot block HTTP API paths yet.
PID-Specific Controls: No support for blocking stdout/stderr for specific PIDs.

Setup Guide: KubeArmor with Confidential Containers

Prerequisites

Kata Containers running in a Kubernetes cluster.

1. Build an eBPF-Enabled Kata Kernel

git clone https://github.com/kata-containers/kata-containers.git
cd kata-containers/tools/packaging/kernel
./build-kernel.sh -v 6.4 setup
mv kata-linux-6.4-135/.config kata-linux-6.4-135/.config_backup
cp kata-config kata-linux-6.4-135/.config
./build-kernel.sh -v 6.4 build
sudo ./build-kernel.sh -v 6.4 install
sudo sed -i 's|^kernel =.*|kernel="/usr/share/kata-containers/vmlinux.container"|' \
  /opt/kata/share/defaults/kata-containers/configuration-qemu.toml

2. Prepare Kata VM Image for KubeArmor

Place KubeArmor binaries under:

cloud-api-adaptor/podvm-mkosi/resources/binaries-tree/opt/kubearmor/

Structure:

BPF/
kubearmor
kubearmor.yaml
policies/
templates/

Example kubearmor.yaml:

k8s: false
useOCIHooks: true
hookPath: /run/output.json
enableKubeArmorStateAgent: true
enableKubeArmorPolicy: true
visibility: process,network
defaultFilePosture: audit
defaultNetworkPosture: audit
defaultCapabilitiesPosture: audit
alertThrottling: true
maxAlertPerSec: 10
throttleSec: 30

3. Update Presets

File: 30-coco.preset

enable attestation-protocol-forwarder.service
enable attestation-agent.service
enable api-server-rest.path
enable confidential-data-hub.path
enable kata-agent.path
enable netns@.service
enable process-user-data.service
enable setup-nat-for-imds.service
enable kubearmor.path
enable gen-issue.service
enable image-env.service

4. Add Prestart Hook

Place kubearmor-hook under:

cloud-api-adaptor/podvm-mkosi/resources/binaries-tree/usr/share/oci/hooks/prestart

5. Place Policies

cloud-api-adaptor/podvm-mkosi/resources/binaries-tree/opt/kubearmor/policies

Example:

protect-env.yaml
host-net-raw-block.yaml
host-file-integrity-monitoring.yaml

6. Deploy & Test Policy Enforcement

apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: block-mysql-data-access
spec:
  selector:
    matchLabels:
      app: mysql
  file:
    matchDirectories:
    - dir: /var/lib/mysql/
      recursive: true
  action: Block

Test:

kubectl exec -it [MYSQL_POD_NAME] -c mysql -- cat /var/lib/mysql/any-file
# Expected: Permission denied

Challenges and Future Improvements

Wildcard selectors fixed fragility of container-name-based policies
Need daemonless policy persistence to mitigate SPOF
Enhance support for dynamic policy updates via CoCo initdata
Path-based network and process output restrictions pending kata-agent integration
Validate on production CoCo/Kata environments
Protect against service termination attacks inside enclaves

Acknowledgements

This work was made possible through close collaboration with the Confidential Containers (CoCo) community. We thank all contributors and maintainers for their guidance, feedback, and support throughout the development and integration process.

KubeArmor & OCI Hooks for Enhanced Container Runtime Security

August 4, 2025 · 3 min read

Rishabh Soni

Maintainer @KubeArmor | Software Engineer @ Accuknox


Why OCI Hooks Matter in Modern Cloud Workloads

What are OCI Hooks?

OCI (Open Container Initiative) hooks are custom programs, declared in a container's configuration as defined by the OCI runtime specification, that the container runtime executes at specific points in a container’s lifecycle.

Lifecycle of a Container with OCI Hooks

In practice, when a container runtime (like CRI-O or containerd) launches a container, it consults the OCI configuration (or NRI plugin) and executes any configured hooks at the appropriate stages.

createRuntime: During the create operation, after the runtime environment (such as the container's namespaces) has been set up, the runtime runs the createRuntime hooks. For example, such a hook could register the container with monitoring tools.
poststop: After the container is deleted, the poststop hooks are run. For example, a poststop hook can log the shutdown or trigger cleanup.

The OCI spec mandates that hooks be executed in order, and the container’s state is passed to each hook via stdin, allowing the hook to identify the container (by ID, metadata, etc.).

⚠️ Note: A hook execution failure can abort container creation.
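The following minimal Python hook skeleton illustrates the stdin contract. It reads the container state JSON that the runtime writes to the hook's stdin, checks the fields the OCI runtime spec marks as required, and exits non-zero on bad input. It is a generic example, not KubeArmor's own hook binary.

```python
import json
import sys

# Fields the OCI runtime spec marks as required in the container state.
REQUIRED_FIELDS = ("ociVersion", "id", "status", "bundle")


def parse_state(text: str) -> dict:
    state = json.loads(text)
    missing = [k for k in REQUIRED_FIELDS if k not in state]
    if missing:
        raise ValueError(f"state is missing {missing}")
    return state


def main() -> int:
    try:
        state = parse_state(sys.stdin.read())
    except ValueError as exc:  # json.JSONDecodeError is a ValueError too
        print(f"hook: {exc}", file=sys.stderr)
        return 1  # a failing hook can abort container creation
    # On Linux, pid is required in the state only when status is created or running.
    print(f"hook: container={state['id']} status={state['status']} "
          f"pid={state.get('pid')}", file=sys.stderr)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the hook small and failing loudly on malformed input matters here. As noted above, a hook error can stop the container from being created.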

Why KubeArmor Introduced OCI Hooks Support

Before OCI hooks, KubeArmor obtained container events by mounting the container runtime’s Unix socket inside its pod and polling it.

This has serious security drawbacks:
- Access to the CRI socket allows creating/deleting containers
- Breaks container isolation
- Is considered a security flaw

OCI hooks eliminate this dependency by giving KubeArmor event-driven access to container lifecycle data—securely.


How KubeArmor Integrates with OCI Hooks

KubeArmor is typically deployed as a DaemonSet on each node.
It uses eBPF to monitor system calls and Linux Security Modules (AppArmor, SELinux, or BPF-LSM) to enforce policies.
For OCI hook support:
- A hook binary is placed on the host by the KubeArmor Operator.
- Its path is configured in the runtime’s hook JSON.
- On a container lifecycle event, the runtime invokes the hook binary.
- This binary collects container info (PID, namespace IDs, AppArmor profile) from /proc.
- It sends the info to the KubeArmor daemon over a Unix socket (ka.sock).
- KubeArmor then registers/unregisters the container in real time.

✅ All without mounting or polling any CRI runtime socket.
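The collection-and-send step can be sketched in Python. Namespace IDs are the inode numbers in the /proc/<pid>/ns/* symlink targets, and the result goes out over a Unix socket. Everything else here is hypothetical, including the JSON payload, the function names, and the socket type. The real hook also gathers details such as the AppArmor profile, which this sketch skips.

```python
import json
import os
import re
import socket

NS_KINDS = ("pid", "mnt", "net", "uts", "ipc", "user", "cgroup")
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str):
    """Turn a link target such as 'pid:[4026531836]' into ('pid', 4026531836)."""
    m = _NS_LINK.match(target)
    if m is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return m.group(1), int(m.group(2))


def namespace_ids(pid: int) -> dict:
    ids = {}
    for kind in NS_KINDS:
        try:
            _, inode = parse_ns_link(os.readlink(f"/proc/{pid}/ns/{kind}"))
        except OSError:
            continue  # namespace type unavailable or not readable
        ids[kind] = inode
    return ids


def build_message(container_id: str, pid: int) -> bytes:
    # Hypothetical payload; the real message format is not described in this post.
    return json.dumps({"containerID": container_id, "pid": pid,
                       "namespaces": namespace_ids(pid)}).encode()


def send(message: bytes, sock_path: str) -> None:
    # Assumes a stream-oriented Unix socket; the real ka.sock protocol may differ.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(message)
```

Two containers share a namespace exactly when the corresponding inode numbers are equal. That is why these IDs are useful for mapping kernel events back to containers.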


Use Cases Enabled by OCI Hooks

Eliminate Socket Privileges: No need for privileged access to CRI sockets → drastically improved security.
Richer Context: Hook runs on host → accesses container configuration directly (AppArmor/SELinux, namespaces, image layers).
Broader Environment Coverage: Works even in environments where CRI sockets aren’t accessible (as long as OCI hooks are supported).

Roadmap: What’s Next for OCI Hooks in KubeArmor

OCI hooks support is currently experimental. Future plans include:

Auto Deploy NRI Injector
- Automate deployment via KubeArmor Operator
- Eliminate manual installation of NRI on every node
Broader Runtime Support
- Add Podman support using OCI hooks
- Use hooks as a default integration pattern for new runtimes

References & Resources

📘 KubeArmor GitHub
📄 Official Docs: KubeArmor with OCI Hooks
📚 OCI Runtime Spec

Introduction to Linux Security Modules (LSMs)

June 23, 2023 · 5 min read

Barun Acharya

Maintainer @KubeArmor | Software Engineer @ Accuknox

LSM hooks in the Linux kernel mediate access to internal kernel objects such as inodes, tasks, files, devices, and IPC. In general, the term LSM refers to these generic hooks added to the core kernel code. Security modules can then use these generic hooks to implement enhanced access control as independent security modules. AppArmor, SELinux, Smack, and TOMOYO are examples of such modules.

LSM seeks to allow security modules to answer the question "May a subject S perform a kernel operation OP on an internal kernel object OBJ?"

LSMs can drastically reduce the attack surface of a system if appropriate policies using security modules are implemented.

DACs vs. MACs

DAC (Discretionary Access Control) restricts access to objects based on the identity of subjects or the groups they belong to. For decades, Linux only had DAC-based access controls in the form of user and group permissions. One problem with DAC is that its primitives are transitive in nature: a privileged user can create other privileged users, and those users can then access restricted objects.

With MAC (Mandatory Access Control), the subjects (e.g., users, processes, threads) and objects (e.g., files, sockets, memory segments) each have a set of security attributes. These security attributes are centrally managed through MAC policies. Access decisions are therefore not left to the discretion of users or groups; they are made by the policy based on those security attributes.

LSMs are a form of MAC-based controls.

LSM Hooks

LSM mediates access to kernel objects by placing hooks in the kernel code just before the access.

The LSM hooks are applied after the DAC and other sanity checks have been performed.

LSM hooks are placed in the code paths of core kernel objects and are dispatched through a global hooks table. Each security module adds its hooks to this table when it is initialized (for example, see the apparmor_hooks table in security/apparmor/lsm.c).
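The dispatch flow just described can be modelled in a few lines of Python. This is a toy model, not kernel code. DAC runs first, then the callbacks registered for a hook are called in order, and the first non-zero return denies the access. That mirrors how the kernel's hook-calling logic stops at the first error for most integer-returning hooks. The class and function names are hypothetical. inode_unlink is the name of a real LSM hook and is used here only as a label.

```python
from typing import Callable, Dict, List

EPERM, EACCES = 1, 13

# A hook receives (subject, operation, object) and returns 0 to allow or -errno to deny.
Hook = Callable[[str, str, str], int]


class HookTable:
    """Toy model of the global LSM hook table: an ordered callback list per hook."""

    def __init__(self) -> None:
        self.hooks: Dict[str, List[Hook]] = {}

    def register(self, op: str, fn: Hook) -> None:
        # Each security module appends its callbacks when it is initialized.
        self.hooks.setdefault(op, []).append(fn)

    def call(self, subject: str, op: str, obj: str) -> int:
        for fn in self.hooks.get(op, []):
            rc = fn(subject, op, obj)
            if rc != 0:
                return rc  # the first denial stops the walk
        return 0


def kernel_operation(table: HookTable, subject: str, op: str, obj: str,
                     dac_allows: bool) -> int:
    if not dac_allows:
        return -EACCES  # DAC and sanity checks run before any LSM hook
    return table.call(subject, op, obj)
```

In this model, a module that denies unlinking files under /etc/ produces -EPERM only after DAC has already allowed the operation, which matches the ordering described above.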

TOCTOU problem handling

LSMs are typically used for a system's policy enforcement. One school of thought holds that enforcement can be handled asynchronously: kernel audit events pass the alert to userspace, and userspace then enforces the decision.

The problem with such an approach is that, because it is asynchronous, a malicious actor may cause the actual damage before being identified. For example, if the unlink() of a file object is to be blocked, the unlink may already have succeeded by the time the asynchronous enforcer acts.

LSM hooks, by contrast, run inline in the kernel's processing path, and the kernel has the security context and other details of the object while making the decision. Enforcement is therefore inline with the access attempt, and any blocking or denial can be performed without TOCTOU problems.
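The following small Python simulation of the unlink() example shows the difference. It illustrates the timing argument only. A userspace function stands in for the inline LSM hook, and a queue stands in for the audit channel to a userspace enforcer. All names are hypothetical.

```python
import os
import queue
import tempfile


def unlink_inline(path: str, deny) -> None:
    """Inline enforcement: the verdict is reached before the object changes."""
    if deny(path):
        raise PermissionError(f"unlink of {path} denied")
    os.unlink(path)


def unlink_then_report(path: str, events: "queue.Queue[str]") -> None:
    """Asynchronous model: the operation completes, then userspace is notified."""
    os.unlink(path)
    events.put(path)


def async_enforcer(events: "queue.Queue[str]", deny):
    path = events.get()
    # The verdict arrives after the fact: there is nothing left to block.
    return deny(path), os.path.exists(path)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        unlink_inline(path, deny=lambda p: True)
    except PermissionError:
        pass
    print("inline: file still exists:", os.path.exists(path))  # True
    q = queue.Queue()
    unlink_then_report(path, q)
    print("async: (denied, still exists) =", async_enforcer(q, deny=lambda p: True))  # (True, False)
```

Both paths reach the same "deny" verdict. Only the inline one reaches it while the file can still be protected.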

Security Modules currently defined in Linux kernel

$ grep -Hnrw "DEFINE_LSM" LINUX-KERNEL-SRC-CODE/

./security/smack/smack_lsm.c:4926:DEFINE_LSM(smack) = {
./security/tomoyo/tomoyo.c:588:DEFINE_LSM(tomoyo) = {
./security/loadpin/loadpin.c:246:DEFINE_LSM(loadpin) = {
./security/commoncap.c:1468:DEFINE_LSM(capability) = {
./security/selinux/hooks.c:7387:DEFINE_LSM(selinux) = {
./security/bpf/hooks.c:30:DEFINE_LSM(bpf) = {
./security/safesetid/lsm.c:264:DEFINE_LSM(safesetid_security_init) = {
./security/lockdown/lockdown.c:163:DEFINE_LSM(lockdown) = {
./security/integrity/iint.c:174:DEFINE_LSM(integrity) = {
./security/yama/yama_lsm.c:485:DEFINE_LSM(yama) = {
./security/apparmor/lsm.c:1905:DEFINE_LSM(apparmor) = {

In the above list, AppArmor and SELinux are the most widely used. AppArmor is relatively easier to use, while SELinux offers more extensive and fine-grained policy specification. The Linux POSIX.1e capabilities logic is also implemented as a security module.

Multiple security modules can be active at the same time, and on most systems they are: the capabilities module is always loaded alongside SELinux or any other LSM. The capabilities module is always ordered first in execution (controlled by the .order = LSM_ORDER_FIRST setting).

Stackable vs Non-Stackable LSMs

Note that the AppArmor, SELinux, and Smack security modules register themselves as exclusive (LSM_FLAG_EXCLUSIVE) security modules. Only one security module with the LSM_FLAG_EXCLUSIVE flag can be active in the system, which means that no two of SELinux, AppArmor, and Smack can be registered simultaneously.

BPF-LSM is a stackable LSM and thus can be used alongside AppArmor or SELinux.
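The ordering and exclusivity rules above can be summarized with a small Python model. It is a simplification under our own assumptions, not the kernel's initialization code in security/security.c, which also honours the lsm= boot parameter and other flags. Modules marked order-first run before the rest. Once one exclusive module is enabled, later exclusive ones are skipped, while stackable ones such as BPF-LSM are still added.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Lsm:
    name: str
    order_first: bool = False  # models LSM_ORDER_FIRST
    exclusive: bool = False    # models LSM_FLAG_EXCLUSIVE


def select_lsms(requested: Iterable[Lsm]) -> List[str]:
    requested = list(requested)
    ordered = ([m for m in requested if m.order_first]
               + [m for m in requested if not m.order_first])
    enabled, exclusive_taken = [], False
    for m in ordered:
        if m.exclusive:
            if exclusive_taken:
                continue  # a second exclusive module is skipped
            exclusive_taken = True
        enabled.append(m.name)
    return enabled
```

Requesting SELinux, capabilities, AppArmor, and BPF in that order yields capabilities first, then SELinux and BPF; AppArmor is dropped because SELinux already holds the exclusive slot.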

Permissive hooks in LSMs

Certain POSIX-compliant filesystems depend on the ability to grant accesses that would ordinarily be denied at the coarser DAC level of granularity (see the capabilities(7) man page for CAP_DAC_OVERRIDE). LSM supports such DAC overrides (a.k.a. permissive hooks) for particular objects, such as those on POSIX-compliant filesystems, where the security module can grant access the kernel was about to deny.

Security Modules: A general critique

LSMs, as generic MAC-based security primitives, are very powerful. Security modules allow the administrator to impose additional restrictions on the system to reduce the attack surface. However, if a security module's policy specification language is hard to understand or debug, administrators often disable the module altogether, which creates friction for adoption.

References

Linux Security Modules: General Security Support for the Linux Kernel, Wright & Cowan et al., 2002
https://www.kernel.org/doc/html/v5.8/security/lsm.html

KubeArmor Performance Benchmarking Data

March 13, 2023 · 8 min read

Rudraksh Pareek

Maintainer @KubeArmor | Software Engineer @ Accuknox

Benchmarking data

Config

Nodes: 4
Platform: AKS
Workload: Sock-shop
Replicas: 1
Tool: Apache Bench (requests sent to the front-end service)
VM: DS2_v2

VM	CPU	RAM	Data disks	Temp storage
DS2_v2	2	7 GiB	8	14 GiB

Without KubeArmor

Average

Scenario	Requests	Concurrent Requests	Kubearmor CPU (m)	Kubearmor Memory (Mi)	Throughput (req/s)	Average time per req. (ms)	# Failed requests	Micro-service CPU (m)	Micro-service Memory (Mi)
no kubearmor	50000	5000	-	-	2205.502	0.4534	0	401.1	287.3333333

Readings

Scenario	Requests	Concurrent Requests	Kubearmor CPU (m)	Kubearmor Memory (Mi)	Throughput (req/s)	Average time per req. (ms)	# Failed requests	Micro-service CPU (m)	Micro-service Memory (Mi)
no kubearmor	50000	5000	-	-	2246.79	0.445	0	380	239
no kubearmor	50000	5000	-	-	2187.22	0.457	0	378	358
no kubearmor	50000	5000	-	-	2244.16	0.446	0	451	258
no kubearmor	50000	5000	-	-	2213.37	0.452	0	351	304
no kubearmor	50000	5000	-	-	2131.19	0.469	0	380	251
no kubearmor	50000	5000	-	-	2215.89	0.451	0	400	326
no kubearmor	50000	5000	-	-	2172.19	0.46	0	428	332
no kubearmor	50000	5000	-	-	2195.73	0.455	0	444	240
no kubearmor	50000	5000	-	-	2206.41	0.453	0	385	278
no kubearmor	50000	5000	-	-	2242.07	0.446	0	414	318
Average					2205.502	0.4534	0	401.1	287.3333333

KubeArmor with Discovered Policy Applied

Average

Scenario	Requests	Concurrent Requests	Kubearmor CPU (m)	Kubearmor Memory (Mi)	Throughput (req/s)	Average time per req. (ms)	# Failed requests	Micro-service CPU (m)	Micro-service Memory (Mi)
with Policy	50000	5000	141.2	111.9	2169.358	0.4609	0	438.2	435.1

Readings

Scenario	Requests	Concurrent Requests	Kubearmor CPU (m)	Kubearmor Memory (Mi)	Throughput (req/s)	Average time per req. (ms)	Micro-service CPU (m)	Micro-service Memory (Mi)
with Policy	50000	5000	131	113	2162.86	0.462	542	446
with Policy	50000	5000	139	111	2190.72	0.456	457	458
with Policy	50000	5000	145	112	2103.46	0.475	445	395
with Policy	50000	5000	149	108	2155.55	0.464	440	454
with Policy	50000	5000	129	113	2177.68	0.459	395	394
with Policy	50000	5000	160	122	2198.53	0.455	435	503
with Policy	50000	5000	156	117	2179.89	0.459	391	451
with Policy	50000	5000	134	119	2196.78	0.455	408	429
with Policy	50000	5000	129	114	2178.07	0.459	424	435
with Policy	50000	5000	140	112	2150.04	0.465	445	386
Average			141.2	111.9	2169.358	0.4609	438.2	435.1

BPF LSM benchmarking data

Scenario	Requests	Concurrent Requests	Kubearmor CPU (m)	Kubearmor Memory (Mi)	Throughput (req/s)	Average time per req. (ms)	# Failed requests	Micro-service CPU (m)	Micro-service Memory (Mi)
with kubearmor	50000	5000	130	99	1889.81	0.529	0	407	324
with kubearmor	50000	5000	120	104	1955.26	0.511	0	446	423
with kubearmor	50000	5000	122	101	1952.94	0.512	0	433	448
with kubearmor	50000	5000	152	104	1931.71	0.518	0	474	405
with kubearmor	50000	5000	142	108	1896.01	0.527	0	564	413
with kubearmor	50000	5000	110	107	1896.95	0.527	0	416	375
with kubearmor	50000	5000	115	106	1868.77	0.535	0	354	383
with kubearmor	50000	5000	114	109	1877.29	0.533	0	461	355
with kubearmor	50000	5000	130	105	1962.81	0.509	0	552	380
with kubearmor	50000	5000	102	110	1966.19	0.509	0	351	297
Average			123.7	105.3	1919.774	0.521	0	445.8	380.3

Scenario	Requests	Concurrent Requests	Kubearmor CPU (m)	Kubearmor Memory (Mi)	Throughput (req/s)	Average time per req. (ms)	# Failed requests	Micro-service CPU (m)	Micro-service Memory (Mi)
with policy	50000	5000	103	110	1806.06	0.529	0	431	330
with policy	50000	5000	122	111	1836.04	0.511	0	432	348
with policy	50000	5000	123	108	1871.02	0.512	0	505	393
with policy	50000	5000	118	111	1915.07	0.518	0	599	331
with policy	50000	5000	121	110	1896.34	0.527	0	405	310
with policy	50000	5000	126	113	1896.7	0.527	0	450	430
with policy	50000	5000	117	110	1915.79	0.535	0	408	382
with policy	50000	5000	128	111	1885.77	0.533	0	482	321
with policy	50000	5000	122	114	1900.96	0.509	0	433	359
with policy	50000	5000	124	104	1887.87	0.509	0	448	393
Average			120.4	110.2	1881.162	0.5318	0	459.3	359.7
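The Python snippet below puts the averages in relative terms. It recomputes mean throughput from the readings above and the percentage drop between two scenarios. The pairings are our own choice, because the post does not state which runs share a baseline. Treat the percentages as a reading aid rather than a conclusion of the benchmark.

```python
from statistics import mean

# Throughput (req/s) readings copied from the tables above.
no_kubearmor = [2246.79, 2187.22, 2244.16, 2213.37, 2131.19,
                2215.89, 2172.19, 2195.73, 2206.41, 2242.07]
with_discovered_policy = [2162.86, 2190.72, 2103.46, 2155.55, 2177.68,
                          2198.53, 2179.89, 2196.78, 2178.07, 2150.04]
bpf_kubearmor = [1889.81, 1955.26, 1952.94, 1931.71, 1896.01,
                 1896.95, 1868.77, 1877.29, 1962.81, 1966.19]
bpf_with_policy = [1806.06, 1836.04, 1871.02, 1915.07, 1896.34,
                   1896.7, 1915.79, 1885.77, 1900.96, 1887.87]


def overhead_pct(baseline, candidate):
    """Relative throughput drop of `candidate` versus `baseline`, in percent."""
    return (mean(baseline) - mean(candidate)) / mean(baseline) * 100


if __name__ == "__main__":
    # -> 1.64%
    print(f"discovered policy vs no KubeArmor: "
          f"{overhead_pct(no_kubearmor, with_discovered_policy):.2f}%")
    # -> 2.01%
    print(f"BPF-LSM with policy vs BPF-LSM without policy: "
          f"{overhead_pct(bpf_kubearmor, bpf_with_policy):.2f}%")
```

The recomputed means match the reported throughput averages (2205.502, 2169.358, 1919.774, and 1881.162 req/s).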

KubeArmor support for Oracle Container Engine for Kubernetes (OKE)

February 11, 2023 · 3 min read

Rahul Jadhav

CNCF Ambassador | Co-founder @ Accuknox

Introduction

Oracle Container Engine for Kubernetes (OKE) is a managed Kubernetes service for operating containerized applications at scale while reducing the time, cost, and operational burden of managing the complexities of Kubernetes infrastructure. Container Engine for Kubernetes enables you to deploy Kubernetes clusters instantly and ensure reliable operations with automatic updates, patching, scaling, and more.

Oracle Linux is a distribution of Linux developed and maintained by Oracle and is primarily used on OKE. It is based on the Red Hat Enterprise Linux (RHEL) distribution and is designed to provide a stable and secure environment for running enterprise-level applications. Oracle Linux includes Unbreakable Enterprise Kernel (UEK) which delivers business-critical performance and security optimizations for cloud and on-premises deployment.

Supporting KubeArmor on Oracle Linux


While UEK (Unbreakable Enterprise Kernel) is a heavily fortified kernel image, the security of the pods and containers is still the responsibility of the application developer. KubeArmor, a CNCF (Cloud Native Computing Foundation) sandbox project, is a runtime security engine that leverages extended Berkeley Packet Filter (eBPF) and Berkeley Packet Filter-Linux Security Module (BPF-LSM) to protect the pods and containers.

With version 0.5, KubeArmor integrates with BPF-LSM for pod- and container-based policy enforcement. BPF-LSM is a newer LSM (Linux Security Module) introduced in Linux kernel 5.7. BPF-LSM allows KubeArmor to attach BPF bytecode that implements user-specified policy controls at LSM hooks.


KubeArmor provides enhanced security by using BPF-LSM to protect k8s pods hosted on OKE, limiting system behavior with respect to processes, files, and the use of network primitives. For example, a k8s service account token that’s mounted within the pod is accessible by default to all the containers within that pod. KubeArmor can restrict access to such tokens to specific processes. Similarly, KubeArmor can protect other sensitive information (e.g., k8s secrets, x509 certificates) within the container. You can specify KubeArmor policy rules so that any attempt to update the root certificates in any of the certificate folders (i.e., /etc/ssl/, /etc/pki/, or /usr/local/share/ca-certificates/) is blocked. KubeArmor can also restrict the execution of specific binaries within the containers.

To Summarize

KubeArmor, a cloud-native solution, now supports OKE, securing pods and containers using BPF-LSM for inline attack mitigation/prevention. In k8s, pods are the execution units and are usually exposed to external entities. It is therefore imperative to have a layer of defense within the pods that limits an attacker's ability to use system primitives to exploit a vulnerability. KubeArmor is a k8s-native solution that uses Linux kernel primitives on the Unbreakable Enterprise Kernel (UEK) to harden pods, further fortifying the k8s engine.

Annotation controller

July 14, 2022 · 3 min read

Achref Ben Saad

Maintainer @KubeArmor

KubeArmor annotation controller

Starting from version 0.5, KubeArmor leverages admission controllers to support policy enforcement on a wide range of Kubernetes workloads, such as individual pods, jobs, and statefulsets.

What is an admission controller?

An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object(1). Admission controllers can be one of two types:

Validating admission controllers: used to either accept or reject an action on a resource, e.g., reject the creation of pods in the default namespace. Kubernetes ships with many validating controllers, such as the NodeRestriction controller, which limits what the kubelet can modify.
Mutating admission controllers: used to modify requests prior to persistence, e.g., add default resource requests if the user has not defined them. AWS EKS uses mutating controllers to add environment variables (region, node name, …) to each created container.

Admission controllers are executed in the following order:

Mutating controllers run first, one after another, each seeing the changes made by the previous ones; the schema of the resulting object is then validated.
All validating controllers are then called; the request is rejected if any one of them rejects it.

Admission controller

KubeArmor leverages mutating controllers to enable policy enforcement on Kubernetes workloads.

What are the benefits of the annotation controller?

Before v0.5, policies were enforced by applying the appropriate annotations to pods by patching their parent deployment. This meant that policies could be applied only to pods controlled by deployments.

By using mutating controllers, we can extend KubeArmor to support essentially all types of workloads, because the annotations are applied to every pod before it is created. As a result, KubeArmor sends far fewer requests to the API server; previously, patch operations were executed in parallel, and often concurrently, by all KubeArmor pods.

Admission controller

What if the controller fails?

KubeArmor keeps the old annotation logic as a fallback, so that users continue to benefit from KubeArmor policy enforcement, albeit at a degraded level, if the controller fails. Details can be found in the events section of the newly created pods.

How can I install it?

The controller comes bundled with KubeArmor; you can install it via the karmor CLI tool or via our installation manifests under /deployments.

What are the Kubernetes versions that can support the new controller?

The controller can run on Kubernetes clusters from v1.9 onward. Keep in mind that Kubernetes only supports the three most recent minor releases.

References

Kubernetes documentation(1)

KubeArmor BPF LSM Integration

July 10, 2022 · 3 min read

Barun Acharya

Maintainer @KubeArmor | Software Engineer @ Accuknox

KubeArmor BPF LSM Integration

High Level Module Changes

Now	Proposed

Module Design

Map Design

Outer Map details

struct outer_hash {
  __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
  __uint(max_entries, X);
  __uint(key_size, sizeof(struct outer_key));    // 2*u32
  __uint(value_size, sizeof(u32));               // Inner Map File Descriptor
  __uint(pinning, LIBBPF_PIN_BY_NAME);           // Created in Userspace, Identified in Kernel Space using pinned name
};

Key
- Identifier for Containers

struct outer_key {
  u32 pid_ns;
  u32 mnt_ns;
};

Inner Map details

&ebpf.MapSpec{
    Type:       ebpf.Hash,
    KeySize:    4,    // Hash Value of Entity
    ValueSize:  8,    // Decision Values
    MaxEntries: 1024,
}

Value

struct data_t {
  bool owner;        // owner only flag
  bool read;         // read only flag 
  bool dir;          // policy directory flag
  bool recursive;    // directory recursive flag
  bool hint;         // policy directory hint
};
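To make the two-level layout concrete, here is a userspace C model of the lookup path, assuming linear arrays in place of BPF hash maps: the outer table is keyed by `(pid_ns, mnt_ns)` to find a container's rules, and the inner table maps a hashed entity to its `data_t` decision flags. The helper names `lookup_container` and `lookup_entity` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Userspace model of the map-of-maps layout; linear arrays stand in for
// the BPF hash maps, and all names here are illustrative.
struct outer_key { uint32_t pid_ns; uint32_t mnt_ns; };   // identifies a container
struct data_t { bool owner, read, dir, recursive, hint; }; // decision flags

struct inner_entry { uint32_t key; struct data_t val; };  // hashed entity -> flags
struct inner_map { const struct inner_entry *entries; size_t len; };
struct outer_entry { struct outer_key key; const struct inner_map *inner; };

// Outer lookup: find the rule map of the container (pid_ns, mnt_ns).
static const struct inner_map *lookup_container(const struct outer_entry *outer,
                                                size_t n, struct outer_key k)
{
    for (size_t i = 0; i < n; i++)
        if (outer[i].key.pid_ns == k.pid_ns && outer[i].key.mnt_ns == k.mnt_ns)
            return outer[i].inner;
    return NULL; // container is not armored: nothing to enforce
}

// Inner lookup: decision flags for one hashed entity (file or process path).
static const struct data_t *lookup_entity(const struct inner_map *inner, uint32_t hash)
{
    assert(inner != NULL);
    for (size_t i = 0; i < inner->len; i++)
        if (inner->entries[i].key == hash)
            return &inner->entries[i].val;
    return NULL;
}
```

A `NULL` from the outer lookup means the container is not armored, so an enforcement program can return early without touching any inner map.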

Handling of Events

Deeper Dive with Examples

But what if it's not a match? We explore how directory matching works in the next example.
Notice how we split the directory policy into a set of hints in the map. This helps match directory paths efficiently in kernel space.
What if we try to access a file in a different directory?

The absence of a hint lets us break out of the iteration early, optimizing the process.

Directory Matching

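#pragma unroll
  for (int i = 0; i < MAX_STRING_SIZE; i++) {
    if (path[i] == '\0')
      break;

    if (path[i] == '/') {
      // Copy the prefix path[0..i] (including this '/') plus a NUL terminator.
      __builtin_memset(&dir, 0, sizeof(dir));
      bpf_probe_read_str(&dir, i + 2, path);

      // Hash the prefix the same way userspace hashed the policy entries.
      fp = jenkins_hash(dir, i + 1, 0);

      struct data_t *val = bpf_map_lookup_elem(inner, &fp);
      if (val) {
        if (val->dir) { // prefix is a policy directory: decide now
          matched = true;
          goto decisionmaker;
        }
        if (val->hint == 0) { // If we match a non directory entity somehow
          break;
        }
      } else { // no entry: no policy directory lies below this prefix
        break;
      }
    }
  }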
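The hint scheme can be modeled in userspace to see why the loop can stop early. The sketch below is a simplified illustration, not KubeArmor's implementation: it stores prefix strings instead of Jenkins hashes, ignores the `owner`, `read`, and `recursive` flags, and the names `add_dir_policy` and `dir_match` are hypothetical. Adding a directory policy such as `/etc/ssl/` inserts `/` and `/etc/` as hints and `/etc/ssl/` as the directory entry; matching walks the path's `/`-terminated prefixes the same way the kernel loop above does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ENTRIES 64
#define MAX_PATH 256

// Simplified inner map: prefix strings instead of their Jenkins hashes.
struct entry { char key[MAX_PATH]; bool dir; bool hint; };
struct table { struct entry e[MAX_ENTRIES]; int n; };

// Exact lookup of the first `len` bytes of `key`.
static struct entry *find(struct table *t, const char *key, size_t len)
{
    for (int i = 0; i < t->n; i++)
        if (strlen(t->e[i].key) == len && strncmp(t->e[i].key, key, len) == 0)
            return &t->e[i];
    return NULL;
}

// Userspace side: "/etc/ssl/" becomes hints "/" and "/etc/" plus the
// directory entry "/etc/ssl/".
static void add_dir_policy(struct table *t, const char *dir)
{
    size_t len = strlen(dir);
    assert(len > 0 && len < MAX_PATH && dir[len - 1] == '/');
    for (size_t i = 0; i < len; i++) {
        if (dir[i] != '/')
            continue;
        struct entry *e = find(t, dir, i + 1);
        if (e == NULL) {
            assert(t->n < MAX_ENTRIES);
            e = &t->e[t->n++];
            memcpy(e->key, dir, i + 1);
            e->key[i + 1] = '\0';
            e->dir = e->hint = false;
        }
        if (i + 1 == len)
            e->dir = true;   // the policy directory itself
        else
            e->hint = true;  // a parent prefix leading to it
    }
}

// Kernel-side loop mirrored: check each '/'-terminated prefix of `path`
// and stop at the first one that is neither a hint nor a policy directory.
static bool dir_match(struct table *t, const char *path)
{
    for (size_t i = 0; path[i] != '\0'; i++) {
        if (path[i] != '/')
            continue;
        struct entry *e = find(t, path, i + 1);
        if (e == NULL)
            return false;  // no policy below this prefix
        if (e->dir)
            return true;   // reached a policy directory
        if (!e->hint)
            return false;
    }
    return false;
}
```

With only `/etc/ssl/` in the table, a lookup for `/home/user/notes` stops at its second prefix (`/home/`), because no hint exists for it.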

Hashing

File paths and source (process) names can be huge: in the worst case, together they add up to 8192 bytes, which is a very large entity for a map key. So we hash that value to a u32 key. We store hashed values from userspace and look up hashed values from kernel space for decision making.

We plan to use a Jenkins hash algorithm modified for use in eBPF land, with a matching implementation in userspace.
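For reference, Bob Jenkins' one-at-a-time hash is a compact member of the Jenkins family that needs only 32-bit adds, shifts, and XORs. The `jenkins_hash(dir, i + 1, 0)` call above takes an initial value, which suggests a lookup3/jhash-style variant, so treat this block as a stand-in that illustrates the idea (identical code hashing the same bytes in userspace and kernel space), not as the function KubeArmor uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Bob Jenkins' one-at-a-time hash. The loop is bounded by `len`, which an
// eBPF port would additionally cap with a compile-time maximum.
static uint32_t jenkins_oaat(const char *key, size_t len)
{
    assert(key != NULL || len == 0);
    uint32_t hash = 0;
    for (size_t i = 0; i < len; i++) {
        hash += (unsigned char)key[i];
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}
```

Because the function is deterministic and uses only fixed-width arithmetic, userspace can pre-compute keys such as `jenkins_oaat("/etc/", 5)` and the kernel side will derive the same value from the same bytes.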

Based on Event Auditor Implementation

Inspirations

TODO/ToCheck

List out LSM Hooks to be integrated with
Explore Hashing
Analyse Performance Impact
...

Miscellaneous Notes

Pattern Matcher
- AppArmor has its own DFA-based regular expression matching engine: https://elixir.bootlin.com/linux/latest/source/security/apparmor/match.c
- Geyslan's ebpf pattern matcher : https://github.com/geyslan/ebpf-pattern/tree/a-story-of-two-maps
LSM Hooks
- AppArmor LSMs: https://elixir.bootlin.com/linux/latest/source/security/apparmor/lsm.c#L1188
- SELinux LSMs: https://elixir.bootlin.com/linux/latest/source/security/selinux/hooks.c#L7014
- Program Exec Ops: https://elixir.bootlin.com/linux/latest/source/include/linux/lsm_hooks.h#L35
- Task Ops: https://elixir.bootlin.com/linux/latest/source/include/linux/lsm_hooks.h#L604
- File Ops: https://elixir.bootlin.com/linux/latest/source/include/linux/lsm_hooks.h#L507
- Inode Ops: https://elixir.bootlin.com/linux/latest/source/include/linux/lsm_hooks.h#L213
- Socket Ops: https://elixir.bootlin.com/linux/latest/source/include/linux/lsm_hooks.h#L842
Papers/TechDocs

Unraveling BPF LSM Superpowers

July 10, 2022 · 5 min read

Barun Acharya

Maintainer @KubeArmor | Software Engineer @ Accuknox

A few months back I presented at Cloud Native eBPF Day Europe 2022 about Armoring Cloud Native Workloads with BPF LSM and planted a thought about building a holistic tool for runtime security enforcement leveraging BPF LSM. I have spent the past few weeks collaborating with the rest of the team at KubeArmor to realize that thought. This blog post will explore the why’s and how’s of implementing security enforcement as part of KubeArmor leveraging BPF LSM superpowers at its core.

Why❓

Linux Security Modules (LSMs) provide the security hooks necessary to set up the least-permissive perimeter for various workloads. A nice introduction to LSMs is available here.

KubeArmor is a cloud-native runtime security enforcement system that leverages these LSMs to secure the workloads.

LSMs are really powerful, but they weren’t built with modern workloads, including containers and orchestrators, in mind. Also, their policy languages have a steep learning curve, which creates friction in adoption.

eBPF has provided us with the ability to safely and efficiently extend the kernel’s capabilities without requiring changes to kernel source code or loading kernel modules.

BPF LSM combines the powerful LSM framework with the ability to seamlessly load our own decision-making programs into the kernel. This helps us protect modern workloads with enough context, while keeping the interface easy to understand and user-friendly.

KubeArmor already integrates with AppArmor and SELinux and has a set of tools and utilities that provide a seamless experience for enforcing security, but these integrations come with their own complexities and limitations. Integrating with BPF LSM gives us fine-grained control over the LSM hooks.

How✍️

The implementation can be told through the following tales:

Map to cross boundaries - Establishing the interface between KubeArmor daemon (Userspace) and BPF Programs (KernelSpace)
Putting Security on the Map - Handling Policies in UserSpace and Feeding them into the Map
Marshal Law - Enforcing Policies in the KernelSpace

Map to cross boundaries🗺️

There’s a clear boundary between the territories of kernel space and userspace, so how do we establish routes between the two?

We leverage eBPF Maps for establishing the interface between the KubeArmor daemon (Userspace) and BPF Programs (KernelSpace). As described in the kernel doc,

‘maps’ is a generic storage of different types for sharing data between kernel and userspace.

It seems apt to help us navigate here 😁

For each container KubeArmor needs to protect (or, as I like to call it, each workload it needs to ‘Armor Up’), we create an entry in the global BPF hash of maps pinned to the BPF filesystem under /sys/fs/bpf/kubearmor_containers. The value of this entry refers to another BPF hash map that holds all the details of the policies that need to be enforced.

Map Interface

Putting Security on the Map📍

KubeArmor Security Policies carry a lot of metadata. We cannot put all of that into maps and let the BPF program navigate those complexities.

For instance, we usually map security policies to workloads through labels associated with pods and containers. But we can’t send labels to the BPF program; eBPF programs can’t handle labels, or even container names/IDs. So KubeArmor extracts information from the Kubernetes and CRI (Docker/Containerd/CRI-O) APIs and simplifies it into something the eBPF program can also extract on its own.

struct key {
  u32 pid_ns;
  u32 mnt_ns;
};

struct containers {
  __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
  __uint(max_entries, X);
  __uint(key_size, sizeof(struct key));          // 2*u32
  __uint(value_size, sizeof(u32));               // Rule Map File Descriptor
  __uint(pinning, LIBBPF_PIN_BY_NAME);           // Created in Userspace, Identified in Kernel Space using pinned name
};

Similarly, KubeArmor can receive conflicting policies, which we need to handle and resolve in the KubeArmor userspace program before putting them on the map.

After all the rule simplification, conflict resolution, and handling of policy updates, we send the data to the eBPF map, getting the BPF programs ready for enforcement.

Marshal Law

We finally marshal all the data in the kernel space and impose MAC (Military Access Control? 🔫 Pun intended :P ).

In kernel space, where our BPF LSM programs reside, each program extracts the main entity, i.e., the file path, process path, or network socket/protocol, pairs it with its parent process path, and looks it up in the respective maps. We make decisions based on these lookup values.

There is a fair bit of complexity involved here that I have skipped; if you are interested, check out the Design Doc and the GitHub Pull Request.

I would also like to credit How systemd extended security features with BPF LSM which acted as an inspiration for the implementation design.

Armoring Up

The environment requires a kernel >= 5.8 configured with CONFIG_BPF_LSM, CONFIG_DEBUG_INFO_BTF and the BPF LSM enabled (via CONFIG_LSM="...,bpf" or the "lsm=...,bpf" kernel boot parameter).
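A quick way to check the last requirement on a running host is the securityfs file `/sys/kernel/security/lsm`, which lists the active LSMs as a comma-separated string. The sketch below (function names are hypothetical) parses that list for a `bpf` entry; it does not check `CONFIG_DEBUG_INFO_BTF`, which would additionally require looking for `/sys/kernel/btf/vmlinux`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// True if a comma-separated LSM list such as
// "lockdown,capability,yama,apparmor,bpf" contains the entry "bpf".
static bool lsm_list_has_bpf(const char *list)
{
    assert(list != NULL);
    const char *p = list;
    while (*p != '\0') {
        size_t n = strcspn(p, ",\n");   // length of the current entry
        if (n == 3 && strncmp(p, "bpf", 3) == 0)
            return true;
        p += n;
        if (*p != '\0')
            p++;                        // skip the separator
    }
    return false;
}

// Reads the active LSMs from securityfs: 1 if BPF LSM is active,
// 0 if not, -1 if the file cannot be read.
static int bpf_lsm_active(void)
{
    char buf[512] = {0};
    FILE *f = fopen("/sys/kernel/security/lsm", "r");
    if (f == NULL)
        return -1;
    size_t got = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[got] = '\0';
    return lsm_list_has_bpf(buf) ? 1 : 0;
}
```

Exact entry matching matters here: a substring search would wrongly accept a hypothetical entry such as `xbpf`.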

You can use the daemon1024/kubearmor:bpflsm image and follow the Deployment Guide to try it out.

Here is a sample alert showing enforcement through BPF LSM:

Next Steps

We have unraveled just a drop of what BPF LSM is capable of, and we plan to extend our security features to lots of other use cases and BPF LSM would play an important role in it.

Near-term plans include supporting wildcards in policy rules, doing an in-depth performance analysis, and optimizing the implementation.

👋
That sums up my journey to implement security enforcement leveraging BPF LSM at its core. It was a lot of fun and I learned a lot. Hope I was able to share my learnings 😄

If you have any feedback about the design and implementation feel free to comment on the Github PR. If you have any suggestions/thoughts/questions in general or just wanna say hi, my contact details are here ✌️

KubeArmor Seccomp Support

March 21, 2022 · 3 min read

Rahul Jadhav

CNCF Ambassador | Co-founder @ Accuknox

High Level Design

Policy Mapping

KubeArmorPolicy for seccomp

apiVersion: security.kubearmor.com/v1
kind: KubeArmorSeccompPolicy
metadata:
  name: ksp-wordpress-block-process
  namespace: wordpress-mysql
spec:
  severity: 3
  selector:
    matchLabels:
      app: wordpress
  seccomp:
    arch: [x86_64, x86, x32]    #OPTIONAL
    syscalls: [accept4, epoll_wait, pselect6, futex, madvise]
    action: Allow

Following is the mapped seccomp profile:

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_X32"
  ],
  "syscalls": [
    {
      "names": [
        "accept4",
        "epoll_wait",
        "pselect6",
        "futex",
        "madvise"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
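The mapping from the kscmp fields to the profile above is mechanical. The C sketch below shows it for the fields in this example; the helper names are hypothetical, and mapping any action other than `Allow` to `SCMP_ACT_ERRNO` is an assumption, since only `Allow` appears in the example.

```c
#include <assert.h>
#include <string.h>

// Map a kscmp `arch` value to its seccomp profile name; NULL if unknown.
static const char *scmp_arch(const char *arch)
{
    static const struct { const char *kscmp, *scmp; } table[] = {
        { "x86_64", "SCMP_ARCH_X86_64" },
        { "x86",    "SCMP_ARCH_X86" },
        { "x32",    "SCMP_ARCH_X32" },
    };
    assert(arch != NULL);
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(arch, table[i].kscmp) == 0)
            return table[i].scmp;
    return NULL;
}

// An allowlist (action: Allow) denies everything else by default, so the
// profile's defaultAction is the opposite of the listed syscalls' action.
static const char *default_action(const char *action)
{
    return strcmp(action, "Allow") == 0 ? "SCMP_ACT_ERRNO" : "SCMP_ACT_ALLOW";
}

// Action applied to the syscalls named in the policy.
static const char *syscall_action(const char *action)
{
    return strcmp(action, "Allow") == 0 ? "SCMP_ACT_ALLOW" : "SCMP_ACT_ERRNO";
}
```

With `action: Allow`, these helpers reproduce the `defaultAction`, `architectures`, and per-syscall `action` values shown in the profile.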

Rationale for a separate `Kind: KubeArmorSeccompPolicy` (kscmp for short)

K8s, Docker, and other container runtimes allow the user to set only a single seccomp policy per pod. The seccomp policy has to be annotated as part of the container label, and the pod needs to be restarted for any policy change to take effect. This is the primary reason for having a separate kscmp kind rather than reusing the existing ksp with seccomp rules.

Handling multiple `kscmp` policies per pod

There can be at most one seccomp policy per container. Thus, when multiple seccomp policies are pushed for the same pod, their rules have to be merged. As a general rule, a syscall deny always takes precedence when multiple seccomp rules cover the same syscall.

Consider an example:

Policy 1: kscmp1 - allow{s1, s2, s4}, deny{s3}
Policy 2: kscmp2 - allow{s1, s3}, deny{s2}

Here s1, s2, s3, s4 are four different syscalls, and kscmp1/2 are two KubeArmorSeccompPolicy policies. The final seccomp profile after merging the kscmp policies will be: allow{s1, s4}, deny{s2, s3}

Let's assume that policy kscmp2 gets deleted at a later point in time. The final seccomp profile in that case would be: allow{s1, s2, s4}, deny{s3}

Do we need to use any third-party library (such as libseccomp)?

In case of k8s, the underlying container runtime (such as docker, containerd) does the job of enforcement. The enforcement can be done only by the process that forks the workloads and in case of k8s that process is runc.

KubeArmor needs to create the JSON profile and set the annotations appropriately for the container runtime to use it. For VM/bare-metal workloads, we need to do the same with systemd, i.e., instruct systemd to use a given seccomp profile (example here). For VM/bare-metal with containers (but without k8s), runc/docker come into play again, since they spawn the workload processes and are in charge of setting the seccomp profile.

libseccomp is primarily needed if someone wants to attach an ebpf filter with seccomp. seccomp-ebpf filter won't be needed for whitelist/blacklist based rules. Also I see that docker profile allows us to specify "parameterized syscalls" using the json profile as an input, so that takes care of even advanced use cases. (I checked that docker/runc do not internally link to libseccomp).

TODOs

KubeArmor restart: Reload seccomp profiles
What happens when the workload moves from one node to another and that the architectures (x86, arm,...) of these two nodes differ?
Volume mount for /etc/seccomp ... The /etc/seccomp has to be created when kubearmor is deployed.
Test: Even if apparmor/selinux is not available, seccomp should still work
Prepare DevSecOps flow for managing/handling seccomp policies with auto discovery.
Handle creation of /var/lib/kubelet/seccomp/ folder during KubeArmor init time.
Check location of seccomp folder on different k8s implementations (k3s/minikube/GKE/EKS/AKS)

KubeArmor Event Auditor Design

August 10, 2021 · 11 min read

Rahul Jadhav

CNCF Ambassador | Co-founder @ Accuknox

What problem are we solving here?

Various tools audit kernel system calls and other events to detect malicious process behavior. For example, if a process that is not part of a “set of processes/process spec” attempts to access a particular path using the open() system call, the module raises an alert, since it doesn’t expect any process outside that process spec to access that file or file system path. Event monitoring/auditing systems are also used by various compliance frameworks (such as PCI-DSS, SOC2), hardening standards (such as STIGs), and attack frameworks (such as MITRE) that provide guidelines for setting up defense rules. Falco is one such event monitoring/auditing system: it uses eBPF or a kernel module to filter system events at runtime in kernel space and checks for malicious behavior based on rules passed from the userspace monitor process.

Process

Consider an example scenario where, as per the policy, only processes invoked from the /usr/bin/ folder are able to access the /etc/ folder. The allowed process spec in this case is any process from the /usr/bin/ path. Now assume that at runtime a process XYZ, which does not match the process spec, tries to access the file /etc/crontab. As per the above figure, the following steps happen:

1. Process XYZ does an open() call on /etc/crontab.
2. This results in a syscall(open) being invoked in kernel space.
3. The eBPF instruction set inserted for monitoring purposes detects the syscall(open) event.
4. It verifies that the filter does not match, that is, it finds that process XYZ, which is attempting to open /etc/crontab, does not match the process spec.
5. It forwards the event to the monitor process in userspace.

Event monitoring systems can take spatial conditions into account for filtering and then raise an event that can be used for further analysis. The spatial condition in the above example is that when a file open is attempted, the process context is additionally checked to verify whether it belongs to a process spec before an event is raised. Thus, the process context (name, pid, namespace, process path) forms the spatial conditions on which the open() event can be further filtered.

Quality of such monitoring/filtering/auditing systems is dependent on:

How well the filters can represent the rules as mentioned in the compliance/hardening standards?
How much performance overhead is added by the filtering system?

Problems with monitoring/filtering/auditing systems:

There are two problems with such systems

There is no option to apply conditions based on rate limits. For example, generate an audit event only when a certain system event is detected more than 10 times per unit time (say, 1 min).
There is no option to apply temporal correlation. Currently, the filters operate only on the context available in that event instance. For example, one cannot set a filter that says: if the network send() syscall is invoked more than 100 times in 1 min and file read() is invoked more than 100 times per second, then raise an audit event.

Problems addressed by this design:

To overcome the problems mentioned above, this idea attempts to make two major changes:

The idea allows specifying rate-limit and temporal-correlation filters from userspace, while the filtering itself is handled completely in-kernel and only the final result is emitted to userspace. This prevents unnecessary context switches.
The idea provides an improved schematic/design to implement the rate-limit/temporal-correlation filters such that the memory overhead and the in-kernel processing overhead are kept to a minimum.
By using the policy constructs defined in this idea, a policy engine can avoid many false positives in real environments, making the security engine more robust.

Sample Use Cases

Sample policy for rate-limited events

apiVersion: security.accuknox.com/v1
kind: KubeArmorPolicy
metadata:
  name: ksp-wordpress-config-block
  namespace: wordpress-mysql
spec:
  severity: 10
  selector:
    matchLabels:
      app: wordpress
- process: *, -*/bash, -*/sh
  msg: "readdir limit exceeded"
  severity: 5
  - syscall: readdir
    param1: /*, -/home/*, -/var/log/*
    rate: 10p1s

Sample policy allowing temporal correlation of events

apiVersion: security.accuknox.com/v1
kind: KubeArmorPolicy
metadata:
  name: ksp-wordpress-config-block
  namespace: wordpress-mysql
spec:
  severity: 10
  selector:
    matchLabels:
      app: wordpress
- process: *, -*/bash, -*/sh
  msg: "readdir limit exceeded"
  severity: 5
  - syscall: readdir
    param1: /*, -/home/*, -/var/log/*
    rate: 10p1s

Design expectations & Limitations

Design Expectations

The design should sufficiently explain:

How will the process filter work?
1. How do we ensure that the least amount of overhead is incurred while handling processes that are not of interest?
2. How do we ensure that events the policies are not interested in do not induce additional control overhead?
What eBPF bytecodes have to be loaded, both statically and dynamically?
How event parameter handling will be done? Event parameter handling must incur the least overhead.
How rate-limiting will work?

Limitations & Assumptions

Works only on systems supporting eBPF; this design assumes Linux kernel >= 4.18.
Different policies can induce different amounts of overhead. Thus, the syscalls to monitor must be properly reviewed and their performance implications understood. In the future, we could have a system that identifies the approximate overhead added by a policy and informs/alerts the user.

Sample reference policy

apiVersion: security.accuknox.com/v1
kind: KubeArmorPolicy
metadata:
  name: detect-active-network-scanning
  namespace: multiubuntu
spec:
- process: *
  msg: "local recon attempt with TCP scan"
  severity: 5
  - syscall: connect # FD1
    proto: *P
    ip4addr: 192.168.10.10/25 0xffffff80, 10.*.*.* 0xff000000, 192.168.*.1 0xffff00ff
    rate: 20p1s

  - syscall: connect # FD2
    proto: FILE
    path: /tmp/*
    rate: 20p1s

- process: *, /bin/*sh, -*ssh
  msg: "consecutive RAW sends"
  severity: 5
  - syscall: raw_sendto
    param2: 192.168.*.*, 10.*.*.*
    rate: 20p1s

- process: *, /bin/*sh, -*ssh
  msg: "consecutive RAW sends"
  severity: 5
  - syscall: raw_send
    param2: 192.168.*.*, 10.*.*.*
    rate: 20p1s

- process: *
  msg: "outbound probes detected"
  severity:
  - kprobe: tcp_rst
    rate: 10p1s

- process: *
  msg: "inbound probes detected"
  severity:
  - kprobe: tcp_rst_send
    rate: 10p1s

Note: Not every event might be associated with a process spec. There are events that are generated which may not have any associated task structure.

What design constraints do we have to live with?

For example, the number of BPF programs, the instruction count, memory…

Module Design

Handling of events

On New Policy

When a new policy is provided as an input the policy might be either a

Container based policy
Host based policy

In either case, a new entry would be added in the process_spec_table containing the pid-ns of the container. In case of host-based policy the pid-ns would be 0.

process-spec-table

Container pid-ns	process-spec	event-filter-spec
12345	*	[event1-fd1, event2-fd2, …]
53678	/usr/bin/*sh	[event3-fd3, ...]
12312	, -/*sh	[event4-fd4, event5-fd5, …]
5235	[NA]	...
0 (host-based)	...	...

Points to note:

There could be several event-filter-specs for the same [pid-ns, process-spec] tuple.
0 pid-ns indicates host-based rules
The event-filter-spec contains eBPF bytecode that is compiled on demand. The event-filter-spec has the event type/info for which the corresponding event/kprobe/tracepoint would be loaded.
Every event-filter-spec’s compiled bytecode is pre-loaded into the BPF_MAP_TYPE_PROG_ARRAY for tail-call processing, and its file descriptor is noted in the event-filter-spec column.

process spec

On New Process

The process-filter-table is a bpf map that stores the mapping of {pid-ns, pid, event-id} to the corresponding set of { event-filter-fds }.

process-filter-table

Pid-ns, pid, Event-ID	Event-filter-FD	Opaque Data
{ 0xcafebabe, 0xdeadface, SYSCALL-CONNECT}	[FD1, FD2]	[...event-handler can keep rate-info and other event specific data…]

[TODO]: The process wildcard matching has to be done in the kernel space. Write a prototype code to validate the wildcard matching can be implemented effectively in kernel space.
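As a starting point for that prototype, the userspace C sketch below captures the matching semantics assumed by specs like `*, -*/bash, -*/sh`: comma-separated globs, where a leading `-` excludes and any matching exclusion wins. It relies on POSIX `fnmatch()`, which is not available to eBPF programs, so it only pins down the expected behavior; the function name is hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: does `proc` match a process spec such as
// "*, -*/bash, -*/sh"? Entries are comma-separated globs; a leading '-'
// marks an exclusion, and any matching exclusion wins.
static bool match_process_spec(const char *spec, const char *proc)
{
    char buf[256];
    bool included = false;

    assert(spec != NULL && proc != NULL && strlen(spec) < sizeof(buf));
    strcpy(buf, spec);                        // strtok() modifies its input
    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        while (*tok == ' ')
            tok++;                            // skip the space after each comma
        bool exclude = (*tok == '-');
        const char *pattern = exclude ? tok + 1 : tok;
        // flags = 0: '*' also matches '/', so "*/bash" matches "/bin/bash"
        if (fnmatch(pattern, proc, 0) == 0) {
            if (exclude)
                return false;
            included = true;
        }
    }
    return included;
}
```

Under these semantics, `*, -*/bash, -*/sh` admits `/usr/bin/python3` but not `/bin/bash`, and `/usr/bin/*sh` admits `/usr/bin/zsh` but not `/bin/zsh`.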

Input: event_info_t (see the next section for details)
1. Check the process-spec-table to see whether the container pid-ns matches.
   a. If there is no match, ignore the new process event.
2. If there is a match, add a new entry to the process-filter-table.
3. Note that the event-filter-fd-map has to be populated.

New Process

On Kernel Event

Event Structure

event-info-structure

Note that this is not a bpf-map. This is an internal data-structure used to pass between tail-calls.

#include <stdint.h>

typedef struct event_info {
    uint32_t id;                       // updated by kernel-event bytecode
    uint32_t fdset[MAX_FD_PER_EVENT];  // updated by matchProcess bytecode
    void *context;                     // updated by kernel-event bytecode
} event_info_t;

where…
   id is the event-id … such as SYSCALL-CONNECT, KPROBE-TCP_RST
   fdset is the set of event handlers for the given kernel event
   context is the kernel context available for the kernel event

onKernelEvent pseudo-code

1. A kernel event of interest (i.e., one which is enabled based on policy-event-filter) is called. Note that an event handler bytecode for a kernel event is inserted only if there exists a corresponding policy that operates on that kernel event.
2. The primary task of kernel_event_bytecode is to create an event_info { event_id, context } and then call the matchProcess bytecode.
3. The matchProcess matches the process. Once the process-filter-table entry is identified, the logic gets a list of tail-call FDs to call. The list of FDs are called one after another in the same sequence in which they appear in the policy spec.
4. The tail-call FDs are called one after another based on the FD set.
5. The event-handler might want to update the runtime state in the opaque-data of the process-filter-table.
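The dispatch in steps 3 to 5 can be modeled in userspace with a function-pointer table standing in for `BPF_MAP_TYPE_PROG_ARRAY`. This is only a sketch: real tail calls do not return to the caller, so an eBPF version would chain from one handler to the next instead of looping, and `MAX_FD_PER_EVENT`, `PROG_ARRAY_SIZE`, and `count_handler` are hypothetical. The `opaque` pointer plays the role of the per-process opaque data in the process-filter-table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FD_PER_EVENT 4   // hypothetical bound
#define PROG_ARRAY_SIZE 8    // hypothetical size of the prog array

typedef struct event_info {
    uint32_t id;
    uint32_t fdset[MAX_FD_PER_EVENT];  // 0 marks an unused slot in this sketch
    void *context;
} event_info_t;

typedef int (*event_handler)(event_info_t *ev, uint64_t *opaque);

// Stand-in for BPF_MAP_TYPE_PROG_ARRAY: slot index -> handler.
static event_handler prog_array[PROG_ARRAY_SIZE];

// Run every handler listed in the fdset, in policy order; returns the
// number of alerts raised.
static int dispatch(event_info_t *ev, uint64_t *opaque)
{
    int alerts = 0;
    assert(ev != NULL);
    for (int i = 0; i < MAX_FD_PER_EVENT && ev->fdset[i] != 0; i++) {
        uint32_t fd = ev->fdset[i];
        if (fd < PROG_ARRAY_SIZE && prog_array[fd] != NULL)
            alerts += prog_array[fd](ev, opaque);
    }
    return alerts;
}

// Example handler: counts events in the opaque data, alerts on every 3rd.
static int count_handler(event_info_t *ev, uint64_t *opaque)
{
    (void)ev;
    return ++*opaque % 3 == 0;
}
```

Registering `count_handler` in slot 1 and dispatching an event whose fdset is `{1}` three times raises a single alert on the third call, with the counter kept in the opaque data.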

Notes:

It is possible to receive a kernel event that has no associated process, e.g., kprobe:tcp_rcv_reset. Such events can only be added for host-based audit rules.
Note that the new-process event from the kernel needs a special handler, because it needs to fill the process-filter-table and might have to process the event filters. [TODO]

onKernelEvent

Overall Event Processing Logic

On Process Terminate

[TODO] cleanup process-filter-table

On Policy Delete

Handle update of process-spec-table. This may lead to removal of loaded event-filter-spec ebpf bytecode and deletion of corresponding descriptors.

On Container Delete

Remove entry from the process-spec-table

Handling Rate-limit

Problem with handling rate-limit

Problem with handling

Consider the case where an event is to be observed at a rate of 10 per second. The period here is 1 second. The dotted box in the figure above shows the 1-second period, and the circles on the timeline show the occurrences of the event.

Approach 1: Fine-grained approach

This approach allows one to calculate the precise rate-limit but requires more memory to be maintained since every event observed in the time quantum has to be stored. There is also more processing time required because of the store and cycle operations.

Approach 2: Coarse-grained approach

This approach reduces the memory requirement by using adjoining time quanta, but in some cases the rate limit may not be detected.

Approach Preference

Approach 2 results in much less memory and processing overhead. Also consider that in real-world cases, we do not expect the user to specify the exact rate i.e., user will in general provide a lower limit for the rate. For example, for the active-scanning policy scenario depicted in this document, the rate-limit of 10p1s is depicted but in reality the scanning speed will be much faster i.e., Approach 2 should easily be able to detect the rate.

Performance considerations

If an event is not attached by any policy then there should not be any runtime overhead associated with that event handling.
Minimum runtime overhead if an event is attached but the process is not of interest: we need to run matchProcess and discard the event. This currently results in one map lookup and one tail call before the event is discarded. It may be possible to remove the tail call, but that adds memory requirements, since handleEvent() and matchProcess() have to be bundled together.

Tasklist

Prototype code: eBPF bytecode to match process wildcard pattern [OPTION1]
Prototype code: Auto generate event-filter bytecode. Merge multiple event-filter bytecodes into single code.
Prototype code: Handling tail call and corresponding argument call. For a detailed tasklist check ref.

