Process and socket auditing with osquery

Enabling these auditing features requires additional configuration of osquery. osquery can leverage the BPF, Audit, OpenBSM, or EndpointSecurity subsystems to record process executions and network connections in near real-time on Linux and macOS systems. Although these auditing features are extremely powerful for recording activity on a host, they may introduce additional CPU overhead and greatly increase the number of log events generated by osquery.

To read more about how event-based tables are created and designed, check out the osquery Table Pubsub Framework.

Because different platforms have different choices for collecting real-time event data, osquery has multiple tables to present this information depending on the source and platform:

Event type	osquery Table	Source	Supported Platform
Process events	`process_events`	Audit (Linux), OpenBSM (macOS)	Linux, macOS (10.15 and older)
Process events	`bpf_process_events`	BPF	Linux (kernel 4.18 and newer)
Process events	`es_process_events`	EndpointSecurity	macOS (10.15 and newer)
Socket events	`socket_events`	Audit (Linux), OpenBSM (macOS)	Linux, macOS (10.15 and older)
Socket events	`bpf_socket_events`	BPF	Linux (kernel 4.18 and newer)

To collect process events, you would add a query like the following to your query schedule, or to a query pack:

SELECT * FROM process_events;
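As a minimal sketch, the scheduled query can be expressed in osquery's JSON configuration, where each entry under the `schedule` key has a `query` and an `interval` in seconds. The entry name `process_events` and the 60-second default interval below are illustrative assumptions, not recommended values.

```python
import json


def build_schedule_config(interval_seconds: int = 60) -> str:
    """Return an osquery JSON config that schedules SELECT * FROM process_events."""
    config = {
        "schedule": {
            # The key is the scheduled query's name; it is an illustrative choice.
            "process_events": {
                "query": "SELECT * FROM process_events;",
                "interval": interval_seconds,  # seconds between runs
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(build_schedule_config())
```

The same entry can be placed in a query pack instead; only its location in the configuration changes.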

Each of these auditing features is enabled on a per-source basis using additional osquery configuration settings. Enabling any of them may have a performance impact, depending on host activity, and should be tested in your environment before deployment. See the OS-specific sections for guidance.

General Troubleshooting

Although some testing of the underlying operating system configuration can be performed via osqueryi, keep in mind that osqueryi and osqueryd operate independently and do not communicate.

The --verbose flag is very useful when trying to debug a problem.

Examine configuration flags

To verify that osquery's flags are set correctly, you can query the osquery_flags table. For example, on a macOS machine, the following output shows that osquery will process OpenBSM events.

osquery> select * from osquery_flags where name in ('disable_events', 'disable_audit');
+----------------+------+---------------------------------------------------+---------------+-------+------------+
| name           | type | description                                       | default_value | value | shell_only |
+----------------+------+---------------------------------------------------+---------------+-------+------------+
| disable_audit  | bool | Disable receiving events from the audit subsystem | true          | false | 0          |
| disable_events | bool | Disable osquery publish/subscribe system          | false         | false | 0          |
+----------------+------+---------------------------------------------------+---------------+-------+------------+
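The same check can be scripted. This sketch assumes osqueryi is on the PATH and relies on its --json output mode, which prints result rows as a JSON array of string values; the helper names are hypothetical. Because osqueryi and osqueryd are separate processes, it reports the flags of the osqueryi process it launches, so pass the daemon's flagfile (for example via osquery's --flagfile flag) to check what that file sets.

```python
import json
import subprocess

FLAG_QUERY = (
    "SELECT name, value FROM osquery_flags "
    "WHERE name IN ('disable_events', 'disable_audit');"
)


def flags_from_json(output: str) -> dict:
    """Map flag name to its current value from osqueryi --json output."""
    return {row["name"]: row["value"] for row in json.loads(output)}


def events_and_audit_enabled(flags: dict) -> bool:
    """True when neither the events system nor the audit publisher is disabled."""
    return flags.get("disable_events") == "false" and flags.get("disable_audit") == "false"


def query_flags(extra_args=()) -> dict:
    """Run the query through osqueryi (assumed to be on PATH) with optional extra flags."""
    result = subprocess.run(
        ["osqueryi", "--json", *extra_args, FLAG_QUERY],
        capture_output=True, text=True, check=True,
    )
    return flags_from_json(result.stdout)
```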

Examine event table

osquery keeps state about the events subsystem in the osquery_events table. The events column is of note here.

This example is from a macOS machine that has events enabled but has not recorded any events yet. Try triggering an event, then confirm that the event count is non-zero. If it remains at zero, the problem is likely in how the OS auditing side is configured. See the platform-specific instructions.

osquery> select * from osquery_events;
+-------------------------+-----------------+------------+---------------+--------+-----------+--------+
| name                    | publisher       | type       | subscriptions | events | refreshes | active |
+-------------------------+-----------------+------------+---------------+--------+-----------+--------+
| diskarbitration         | diskarbitration | publisher  | 1             | 0      | 0         | 1      |
| event_tapping           | event_tapping   | publisher  | 1             | 0      | 0         | 0      |
| fsevents                | fsevents        | publisher  | 0             | 0      | 24        | 1      |
| iokit                   | iokit           | publisher  | 1             | 0      | 0         | 1      |
| openbsm                 | openbsm         | publisher  | 9             | 0      | 0         | 0      |
| scnetwork               | scnetwork       | publisher  | 0             | 0      | 0         | 0      |
| disk_events             | diskarbitration | subscriber | 1             | 0      | 0         | 1      |
| file_events             | fsevents        | subscriber | 0             | 0      | 0         | 1      |
| hardware_events         | iokit           | subscriber | 1             | 0      | 0         | 1      |
| process_events          | openbsm         | subscriber | 8             | 0      | 0         | 1      |
| user_events             | openbsm         | subscriber | 1             | 0      | 0         | 1      |
| user_interaction_events | event_tapping   | subscriber | 1             | 0      | 0         | 1      |
| yara_events             | fsevents        | subscriber | 0             | 0      | 0         | 1      |
+-------------------------+-----------------+------------+---------------+--------+-----------+--------+

Linux process auditing using Audit

On Linux, osquery can use the Audit system to collect and process events. It accomplishes this by monitoring syscalls such as execve() and execveat(). auditd should not be running when using osquery's process auditing, as it will conflict with osqueryd over access to the audit netlink socket. You should also ensure auditd is not configured to start at boot.

The only prerequisite for using osquery's auditing functionality on Linux is a kernel that includes the Audit functionality. Most kernels since version 2.6 have this capability.

There is no requirement to install auditd or libaudit; osquery only uses the audit features that exist in the kernel.

A sample log entry from process_events may look something like this:

{
  "action": "added",
  "columns": {
    "uid": "0",
    "time": "1527895541",
    "pid": "30219",
    "path": "/usr/bin/curl",
    "auid": "1000",
    "cmdline": "curl google.com",
    "ctime": "1503452096",
    "cwd": "",
    "egid": "0",
    "euid": "0",
    "gid": "0",
    "parent": ""
  },
  "unixTime": 1527895550,
  "hostIdentifier": "vagrant",
  "name": "process_events",
  "numerics": false
}

To better understand how this works, let's walk through four configuration options. These flags can be set on the command line or placed into the osquery.flags file.

--disable_audit=false: by default this is set to true, which prevents osquery from opening the kernel audit's netlink socket. Setting it to false tells osquery that we want to enable auditing functionality.
--audit_allow_config=true: by default this is set to false, which prevents osquery from making changes to the audit configuration settings. These changes include adding/removing rules, setting the global enable flags, and adjusting performance and rate parameters. Unless you plan to set all of those things manually, you should set this to true. If you also configure audit yourself, using a control binary or /etc/audit.conf, osquery may override your settings.
--audit_persist=true: by default this is true, and it instructs osquery to 'regain' the audit netlink socket if another process also accesses it. However, you should do your best to ensure that no other program will be trying to access the audit netlink socket.
--audit_allow_process_events=true: this flag indicates that you would like to record process events.
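The flagfile holds one flag per line. A minimal sketch that renders the four flags above in that form; the default path /etc/osquery/osquery.flags is an assumption about a typical Linux package layout, so adjust it for your deployment.

```python
from pathlib import Path

# Flags discussed above; values are strings exactly as they appear in the flagfile.
AUDIT_FLAGS = {
    "disable_audit": "false",
    "audit_allow_config": "true",
    "audit_persist": "true",
    "audit_allow_process_events": "true",
}


def render_flagfile(flags: dict) -> str:
    """Render flags as one --name=value entry per line."""
    return "".join(f"--{name}={value}\n" for name, value in flags.items())


def write_flagfile(flags: dict, path: str = "/etc/osquery/osquery.flags") -> None:
    """Overwrite the flagfile at path; the default location is an assumption."""
    Path(path).write_text(render_flagfile(flags))
```

Rendering and writing are kept separate so the output can be reviewed, or merged with an existing flagfile, before anything is overwritten.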

Linux socket auditing using Audit

osquery can also be used to record network connections by enabling socket_events. This table uses the bind() and connect() syscalls to gather information about network connections. It is not automatically enabled when process_events is enabled, because it can introduce considerable load on the system.

To enable socket events, use the --audit_allow_sockets flag.

A sample socket_events log entry looks like this:

{
  "action": "added",
  "columns": {
    "time": "1527895541",
    "status": "succeeded",
    "remote_port": "80",
    "action": "connect",
    "auid": "1000",
    "family": "2",
    "local_address": "",
    "local_port": "0",
    "path": "/usr/bin/curl",
    "pid": "30220",
    "remote_address": "172.217.164.110"
  },
  "unixTime": 1527895545,
  "hostIdentifier": "vagrant",
  "name": "socket_events",
  "numerics": false
}
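Each line in the osquery results log is a JSON object like the one above, with the row values under `columns` encoded as strings. A sketch that summarizes one socket_events line; it assumes `family` holds a numeric address family and uses Python's socket.AddressFamily to name it (2 is AF_INET). The function name is hypothetical.

```python
import json
import socket


def describe_socket_event(line: str) -> str:
    """Summarize one socket_events line from the osquery results log."""
    record = json.loads(line)
    cols = record["columns"]  # row values, all encoded as strings
    family = int(cols["family"])
    try:
        family_name = socket.AddressFamily(family).name  # e.g. 2 -> "AF_INET"
    except ValueError:
        family_name = str(family)  # family unknown to this Python build
    return (
        f'{cols["path"]} (pid {cols["pid"]}) {cols["action"]} '
        f'{cols["remote_address"]}:{cols["remote_port"]} [{family_name}] {cols["status"]}'
    )
```

Note the two `action` fields: the top-level one ("added") describes the result-log change, while `columns.action` is the socket operation (here "connect").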

If you would like to log UNIX domain sockets, use the hidden flag --audit_allow_unix. This will put considerable strain on the system, as many default actions use domain sockets. You will also need to explicitly select the socket column from the socket_events table.

The behavior of the socket_events table can be changed with the following boolean flags:

Flag	Description
--audit_allow_sockets	Allow the audit publisher to install socket-related rules
--audit_allow_unix	Allow socket events to collect domain sockets
--audit_allow_failed_socket_events	Include rows for socket events that have failed
--audit_allow_accept_socket_events	Include rows for accept socket events
--audit_allow_null_accept_socket_events	Allow non-blocking accept() syscalls that returned EAGAIN/EWOULDBLOCK

Troubleshooting Audit-based process and socket auditing on Linux

There are a few different methods to ensure you have configured auditing correctly.

Ensure you have supplied all of the necessary flags mentioned above, either as command-line arguments or in your flagfile.
Verify that auditd is not running, if it is installed on the system.
If the auditctl binary is present on your system, run auditctl -s and verify that enabled is not set to zero and that pid corresponds to an osquery process.
Verify that your osquery configuration has a query that SELECTs from the process_events and/or socket_events tables.
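A sketch that automates the auditctl -s check. It assumes the output is either one `field value` pair per line (newer audit userspace) or a single `AUDIT_STATUS: field=value ...` line (older versions), and that /proc/<pid>/comm names the owning process. It also reports the `lost` counter, which counts records the kernel has dropped. The helper names are hypothetical.

```python
from pathlib import Path


def parse_auditctl_status(text: str) -> dict:
    """Parse `auditctl -s` output into a {field: value} dict."""
    status = {}
    for line in text.splitlines():
        if "=" in line:
            # Older single-line form: "AUDIT_STATUS: enabled=1 flag=1 pid=1234 ..."
            for token in line.split():
                key, sep, value = token.partition("=")
                if sep:
                    status[key] = value
        else:
            # Newer form: one "field value" pair per line.
            parts = line.split()
            if len(parts) >= 2:
                status[parts[0]] = parts[1]
    return status


def process_name(pid: str) -> str:
    """Command name for a PID, read from /proc (Linux only); empty if unavailable."""
    try:
        return Path(f"/proc/{pid}/comm").read_text().strip()
    except OSError:
        return ""


def check_audit_status(status: dict, name_of=process_name) -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if status.get("enabled", "0") == "0":
        problems.append("audit is not enabled")
    pid = status.get("pid", "0")
    if "osquery" not in name_of(pid):
        problems.append(f"audit netlink socket owner (pid {pid}) is not an osquery process")
    # lost is cumulative, so compare readings over time rather than expecting zero forever.
    if status.get("lost", "0") != "0":
        problems.append(f"kernel has dropped {status['lost']} audit records")
    return problems
```

On a host, feed `parse_auditctl_status` the text printed by `auditctl -s` run as root, then pass the result to `check_audit_status`.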
You may also run auditing using osqueryi as root:

osqueryi --audit_allow_config=true --audit_allow_sockets=true --audit_persist=true --disable_audit=false --events_expiry=1 --events_max=50000 --logger_plugin=filesystem  --disable_events=false

If you would like to debug the raw audit events as osqueryd sees them, use the hidden flag --audit_debug. This will print all of the raw audit lines to osquery's stdout.

NOTICE: On Linux systems running journald, the journal collects logging data from several sources, including records originating from the kernel audit subsystem (which osquery enables). To avoid performance problems on busy hosts, especially when osquery event tables are enabled, it is recommended to keep audit records out of the journal by masking the journald audit socket with the following command: systemctl mask --now systemd-journald-audit.socket.

Avoid throttling, losing events and interpreting Audit publisher throttling messages

If osquery is CPU-constrained and is processing a high enough stream of events, you may receive this warning message:
The Audit publisher has throttled reading records from Netlink for <N> seconds. Some events may have been lost.

This message appears at most once per minute. It indicates that, since the previous throttling message, the Audit publisher had to slow down reading records from the Netlink socket for the reported duration. This happens when osquery is not processing records fast enough, and reading is slowed down to prevent its internal buffers from growing too large and consuming too much memory.

Throttling may cause loss of events, since the Audit subsystem backlog buffer could fill up; if that happens, the kernel is forced to drop some of them.
You can check whether this is happening by looking at the lost field in the output of auditctl -s.

Throttling currently starts when more than 4096 records have been read and are still queued for processing by osquery. This threshold is large enough to absorb high spikes of events while keeping osquery from consuming memory indefinitely.
Keep in mind that if the high rate of events continues, even with throttling in effect, you might still have to increase your default watchdog memory limit or reduce the interval of the scheduled query on the evented table, because of the number of rows it will have to generate at once.

There's also a second throttling point in the Audit publisher pipeline, after the records have been read from the Netlink socket, where they are parsed into a more machine-friendly format.
When throttling happens here, osquery logs a different message:
The Audit publisher has throttled record processing for <N> seconds. This may cause further throttling and loss of events.

This message exists mostly for debugging purposes and only appears if --verbose is active, because throttling at this stage doesn't necessarily cause loss of events: a bottleneck at this point in the pipeline must first cause throttling on the Netlink socket reading side before it can cause loss of events.
So as long as no throttling is happening on the reading side, this stage should not cause any loss of events.

There isn't much to be done to avoid throttling beyond reducing CPU constraints or, more generally, having osquery process fewer events.

To avoid losing events, first make sure that throttling happens as rarely as possible. Then you can try increasing the backlog buffer used by the Audit subsystem via the --audit_backlog_limit flag, so that it can absorb larger or slightly longer spikes of events.
Keep in mind that increasing this will increase the amount of memory used by the Audit subsystem and that this memory is not allocated by osquery, so it won't be accounted for by the watchdog.

User event auditing with Audit

On Linux, a companion table called user_events is included that provides several authentication-based events. If you are enabling process auditing, it should be trivial to also include this table.

Linux process and socket auditing using BPF

When osquery is running on a recent kernel (>= 4.18), the BPF eventing framework can be used. This event publisher needs to monitor for more system calls to reach feature parity with the Audit-based tables. For this reason, enabling BPF will also enable both the bpf_process_events and bpf_socket_events tables.

In order to start the publisher and enable the subscribers, the following flags must be passed: --disable_events=false --enable_bpf_events=true. The --verbose flag is also extremely useful when setting up the configuration for the first time, since it emits more debug information when something fails.

The BPF framework will make use of a perf event array and several per-cpu maps in order to receive events and correctly capture strings and buffers. These structures can be configured using the following command line flags:

bpf_perf_event_array_exp: size of the perf event array, as a power of two
bpf_buffer_storage_size: how many slots of 4096 bytes should be available in each memory pool

Memory usage depends on both:

How many processors are currently online
How many processors can be added by hotswapping

The BPF event publisher uses 6 memory pools, grouping system calls in order to evenly distribute memory usage. Not counting the internal maps used to merge sys_enter/sys_exit events (the size for these maps is rather small), memory usage can be easily estimated with the following formula:

buffer_storage_bytes = memory_pool_count * (bpf_buffer_storage_size * 4096) * possible_cpu_count

perf_bytes = (2 ^ bpf_perf_event_array_exp) * online_cpu_count
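The two formulas can be evaluated directly. In the sketch below the argument values are hypothetical and are not osquery's defaults; note that `^` in the formula means exponentiation, which Python writes as `**` (Python's `^` is bitwise XOR).

```python
MEMORY_POOL_COUNT = 6   # the BPF publisher uses 6 memory pools
SLOT_SIZE_BYTES = 4096  # each buffer storage slot holds 4096 bytes


def bpf_memory_estimate(buffer_storage_size: int, perf_event_array_exp: int,
                        possible_cpu_count: int, online_cpu_count: int) -> tuple:
    """Return (buffer_storage_bytes, perf_bytes) using the formulas above."""
    buffer_storage_bytes = (
        MEMORY_POOL_COUNT * (buffer_storage_size * SLOT_SIZE_BYTES) * possible_cpu_count
    )
    perf_bytes = (2 ** perf_event_array_exp) * online_cpu_count
    return buffer_storage_bytes, perf_bytes


if __name__ == "__main__":
    # Hypothetical settings on a 4-CPU VM, without and with 128 hot-pluggable CPUs.
    for possible in (4, 128):
        storage, perf = bpf_memory_estimate(512, 10, possible, 4)
        print(f"possible={possible}: storage={storage / 2**20:.0f} MiB, perf={perf} bytes")
```

With bpf_buffer_storage_size=512 on a machine with 4 possible CPUs, buffer_storage_bytes comes to 48 MiB; the same settings with possible_cpu_count=128 reach 1.5 GiB, because the buffer storage scales with possible rather than online CPUs.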

The cpu count numbers can be read from the /sys folder:

possible_cpu_count: /sys/devices/system/cpu/possible
online_cpu_count: /sys/devices/system/cpu/online
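Both files contain a kernel CPU list such as `0-3` or `0,2-5`. A small sketch that counts the CPUs in such a list and reads both files; it assumes only the comma-and-range syntax shown here, and the function names are hypothetical.

```python
from pathlib import Path


def count_cpus(cpu_list: str) -> int:
    """Count CPUs in a kernel CPU list such as "0-3" or "0,2-5,8"."""
    total = 0
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            total += int(end) - int(start) + 1  # ranges are inclusive
        else:
            total += 1
    return total


def read_cpu_counts(root: str = "/sys/devices/system/cpu") -> tuple:
    """Return (possible_cpu_count, online_cpu_count) from /sys (Linux only)."""
    base = Path(root)
    return (
        count_cpus((base / "possible").read_text()),
        count_cpus((base / "online").read_text()),
    )
```

A large gap between the two counts (for example `0-127` possible versus `0-3` online) is the hotswapping situation described next.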

VMware Fusion (and possibly other systems as well) supports CPU hotswapping, raising the possible_cpu_count to 128. This causes a huge increase in memory usage, and it is for this reason that the default settings are rather low.

This problem can be easily fixed by disabling hotswapping. Unfortunately, this setting is not available through the user interface, so it needs to be changed directly in the .vmx file (vcpu.hotadd = "FALSE").

macOS process & socket auditing

Auditing processes with OpenBSM

To enable OpenBSM-based process auditing in osquery, set the following command-line flags:

--disable_audit=false
--disable_events=false
--audit_allow_config

Note: macOS systems 10.15 and earlier ship with the OpenBSM subsystem enabled, but the default settings do not audit process execution or the root user. The osquery command-line flag --audit_allow_config makes run-time changes to your system's audit configuration to enable these features. This is all you need to get up and running.

Alternatively, instead of using the --audit_allow_config flag, you may edit the audit_control file in /etc/security/ for more granular needs. This is optional and considered an "advanced configuration". An example configuration is provided below; the important settings are the ex and pc flags and the argv and arge policies. The ex flag logs exec events, while pc logs exec, fork, and exit. If you don't need fork and exit, you may leave that flag out; however, in the future, getting the parent pid may require fork. If you care about getting the arguments and environment variables, you also need argv and arge. More about these flags can be found here. Note that the system might require a reboot for these new flags to take effect. Running audit -s should make the audit subsystem reload its configuration without a reboot, but your mileage may vary.

Note: Prior to macOS 10.15, OpenBSM was the primary source of real-time audit events. Since macOS 10.15, EndpointSecurity has been available as a newer alternative to, and eventual replacement for, the now-deprecated OpenBSM. However, with osquery, you can collect events from either of these sources.

Auditing processes with EndpointSecurity

To enable EndpointSecurity in osquery, set --disable_endpointsecurity=false in the configuration.

EndpointSecurity is already enabled in the OS on all macOS hosts beginning with macOS 10.15, and needs no special configuration. There are, however, some additional steps required to permit osquery to collect events.

For osquery to capture events in its es_process_events table, it must have the Full Disk Access (FDA) permission enabled in macOS Privacy & Security settings. Without this permission, osquery will run as normal, but the table will always be empty. Note: If osquery is already running without the permission, it must be restarted after you have granted the permission.

If osquery is not granted the FDA permission, it will not prompt the user to grant it. It will just issue a warning (when running with --verbose), and the es_process_events table will simply be empty when queried.

Full Disk Access

The FDA permission (or lack thereof) is inherited from Terminal.app when running osquery interactively, but is not inherited from launchctl when running as a service (including when started using the osqueryctl helper script).

Parent Process	Steps Taken Before Launching osquery	Querying `es_process_events`
`Terminal.app`¹	Give Full Disk Access to `Terminal.app` only	Success
`Terminal.app`¹	Give FDA to osquery only, or do nothing	No events
`launchctl`	Give Full Disk Access to `/opt/osquery/lib/osquery.app/Contents/MacOS/osqueryd`² only	Success
`launchctl`	Give FDA to `launchctl` only, or do nothing	No events

¹ : if you use a third-party terminal emulator like iTerm.app, grant that the permission instead of Terminal.app.

² : whether running osquery via osqueryi, osqueryd, or osqueryd -S, the permissions will be the same in each case.

Manually Granting Permissions

To manually enable FDA permissions for an executable: open System Preferences, go to Security & Privacy, select the Privacy tab, and find the Full Disk Access item on the left side. Unlock the pane (lock icon at the lower left) and enter your credentials. On the right side, click the + icon to add a new entry to the list, then select the executable to be granted this permission. The Finder-based file browser doesn't show paths like /usr by default, but you can either drag and drop the executable from another Finder window, or start typing with / and enter the path explicitly. Note: the executable must already exist at that path before it can be manually granted the permission this way.

Automatically Granting Permissions (Silent Installs)

If a macOS host is enrolled in MDM, the FDA permission can be granted silently by pushing a "PPPC payload" (Privacy Preferences Policy Control) configuration profile that sets the SystemPolicyAllFiles (i.e., FDA) key. The PPPC payload identifies the executable it grants the permission to by a code requirement string, supplied in its CodeRequirement key.

To get the appropriate CodeRequirement value, run the codesign tool and copy everything in the output after designated =>.

> codesign  -dr - /opt/osquery/lib/osquery.app/Contents/MacOS/osqueryd
Executable=/opt/osquery/lib/osquery.app/Contents/MacOS/osqueryd
designated => identifier "io.osquery.agent" and anchor apple generic and certificate 1[field.1.2.840.113635.100.6.2.6] /* exists */ and certificate leaf[field.1.2.840.113635.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = "3522FA9PXF"
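Extracting the requirement can be scripted. This sketch assumes codesign may split its output between stdout and stderr, so it merges both streams before searching for the `designated =>` line; the helper names are hypothetical.

```python
import subprocess


def code_requirement(codesign_output: str) -> str:
    """Return the text after 'designated =>' in `codesign -dr -` output."""
    marker = "designated =>"
    for line in codesign_output.splitlines():
        if line.startswith(marker):
            return line[len(marker):].strip()
    raise ValueError("no designated requirement found in codesign output")


def read_code_requirement(
    path: str = "/opt/osquery/lib/osquery.app/Contents/MacOS/osqueryd",
) -> str:
    """Run codesign (macOS only) and extract the designated requirement."""
    result = subprocess.run(
        ["codesign", "-dr", "-", path],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,  # merge both streams
        text=True, check=True,
    )
    return code_requirement(result.stdout)
```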

For your deployment, either generate an equivalent profile using your MDM dashboard (specifying io.osquery.agent as the Identifier, bundleID as the Identifier Type, and setting SystemPolicyAllFiles to Allow), or use the example configuration profile below, making sure the following fields have the correct values:

PayloadOrganization (your organization)
CodeRequirement (see above)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
 <key>PayloadContent</key>
 <array>
  <dict>
   <key>PayloadDescription</key>
   <string>osqueryd</string>
   <key>PayloadDisplayName</key>
   <string>osqueryd</string>
   <key>PayloadIdentifier</key>
   <string>BDBD19F2-A35A-4AEC-9E96-3CA7E2994666</string>
   <key>PayloadOrganization</key>
   <string>Trail of Bits</string>
   <key>PayloadType</key>
   <string>com.apple.TCC.configuration-profile-policy</string>
   <key>PayloadUUID</key>
   <string>89121197-3B5F-4502-BB8C-4331261D3B8C</string>
   <key>PayloadVersion</key>
   <integer>1</integer>
   <key>Services</key>
   <dict>
    <key>SystemPolicyAllFiles</key>
    <array>
     <dict>
      <key>Allowed</key>
      <true/>
      <key>CodeRequirement</key>
      <string>identifier "io.osquery.agent" and anchor apple generic and certificate 1[field.1.2.840.113635.100.6.2.6] /* exists */ and certificate leaf[field.1.2.840.113635.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = "3522FA9PXF"</string>
      <key>Comment</key>
      <string></string>
      <key>Identifier</key>
      <string>io.osquery.agent</string>
      <key>IdentifierType</key>
      <string>bundleID</string>
     </dict>
    </array>
   </dict>
  </dict>
 </array>
 <key>PayloadDescription</key>
 <string>osqueryd</string>
 <key>PayloadDisplayName</key>
 <string>osqueryd</string>
 <key>PayloadIdentifier</key>
 <string>BDBD19F2-A35A-4AEC-9E96-3CA7E2994666</string>
 <key>PayloadOrganization</key>
 <string>Trail of Bits</string>
 <key>PayloadScope</key>
 <string>System</string>
 <key>PayloadType</key>
 <string>Configuration</string>
 <key>PayloadUUID</key>
 <string>28A8A2B7-A91E-4C26-BAEC-00F6F542742E</string>
 <key>PayloadVersion</key>
 <integer>1</integer>
</dict>
</plist>
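As an alternative to editing the XML by hand, a sketch using Python's plistlib can build an equivalent profile. It mirrors the keys in the example above, but generates fresh UUIDs for PayloadIdentifier and PayloadUUID instead of reusing the sample values; that is a choice made for this sketch, and your MDM may have its own requirements for these identifiers.

```python
import plistlib
import uuid


def new_id() -> str:
    """Fresh uppercase UUID string, in the style of the example profile."""
    return str(uuid.uuid4()).upper()


def build_pppc_profile(organization: str, code_requirement: str) -> bytes:
    """Build a PPPC profile granting SystemPolicyAllFiles (FDA) to osqueryd."""
    payload = {
        "PayloadDescription": "osqueryd",
        "PayloadDisplayName": "osqueryd",
        "PayloadIdentifier": new_id(),
        "PayloadOrganization": organization,
        "PayloadType": "com.apple.TCC.configuration-profile-policy",
        "PayloadUUID": new_id(),
        "PayloadVersion": 1,
        "Services": {
            "SystemPolicyAllFiles": [{
                "Allowed": True,
                "CodeRequirement": code_requirement,
                "Comment": "",
                "Identifier": "io.osquery.agent",
                "IdentifierType": "bundleID",
            }]
        },
    }
    profile = {
        "PayloadContent": [payload],
        "PayloadDescription": "osqueryd",
        "PayloadDisplayName": "osqueryd",
        "PayloadIdentifier": new_id(),
        "PayloadOrganization": organization,
        "PayloadScope": "System",
        "PayloadType": "Configuration",
        "PayloadUUID": new_id(),
        "PayloadVersion": 1,
    }
    return plistlib.dumps(profile)  # XML plist bytes
```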

Auditing processes and sockets with OpenBSM

To enable OpenBSM in osquery, set --disable_audit=false in the configuration.

OpenBSM ships enabled on macOS 10.15 and earlier, but with its default settings it doesn't audit process execution or the root user. To start process auditing on macOS, edit the audit_control file in /etc/security/. An example configuration is provided below; the important settings are the ex and pc flags and the argv and arge policies. The ex flag logs exec events, while pc logs exec, fork, and exit. If you don't need fork and exit, you may leave that flag out. However, in the future, getting the parent pid may require fork. If you care about getting the arguments and environment variables, you also need argv and arge. More about these flags can be found here. Note that the system might require a reboot for these new flags to take effect. Running audit -s should make the audit subsystem reload its configuration without a reboot, but your mileage may vary.

#
# $P4: //depot/projects/trustedbsd/openbsm/etc/audit_control#8 $
#
dir:/var/audit
flags:ex,pc,ap,aa,lo,nt
minfree:5
naflags:no
policy:cnt,argv,arge
filesz:2M
expire-after:10M
superuser-set-sflags-mask:has_authenticated,has_console_access
superuser-clear-sflags-mask:has_authenticated,has_console_access
member-set-sflags-mask:
member-clear-sflags-mask:has_authenticated
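A sketch that checks an audit_control file for the ex and pc flags and the argv and arge policies discussed above. It parses only simple `key:value` lines and does not interpret prefixes such as + or - that OpenBSM allows in flag lists; the function names are hypothetical.

```python
from pathlib import Path

REQUIRED_FLAGS = {"ex", "pc"}        # exec events; exec, fork, and exit
REQUIRED_POLICY = {"argv", "arge"}   # command-line arguments and environment


def parse_audit_control(text: str) -> dict:
    """Parse key:value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        settings[key] = value
    return settings


def missing_settings(settings: dict) -> dict:
    """Required flags and policies absent from the parsed file."""
    flags = {f for f in settings.get("flags", "").split(",") if f}
    policy = {p for p in settings.get("policy", "").split(",") if p}
    return {
        "flags": sorted(REQUIRED_FLAGS - flags),
        "policy": sorted(REQUIRED_POLICY - policy),
    }


def check_file(path: str = "/etc/security/audit_control") -> dict:
    """Read audit_control from disk and report what is missing."""
    return missing_settings(parse_audit_control(Path(path).read_text()))
```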

osquery events optimization

This section provides a brief overview of common and recommended optimizations for event-based tables. These optimizations also apply to the FIM events.

--events_optimize=true apply optimizations when SELECTing from events-based tables, enabled by default.
--events_expiry the lifetime of buffered events in seconds with a default value of 86000.
--events_max the maximum number of events to store in the buffer before expiring them with a default value of 1000.

The goal of these optimizations is to keep event buffering from degrading the performance of the osquery process and the host system. By default they are all enabled, which is good for configuration and performance, but they may introduce inconsistencies on highly stressed systems using process auditing.

Optimizations work best when SELECTing often from event-based tables; otherwise, the events remain in a buffered state. When an event-based table is queried within the daemon, the backing storage holding event data is cleared according to the --events_expiry lifetime. Setting this value to 1 will auto-clear events whenever a SELECT is performed against the table, minimizing the impact of the buffer.