The osquery daemon uses a default filesystem logger plugin. Like the config, output from the filesystem plugin is written as JSON. Results from the query schedule are written to /var/log/osquery/osqueryd.results.log.
There are two types of logs:
- Status logs (info, warning, error, and fatal)
- Query schedule results logs, including logs from snapshot queries
If you run osqueryd in verbose mode and then list /var/log/osquery/, you will see output like the following:
$ ls -l /var/log/osquery/
total 24
lrwxr-xr-x 1 root wheel 77 Sep 30 17:37 osqueryd.INFO -> osqueryd.INFO.20140930
-rw------- 1 root wheel 1226 Sep 30 17:37 osqueryd.INFO.20140930
-rw------- 1 root wheel 388 Sep 30 17:37 osqueryd.results.log
Logger Plugins
osquery includes logger plugins that support configurable logging to a variety of interfaces. The built-in logger plugins are filesystem (default), tls, and syslog. Multiple logger plugins may be used simultaneously, effectively copying logs to each interface. To enable multiple loggers, set the --logger_plugin option to a comma-separated list of the requested plugins.
For information on configuring logger plugins, see logging/results flags. Developing new logger plugins is explored in the development docs.
Status logs
Status logs are generated by the Glog logging framework. The default filesystem logger plugin writes these logs to disk the same way Glog would. Other logger plugins may intercept these status logs and write them elsewhere, for example to the system log.
As the above directory listing reveals, osqueryd.INFO is a symlink to the most recent execution's INFO log. The same is true for the WARNING, ERROR and FATAL logs. For more information on the format of Glog logs, please refer to the Glog documentation.
Results logs
Differential logs
The results of your scheduled queries are logged to the "results log". These are differential changes between the previous query execution and the current execution. Each log line is a JSON string that indicates what data has been added or removed by which query. There are two format options: single (also called event) and batched. For some queries it does not make sense to log "removed" events, for example:
SELECT i.*, p.resident_size, p.user_time, p.system_time, t.minutes as c
FROM osquery_info i, processes p, time t
WHERE p.pid = i.pid;
By joining against the time table and using time.minutes as a counter, this query will always log a single "added" and a single "removed" line. The purpose is to create a continuous monitor of osquery's performance. For these cases, add "removed": false to the scheduled query.
{
"schedule": {
"osquery_monitor": {
"query": "SELECT ... t.minutes as c FROM time t WHERE ...",
"interval": 60,
"removed": false
}
}
}
Snapshot logs
Snapshot logs are an alternate form of query result logging. A snapshot is an 'exact point in time' set of results, with no differentials. If you always want a list of mounts, not the added and removed mounts, use a snapshot. In the mounts case, differential results are seldom emitted (assuming hosts do not often mount and unmount), whereas a snapshot query logs the complete result set after every query execution. Across your fleet, this adds up to a lot of data.
Data snapshots may generate a large amount of output. For log collection safety, output is written to a dedicated sink. The filesystem logger plugin writes snapshot results to /var/log/osquery/osqueryd.snapshots.log.
To schedule a snapshot query, use:
{
"schedule": {
"mounts": {
"query": "select * from mounts",
"interval": 3600,
"snapshot": true
}
}
}
Schedule results
Event format
Event is the default result format. Each log line represents a state change. This format works best for log aggregation systems like Logstash or Splunk.
Example output of SELECT name, path, pid FROM processes; (whitespace added for readability):
{
"action": "added",
"columns": {
"name": "osqueryd",
"path": "/usr/local/bin/osqueryd",
"pid": "97830"
},
"name": "processes",
"hostname": "hostname.local",
"calendarTime": "Tue Sep 30 17:37:30 2014",
"unixTime": "1412123850"
}
{
"action": "removed",
"columns": {
"name": "osqueryd",
"path": "/usr/local/bin/osqueryd",
"pid": "97650"
},
"name": "processes",
"hostname": "hostname.local",
"calendarTime": "Tue Sep 30 17:37:30 2014",
"unixTime": "1412123850"
}
This tells us that a binary called "osqueryd" was stopped and a new binary with the same name was started (note the different pids). The data is generated by keeping a cache of previous query results and only logging when the cache changes. If no new processes are started or stopped, the query won't log any results.
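The caching behavior described above can be modeled in a few lines of Python. This is a simplified sketch, not osquery's actual implementation: the function name diff_results is hypothetical, and rows are assumed to be dictionaries that map column names to string values, as in the log output.

```python
def diff_results(previous, current):
    """Return (added, removed) rows between two executions of a query.

    Hypothetical helper for illustration; osquery's own diffing may differ.
    """
    # Freeze each row into a hashable tuple so rows can be compared as sets.
    prev_rows = {tuple(sorted(row.items())) for row in previous}
    curr_rows = {tuple(sorted(row.items())) for row in current}
    added = [dict(row) for row in sorted(curr_rows - prev_rows)]
    removed = [dict(row) for row in sorted(prev_rows - curr_rows)]
    return added, removed


# Cached results from the previous execution and results from the current one.
previous = [{"name": "osqueryd", "path": "/usr/local/bin/osqueryd", "pid": "97650"}]
current = [{"name": "osqueryd", "path": "/usr/local/bin/osqueryd", "pid": "97830"}]

added, removed = diff_results(previous, current)
print("added:", added)
print("removed:", removed)

# An unchanged result set produces no rows, so nothing would be logged.
print("unchanged:", diff_results(current, current))
```

In this model, each entry in added or removed would correspond to one event-format log line.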
Snapshot format
Snapshot queries attempt to mimic the differential event format. Instead of emitting "columns", the snapshot data is stored using "snapshot". An action is included as, you guessed it, "snapshot"!
Consider the following example:
{
"action": "snapshot",
"snapshot": [
{
"parent": "0",
"path": "/sbin/launchd",
"pid": "1"
},
{
"parent": "1",
"path": "/usr/sbin/syslogd",
"pid": "51"
},
{
"parent": "1",
"path": "/usr/libexec/UserEventAgent",
"pid": "52"
},
{
"parent": "1",
"path": "/usr/libexec/kextd",
"pid": "54"
}
],
"name": "process_snapshot",
"hostIdentifier": "hostname.local",
"calendarTime": "Mon May 2 22:27:32 2016 UTC",
"unixTime": "1462228052"
}
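As a rough sketch of how a consumer might read this format, the Python snippet below extracts the rows from a snapshot log line. The field names come from the example above. The function name snapshot_rows is hypothetical, and the sample line is an abbreviated version of the example written on a single line.

```python
import json


def snapshot_rows(line):
    """Return the rows of a snapshot log line, or [] for other line types."""
    record = json.loads(line)
    if record.get("action") != "snapshot":
        return []
    return record["snapshot"]


# Abbreviated sample line (two rows only), assumed to arrive as one line of JSON.
line = (
    '{"action": "snapshot", "snapshot": ['
    '{"parent": "0", "path": "/sbin/launchd", "pid": "1"}, '
    '{"parent": "1", "path": "/usr/sbin/syslogd", "pid": "51"}], '
    '"name": "process_snapshot", "hostIdentifier": "hostname.local", '
    '"calendarTime": "Mon May 2 22:27:32 2016 UTC", "unixTime": "1462228052"}'
)

for row in snapshot_rows(line):
    print(row["pid"], row["path"])
```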
Batch format
If a query identifies multiple state changes, the batched format will include all results in a single log line. If you're programmatically parsing lines and loading them into a backend datastore, this is probably the best solution.
To enable batch log lines, launch osqueryd with the --log_result_events=false argument.
Example output of SELECT name, path, pid FROM processes; (whitespace added for readability):
{
"diffResults": {
"added": [
{
"name": "osqueryd",
"path": "/usr/local/bin/osqueryd",
"pid": "97830"
}
],
"removed": [
{
"name": "osqueryd",
"path": "/usr/local/bin/osqueryd",
"pid": "97650"
}
]
},
"name": "processes",
"hostname": "hostname.local",
"calendarTime": "Tue Sep 30 17:37:30 2014",
"unixTime": "1412123850"
}
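If a backend expects one record per state change, a batched line can be expanded into event-style records. The Python sketch below assumes the field names shown in the batched example above. The function name batch_to_events is hypothetical and is not part of osquery.

```python
import json


def batch_to_events(line):
    """Expand one batched log line into a list of event-style records."""
    record = json.loads(line)
    # Everything except diffResults (name, hostname, timestamps) is shared.
    shared = {key: value for key, value in record.items() if key != "diffResults"}
    diff = record.get("diffResults", {})
    events = []
    for action in ("added", "removed"):
        for columns in diff.get(action, []):
            events.append({"action": action, "columns": columns, **shared})
    return events


# Sample batched line built from the example values above.
line = json.dumps({
    "diffResults": {
        "added": [{"name": "osqueryd", "path": "/usr/local/bin/osqueryd", "pid": "97830"}],
        "removed": [{"name": "osqueryd", "path": "/usr/local/bin/osqueryd", "pid": "97650"}],
    },
    "name": "processes",
    "hostname": "hostname.local",
    "calendarTime": "Tue Sep 30 17:37:30 2014",
    "unixTime": "1412123850",
})

for event in batch_to_events(line):
    print(event["action"], event["columns"]["pid"])
```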
Most of the time the Event format is the most appropriate. The next section in the deployment guide describes log aggregation methods. The aggregation methods describe collecting, searching, and alerting on the results from a query schedule.
Unique host identification
If you need a way to uniquely identify hosts embedded into osqueryd's results log, then the --host_identifier flag is what you're looking for.
By default, host_identifier is set to "hostname". The host's hostname will be used as the host identifier in results logs. If hostnames are not unique or consistent in your environment, you can launch osqueryd with --host_identifier=uuid.
On Linux, a new UUID will be generated and stored in RocksDB so that it persists across reboots. On OS X, osqueryd will attempt to use the hardware UUID and fall back to a custom generated UUID if that fails.