Friday, April 01, 2005

Writing accounting records at time intervals

A major new feature of the exacct system is the ability to get an accounting record logged without terminating the process. There are two forms of this. For tasks, you can get the record to dump the delta since the last record was logged; the task evidently remembers the data each time it cuts a record so that it can do the differencing. This seems to be too much overhead at the process level, so the other option is to cut a record that logs the same data as if the process had just exited. This second option is available for both tasks and processes.

The command that causes a record to be written is "wracct". It takes a list of process or task IDs and makes a system call to cause the record to be written. You have to be root to do this. The wracct command-line syntax is a pain if you want it to dump multiple processes, as shown in this example from the manpage:

# /usr/sbin/wracct -i "`pgrep sendmail`" process

I want to make every process cut a record, and to do this with wracct you need to form a list of every process ID on the system. I tried to do this by listing all the entries in the /proc filesystem, but if any of the pids exit before wracct gets to them, it gets an error and quits. This is stupid, because if a process has exited, it has already cut an accounting record! The wracct command should have a "-a" option that writes all records and ignores errors.

I modified the exdump command to have a "-w" option that loops over all processes and forces an accounting record to be written before it reads the accounting file. If you aren't root, it has no effect. The code looks like this:

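if (wflag) {
    DIR *dirp;
    struct dirent *dp;

    /* Ask every process listed in /proc to cut a partial record. */
    if ((dirp = opendir("/proc")) != NULL) {
        while ((dp = readdir(dirp)) != NULL) {
            /* Skip "." and ".."; they are not process ids. */
            if (dp->d_name[0] == '.')
                continue;
            /* Errors are ignored: an exited process has already logged. */
            (void) wracct(P_PID, atoi(dp->d_name), EW_PARTIAL);
        }
        (void) closedir(dirp);
    }
}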
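
The loop relies on atoi() to turn a /proc entry name into a pid, and atoi() returns 0 for any name that is not a number. A stricter conversion might look like the sketch below. The helper name proc_name_to_pid is hypothetical and not part of exdump; it assumes that only purely numeric entry names are process ids.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of exdump): convert a /proc entry name
 * to a process id. Returns -1 for names that are not purely numeric,
 * such as "." and "..".
 */
static long
proc_name_to_pid(const char *name)
{
    char *end;
    long pid;

    /* Reject NULL, empty names, signs, and leading whitespace. */
    if (name == NULL || name[0] < '0' || name[0] > '9')
        return (-1);

    errno = 0;
    pid = strtol(name, &end, 10);

    /* Reject overflow, trailing junk, and values too large for a pid. */
    if (errno != 0 || *end != '\0' || pid > INT_MAX)
        return (-1);

    return (pid);
}
```

With a helper like this, the loop would call wracct only when the returned value is not -1.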


The next step is a bit more complex. Both for flow accounting and interval process accounting, I need code that remembers previous records and matches them with subsequent ones so that they can be accumulated (flows) or differenced (process intervals).
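
As a rough illustration of the differencing half of that step, here is a minimal sketch. It is not the exdump code: the proc_sample structure, its field names, and the fixed-size table are all assumptions made for the example, and a real implementation would fill the structure from the exacct record and would probably use a hash table rather than a linear search. The start time is compared as well as the pid so that a reused pid is not differenced against an unrelated earlier process.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 1024    /* illustrative capacity, not a real limit */

/* Hypothetical, simplified view of one partial process record. */
struct proc_sample {
    long     pid;
    uint64_t start_sec;     /* process start time, guards against pid reuse */
    uint64_t cpu_usec;      /* cumulative user + system CPU time */
};

static struct proc_sample prev[MAX_TRACKED];
static size_t nprev;

/*
 * Return the CPU time used since the previous sample of the same
 * process, and remember the new sample for next time. The first
 * sighting of a process returns its full cumulative value.
 */
static uint64_t
interval_delta(const struct proc_sample *cur)
{
    size_t i;

    for (i = 0; i < nprev; i++) {
        if (prev[i].pid == cur->pid) {
            uint64_t delta = cur->cpu_usec;

            /* Only difference against the same incarnation of the pid. */
            if (prev[i].start_sec == cur->start_sec &&
                cur->cpu_usec >= prev[i].cpu_usec)
                delta -= prev[i].cpu_usec;

            prev[i] = *cur;
            return (delta);
        }
    }

    /* Not seen before: remember it if there is room. */
    if (nprev < MAX_TRACKED)
        prev[nprev++] = *cur;
    return (cur->cpu_usec);
}
```

Flow records would use the same lookup, but add the new values to the stored ones instead of subtracting.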


  1. In the past I have captured days' worth of exacct interval records with the intent of analyzing them later. The key problem I had was the lack of a sample timestamp. To get an idea of which interval I was observing, I had to look at the finish-sec of the most recently terminated process.

    Aside from my hack of rotating the exacct file every time I write interval records, is there a good way on the horizon for getting timestamps of when an interval record was taken?

    And more to the point, I really wanted to observe the CPU usage attributed to each project. Is there a more efficient way to just get the cumulative number of seconds of CPU usr/sys time used by each project? My relatively light knowledge of DTrace suggests that I could do the math myself each time a context switch occurs, but that seems like it would be too much overhead. An example of a target machine is a 40 processor 15k domain with a mixture of Oracle RDBMS (long running processes) and Concurrent Manager (short running processes) jobs. Does the answer change at all if I ask for per-zone utilization rather than per-project?

  2. I intend to rotate the exacct logs and include the timestamp for the interval in the log filename itself.

    DTrace is overkill for getting the CPU usage of a project. Microstate accounting measurements are made in the way you mention, by accumulating time on each context switch, and they are included in the accounting records. By using process or task accounting and filtering on project ID, you can easily accumulate the CPU time. You can also filter on zone ID very simply.
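
The accumulation described in that reply can be sketched as follows. This is only an illustration: the acct_rec and proj_total structures and their field names are assumptions, not the exacct record layout, and the records are assumed to have already been read from the accounting file into an array. It also assumes each record is counted once (final records or per-interval deltas), since adding up repeated cumulative records for the same process would count its CPU time more than once.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified accounting record. */
struct acct_rec {
    long     projid;
    uint64_t cpu_user_usec;
    uint64_t cpu_sys_usec;
};

/* Running CPU total for one project. */
struct proj_total {
    long     projid;
    uint64_t cpu_usec;
};

/*
 * Sum user + system CPU time per project id. Returns the number of
 * distinct projects written to totals; records for projects beyond
 * max_totals are skipped.
 */
static size_t
sum_cpu_by_project(const struct acct_rec *recs, size_t nrecs,
    struct proj_total *totals, size_t max_totals)
{
    size_t n = 0, i, j;

    for (i = 0; i < nrecs; i++) {
        /* Find the slot for this project, if it already has one. */
        for (j = 0; j < n; j++) {
            if (totals[j].projid == recs[i].projid)
                break;
        }
        if (j == n) {
            if (n == max_totals)
                continue;       /* table full: skip this record */
            totals[n].projid = recs[i].projid;
            totals[n].cpu_usec = 0;
            n++;
        }
        totals[j].cpu_usec += recs[i].cpu_user_usec + recs[i].cpu_sys_usec;
    }
    return (n);
}
```

Filtering on zone would follow the same pattern with a zone id field in place of projid.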

