Filebench for Programmers
Introduction
Filebench is a program for benchmarking file systems. It is extremely flexible, using a powerful workload model language that allows creation of a wide variety of benchmarks which accurately model the I/O behavior of real applications, without having to actually run those applications. This document describes the structure and operation of the source code that implements the core functionality of filebench. In addition to the executable binary named go_filebench, filebench also includes an executable script, itself called filebench, which uses workload profiles to set parameters in the workload models and provide even more flexibility. At present this document mainly describes go_filebench.
To better illustrate the various aspects of go_filebench, a running example based on the oltp.f workload model will be employed. This is a fairly rich model, consisting of two filesets, three defined processes with multiple instances of two of them, and a number of basic flowops. The oltp.f workload file is shown in Figure 1.
The workload modeled by oltp.f is that of an online transaction processing system, and consists of a log file writer process (lgwr), ten database writer processes (dbwr), and two hundred reader processes (shadow). The log writer writes to a log file defined by the fileset logfile (an example of a fileset with only one entry), and the database reader and writer processes access ten files defined by the fileset datafiles.
Each of the oltp.f workload's processes consists of one thread, whose attributes are contained in a threadflow entity. Each process's thread also defines a list of flowops which do the actual work of generating I/O representative of a real database system. The log writer thread will execute an asynchronous write (aiowrite flowop), wait for its completion (aiowait flowop), then block on semaphore lg-block (semblock flowop). Each of the database writer threads will execute an asynchronous write, perform some fake CPU processing (hog flowop), block on semaphore dbwr-block, then wait for the asynchronous write to complete. Finally, each of the database reader threads will execute a read (read flowop), perform some fake CPU processing, post to both the lg-block and dbwr-block semaphores (sempost flowop), then pause if its rate of execution is too high (eventlimit flowop). These three sequences are repeated by their respective threads until the benchmark run is completed.
The basic organization of go_filebench and its source code will first be described, followed by implementation details of the key internal data structures and a description of how they are interconnected and initialized when running the oltp.f workload model.
Filebench Organization
Benchmarking sessions are usually run from the filebench script, which takes a profile file, such as filemacro.prof, combines it with one or more workload model files, such as oltp.f, and sends the resulting combined model program or programs to go_filebench for execution. Figure 2 shows a fragment of a hypothetical profile named oltp.prof which specifies multiple runs of the workload model oltp.f, measuring performance at different points in the configuration space. The fragment illustrates two runs, each with 200 shadow processes, ten database writer processes, and a file size of one gigabyte, but with the first run using 2 KB I/Os and the second using 8 KB I/Os.
To run filebench with this profile, the user would type filebench oltp, which runs the filebench script, directing it to read oltp.prof, which in turn causes multiple executions of the oltp.f model using settings from oltp.prof. Figure 3 illustrates the process, and shows a fragment of the intermediate, combined model generated by filebench, which is used as the input to go_filebench.
The intermediate file is actually a workload file in its own right; however, one of its first commands is the load command, which brings in the specified workload model file, in this case oltp.f. Next it sets values on a number of variables using values from the .prof file, for example the set $iosize=2k command derived from the iosize=2k entry in oltp.prof. Finally it creates files, filesets, and processes using the create files, create filesets, and create processes commands. Creating processes has the side effect of starting the actual run. The intermediate file then sleeps for the runtime specified in the .prof file, after which it issues a stats snap command and a dump command to capture and record the run results. An alternative way to start a run is to use the run command with a runtime, for example run 60, which creates the files, filesets, and processes and starts the run in place of the separate create and sleep commands. Using separate creates, however, allows pre-run actions to be inserted between file and fileset creation and the actual workload run.
As indicated by Figure 3, the benchmark's I/O stream and results capture are produced by go_filebench, under control of the workload specifications in oltp.f and the filebench-generated intermediate file. The go_filebench program is essentially an interpreter for the .f model language, repeatedly executing the flowops in the model to produce I/O whose size, rate, and distribution emulate those of a real workload.
Key Filebench Data Structures
The workload model is parsed by go_filebench during its initialization (or as supplied by the user in interactive mode), with the extracted information placed in internal data structures. The workload language defines five main entities: fileobj, fileset, procflow, threadflow, and flowop, which represent, respectively, files, sets of files, operating system processes, operating system threads, and workload actions. These are internally instantiated by the following C language structs: fileobj_t, fileset_t, procflow_t, threadflow_t, and flowop_t. The fileobj, fileset, procflow, and flowop entity instances are placed on linked lists rooted in a globally shared structure called filebench_shm, while the threadflow entities are linked off their containing procflows. Figure 4 illustrates this high-level structure, which is used to manage and locate individual instances of the entities.
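This arrangement can be summarized with a minimal C sketch. It is a simplified view rather than the actual filebench declarations: proclist and tf_ops are field names used elsewhere in this document, while the remaining names are illustrative assumptions.

    /* A minimal sketch, not the actual filebench declarations, of how the
     * entity types hang off the shared memory region.  proclist and tf_ops
     * are named in the text; the other field names are assumptions. */
    typedef struct flowop     flowop_t;
    typedef struct fileobj    fileobj_t;
    typedef struct fileset    fileset_t;
    typedef struct procflow   procflow_t;
    typedef struct threadflow threadflow_t;

    struct threadflow {
        procflow_t   *tf_process;    /* back to the containing procflow      */
        threadflow_t *tf_next;       /* next threadflow of the same procflow */
        flowop_t     *tf_ops;        /* this thread's list of flowops        */
    };

    struct procflow {
        threadflow_t *pf_threads;    /* threadflows linked off this procflow */
        procflow_t   *pf_next;       /* next procflow on the global list     */
    };

    typedef struct filebench_shm {
        fileobj_t  *filelist;        /* all fileobj (file) entities          */
        fileset_t  *filesetlist;     /* all fileset entities                 */
        procflow_t *proclist;        /* all procflow entities                */
        flowop_t   *flowoplist;      /* master list of all flowop entities   */
    } filebench_shm_t;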
Fileset Structure
A workload model will include some files to operate on, declared with the define file statement for individual files or the define fileset statement for sets of files. In the case of oltp.f, two filesets, logfile and datafiles, are defined. The logfile fileset's entries attribute is set to 1, so only one file will actually be created, while the datafiles fileset's entries attribute is set to 10 and its dirwidth attribute is set to 1024, so a single directory with ten files will be created. If more files are requested than can fit in the specified directory width, a multilevel tree of subdirectories will be created, with the specified number of files spread out among the bottom-level directories. The required depth of the directory tree is computed from the number of files specified by the entries attribute and the average directory width specified by dirwidth.
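A rough sketch of that depth calculation, assuming a simple logarithmic rule (the exact rounding used in fileset.c may differ, and the function name here is hypothetical), is:

    #include <math.h>

    /* Rough sketch of the depth calculation described above; the exact
     * rounding in fileset.c may differ.  Depth grows as the log of the
     * entry count, taken base dirwidth. */
    static int fileset_tree_depth(double entries, double dirwidth)
    {
        double depth = ceil(log(entries) / log(dirwidth));
        return depth < 1.0 ? 1 : (int)depth;   /* always at least one level */
    }
    /* With entries=10 and dirwidth=1024 (the datafiles fileset) this gives
     * a depth of 1: a single directory holding all ten files.              */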
Before the actual subdirectories and files are created, a tree of filesetentry entities is created by the fileset_populate() routine, with one entity for each future subdirectory and file. Figure 5 shows the tree created for the datafiles fileset used by the example workload. Only filesetentry entities for the datafiles fileset are shown; the single subdirectory entry has the FSE_DIR (abbreviated as DIR) flag set. It and all other filesetentry entities point back to the original fileset. The subdirectory filesetentry, in turn, is pointed to by the fileset's fs_dirlist field. If additional subdirectories were required, the fse_dirnext field would point to the first filesetentry in a list of second-level subdirectories, and their fse_parent fields would all point back to the root filesetentry. Note that the bottom-level directory filesetentry entities do not point to their associated files' filesetentry entities. All filesetentry entities defining files are linked together on a list rooted at the fileset's fs_filelist field, using the fse_filenext field of each file filesetentry to link to the next in the list.
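The linkage just described can be summarized with a simplified C sketch; only fields named in this section are shown, and the exact types, the fse_flags field, and the fse_fileset backpointer name are assumptions rather than copies of the source.

    /* Simplified sketch of the filesetentry linkage described above.  The
     * field names fse_parent, fse_dirnext, fse_filenext, fs_dirlist,
     * fs_filelist, fse_size, and FSE_DIR come from the text; the rest are
     * illustrative assumptions. */
    #define FSE_DIR 0x1                       /* entry describes a directory */

    typedef struct fileset fileset_t;

    typedef struct filesetentry {
        struct filesetentry *fse_parent;      /* enclosing directory entry   */
        struct filesetentry *fse_dirnext;     /* next directory at this level*/
        struct filesetentry *fse_filenext;    /* next entry on fs_filelist   */
        fileset_t           *fse_fileset;     /* back to the owning fileset  */
        int                  fse_flags;       /* FSE_DIR set for directories */
        long long            fse_size;        /* target size of future file  */
    } filesetentry_t;

    struct fileset {
        filesetentry_t *fs_dirlist;           /* root of directory entries   */
        filesetentry_t *fs_filelist;          /* flat list of file entries   */
    };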
Processes and Threads
The filebench workload language parsing phase also includes defining processes and threads to execute the specified flowops and thus generate the desired file I/O. The example workload file includes definitions for three process sets and their associated threads. As can be seen from Figure 1, each process definition includes a thread definition, which in turn includes several flowop definitions. Note that more than one thread definition could be included within the outer braces of a given process definition. Each entity definition also includes a list of attribute modifiers, which override any default settings.
Figure 6 shows the initial procflow and threadflow entities created by the oltp.f workload during the parsing phase. The procflow entities are created in response to define process statements in the workload file while the threadflows are created in response to thread statements. The define process statements contain values for process specific attributes and one or more thread statements enclosed in braces. The threadflow entities associated with a given procflow will be placed on a thread list for that procflow. As noted in Figure 6, these are all FLOW_MASTER instance procflows and threadflows, which contain the master copies of the attributes, though they don't actually participate in model execution.
One of the key attributes of a FLOW_MASTER procflow or threadflow is the number of instances of worker procflows or threadflows that will be created to actually generate the workload. As indicated in Figure 6, the shadow procflow will create 200 worker instances of itself, the dbwr 10 worker instances, and the lgwr 1 instance. Each of these worker procflows will only have a single thread, as specified in the three FLOW_MASTER threadflows.
The worker procflows and threadflows, and their associated processes and threads, are created at the start of model execution. Figure 7 illustrates the relationships between FLOW_MASTER procflows and threadflows and their worker instances. All procflow entities are placed on a list rooted at the proclist pointer in filebench_shm. Since newly created procflows are placed at the head of the list, the FLOW_MASTER versions end up at the tail of the list. Each procflow points to a list of its threadflows, though in this example the lists have only one threadflow each. Each threadflow has a backpointer to its associated procflow, and each procflow has a pointer to a list of its associated threadflows. Since this list pointer is inherited from the master procflow and each newly created threadflow is placed at the head of the list, the last item in the list will be the master threadflow for the master procflow. Also illustrated is the fact that each worker procflow is added to the global procflow list, again at the head, so more recently created procflows will be towards the head of the list.
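The head insertion that produces this ordering can be sketched in a few lines of C; the function and field names here are illustrative, not taken from procflow.c.

    /* Minimal sketch of the head insertion described above.  Because every
     * new entity is pushed onto the front of the list, the FLOW_MASTER
     * entities, created first, drift toward the tail. */
    typedef struct procflow {
        struct procflow *pf_next;             /* next procflow on the list   */
        /* name, instance number, thread list, ... */
    } procflow_t;

    static void proclist_push(procflow_t **headp, procflow_t *newpf)
    {
        newpf->pf_next = *headp;              /* link ahead of current head  */
        *headp = newpf;                       /* newest entity becomes head  */
    }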
Flowops
The workload model's behavior is governed by its sets of flow operations, termed flowops. Each flowop specifies an action to be performed by the workload model, either to generate I/O, such as the read flowop, or to control the model's behavior and timing, such as the semblock flowop. The flowop's data structure is shown in Figure 8, and includes a name, an instance number, and several list pointers, including those for a master list of flowops and for a list of flowops associated with the particular thread that is executing it. The flowop data structure also has pointers to the routines that implement it, a set of attributes, and a set of statistics variables.
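A simplified view of this structure, with representative field names rather than names copied from the filebench source, might look like the following.

    /* Simplified sketch of the flowop structure described above; the field
     * names are representative assumptions, not the flowop.h definitions. */
    typedef struct flowop     flowop_t;
    typedef struct threadflow threadflow_t;

    struct flowop {
        char          fo_name[128];     /* unique name, e.g. "shadowread"    */
        int           fo_instance;      /* 0, FLOW_MASTER, or worker number  */
        flowop_t     *fo_next;          /* master list of all flowops        */
        flowop_t     *fo_threadnext;    /* list owned by the executing thread*/
        threadflow_t *fo_thread;        /* threadflow executing this flowop  */
        int         (*fo_func)(threadflow_t *, flowop_t *); /* behavior      */
        /* per-type attributes and per-flowop statistics follow              */
    };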
Each thread definition is followed by a set of flowop definitions enclosed in braces. The flowops form a program loop which is repeatedly executed while the model is running. Each flowop definition must specify a type from the list of implemented flowop behaviors and a unique name. Each flowop definition will also require a set of attributes specific to that flowop type. As an example, in this fragment from the oltp.f workload of Figure 1, the shadow thread has a flowop definition of the form

    flowop read name=shadowread,filesetname=datafiles,...

where read is the flowop's type (a file read operation), shadowread is the flowop's name, datafiles is the name of the fileset that will be read from, and the rest of the definition specifies various parameters to the read flowop.
There are three sets of flowops: instance 0 flowops, FLOW_MASTER flowops, and worker flowops. At initialization time the instance 0 flowops are created, one for each implemented flowop type. The instance 0 flowops are named for their particular type, such as read, and given a pointer to a function to be called at run time which implements their behavior. As other flowops are allocated later during parsing and execution, they inherit this pointer and some other key attributes from these instance 0 flowops. Each thread defined for a process has a list of flowops that it will execute in order to generate its portion of the workload. These derivative flowops are placed on the master flowop list ahead of the initially created base flowops, and also on a list for the individual threadflow they belong to. All of these derivative flowops are set to instance FLOW_MASTER (abbreviated as FM in the figures), and are initialized with the attributes specified in the workload file as part of their definition. They serve as master copies for the flowops that will eventually be created to do the actual work, and provide the inherited attribute values for those worker flowops. Figure 9 illustrates the FLOW_MASTER flowops used by the shadow thread and the instance 0 flowops from which they are derived.
Figure 9: FLOW_MASTER and instance 0 flowops
Program Flow During Workload Execution
After the workload model has been loaded, go_filebench's state will consist of zero or more fileobj entities representing files, zero or more fileset entities representing sets of files, one or more FLOW_MASTER procflow entities representing processes, one or more FLOW_MASTER threadflow entities representing threads, and one or more FLOW_MASTER flowop entities. Before go_filebench can begin generating the workload represented by the model, actual files, processes, threads, and worker flowops must be created. This can be done by supplying the create files, create filesets, and create processes commands to go_filebench, which call, respectively, the parser_file_create(), parser_fileset_create(), and parser_proc_create() routines found in parser_gram.y. A run can also be started by just supplying the run command, which calls the same three routines and then sleeps for a specified amount of time while the workload executes.
A call to parser_file_create() creates the files associated with any fileobj entities, a call to parser_fileset_create() populates any filesets with filesetentry entities and creates their associated files and subdirectories, and a call to parser_proc_create() creates worker procflows, threadflows and flowops, along with the associated worker processes and threads. A more detailed description of the operation of the parser_fileset_create() and parser_proc_create() calls will now be given.
Fileset Creation
When go_filebench encounters a create fileset command in an input stream, it calls the parser_fileset_create() routine to populate and instantiate the files and subdirectories that make up the fileset. If the prealloc attribute is set to a value greater than 0, then fileset_populate() will be called. The expected directory tree depth is calculated from the mean directory width (dirwidth) and number of files (entries) attributes, and fileset_populate_subdir() is recursively called to build a tree of filesetentry entities corresponding to the eventual directory tree. At each level it will create multiple subdirectories, with the number per directory drawn from a gamma distribution with a mean given by dirwidth. The fileset routines are found in fileset.c.
When the bottom (leaf) level of the tree is reached, fileset_populate_file() is called instead of fileset_populate_subdir() to add an entry for an appropriately sized file. The total number of files in each leaf subdirectory will also be based on a gamma distribution with a mean of dirwidth. The size of those files will be based on a gamma distribution with a mean given by the size attribute.
Once the tree of filesetentry entities is created, the fileset_create() routine is called to actually create the directories and files specified by the filesetentry tree. If the reuse attribute is set for the fileset and the subdirectories and files already exist, the files will be adjusted back to their fse_size length and the whole structure reused. Otherwise any existing subdirectories and files will be deleted, a new directory tree will be created, and a fraction of the files, as specified by the prealloc attribute, will be created and filled to the length specified by fse_size.
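The per-file create-versus-reuse decision can be sketched roughly as follows. This is an illustration under simplified assumptions, not the fileset_create() code: the prepare_file() name and its arguments are hypothetical stand-ins for the values a filesetentry carries.

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Rough sketch of the reuse logic described above; illustrative only.
     * `size` stands in for fse_size, and `prealloc_this` for the
     * prealloc-driven choice of whether to fill the file with data. */
    static int prepare_file(const char *path, off_t size, int reuse,
                            int prealloc_this)
    {
        struct stat st;

        if (reuse && stat(path, &st) == 0)
            return truncate(path, size);    /* adjust back to fse_size       */

        (void)unlink(path);                 /* drop any stale copy           */
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (prealloc_this) {
            /* filebench writes real data out to fse_size; a plain
             * ftruncate() is used here only to keep the sketch short */
            (void)ftruncate(fd, size);
        }
        return close(fd);
    }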
Process and Thread Creation
Calling parser_proc_create() results in a call to procflow_init(), which starts at the current head of the proclist, which at this point contains only FLOW_MASTER instances, and begins creating the specified number of instances of each procflow and its associated operating system process. In our example it will first create the 200 procflow instances associated with the shadow procflow. These are inserted at the head of the proclist, and hence will not be further examined by procflow_init(). Next comes the master dbwr procflow, for which ten instances will be created. Finally, the single lgwr worker instance will be created. After each procflow entity is created and initialized through a call to procflow_define_common(), its associated process is created with a call to procflow_createproc(), which forks a new process and then execs a new instance of filebench, passing it the procflow name, the instance number, and a pointer to the shared memory region. The resulting set of procflow entities and their associated processes for the shadow procflow is depicted in Figure 10, and the code is in procflow.c.
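The fork-and-exec step can be sketched as follows. This is a minimal illustration, not the procflow_createproc() code itself: the spawn_worker() name and the option letters passed to the new process are assumptions, and error handling is reduced to the bare minimum.

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Simplified sketch of the fork/exec step described above; the real
     * procflow_createproc() builds its argument list differently. */
    static int spawn_worker(const char *procname, int instance, void *shmaddr)
    {
        pid_t pid = fork();

        if (pid < 0)
            return -1;                      /* fork failed                   */
        if (pid == 0) {
            char inst[16], addr[32];
            snprintf(inst, sizeof(inst), "%d", instance);
            snprintf(addr, sizeof(addr), "%p", shmaddr);
            /* child: re-exec filebench as a worker for this procflow */
            execlp("go_filebench", "go_filebench",
                   "-a", procname, "-i", inst, "-s", addr, (char *)NULL);
            _exit(127);                     /* only reached if exec failed   */
        }
        return 0;                           /* parent: the child's pid is
                                               recorded in the procflow      */
    }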
Each new instance of filebench begins execution at main(), but unlike the original invocation of filebench, the presence of a procflow name argument causes main() to call procflow_exec(), which continues the startup procedure for the procflow. The newly executing process first finds its associated procflow instance, then stores its process ID in a field of the procflow entity and calls threadflow_init() to create and start all the worker threads for the process.
As indicated in Figure 6, each master procflow has an associated master threadflow for each thread defined as part of the process. Each worker process will inherit a pointer to the master procflow's threadflow list, which at this point only has the master threadflow(s) on it. Each worker process's call to threadflow_init() will examine the list, creating tf_instances threadflow entities for each master threadflow. After each worker threadflow entity is created, a call to threadflow_createthread() creates the operating system thread associated with the threadflow. The threadflow routines are in threadflow.c.
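As a concrete illustration of this step, the following is a minimal sketch of how a threadflow's operating system thread might be started with pthread_create(); the type layout and names here are assumptions, and the real entry point for each worker thread is the flowop_start() routine discussed below.

    #include <pthread.h>

    /* Simplified sketch of the thread creation step described above;
     * the names are illustrative stand-ins. */
    typedef struct threadflow {
        pthread_t tf_tid;             /* OS thread backing this threadflow   */
        /* tf_ops, backpointer to the procflow, instance number, ...         */
    } threadflow_t;

    static void *worker_thread_main(void *arg)
    {
        threadflow_t *tf = arg;
        (void)tf;                     /* build worker flowops, generate load */
        return NULL;
    }

    /* Stand-in for threadflow_createthread(): start the OS thread. */
    static int create_worker_thread(threadflow_t *tf)
    {
        return pthread_create(&tf->tf_tid, NULL, worker_thread_main, tf);
    }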
In our example, each of the three defined processes has one defined thread with the default number of thread instances, namely one. Thus each worker process will have a single worker threadflow and associated operating system thread. Figure 11 illustrates the resulting set of procflow and threadflow entities for the shadow procflow. Each threadflow has a backpointer to its associated procflow, and each procflow has a pointer to a list of its associated threadflows.
At the start of a workload run, after the worker processes and threads have been created, each thread's worker flowops are allocated. Each thread created by threadflow_createthread() starts executing the flowop_start() routine, which continues configuring the threadflow for its role in workload generation. The worker threadflows inherit their master threadflow's tf_ops pointer, which points to the list of master flowops that was created when the master threadflow was defined. For each flowop on that list, the flowop_start() routine calls flowop_define_common() to allocate a new flowop which inherits its name, attributes, and methods from the corresponding master flowop. The first iteration of the loop also clears the worker threadflow's tf_ops pointer, severing the connection with the master threadflow's list. The thread still has a copy of the old pointer, though, which it uses to follow the master threadflow's list of flowops, while the newly created flowops are placed on the worker threadflow's own list. Figure 12 illustrates the linkage and flowop inheritance of a worker threadflow and its associated master threadflow for one of the shadow procflow's instances; the common flowop code can be found in flowop.c.
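The inheritance loop can be sketched as follows. The types are trimmed down and the names inherit_flowops() and clone_flowop() are stand-ins (the real work happens in flowop_start() and flowop_define_common()); the sketch only shows how a saved copy of the inherited tf_ops pointer is walked while the worker's own list is built.

    #include <stdlib.h>

    /* Illustrative sketch of the inheritance step described above. */
    typedef struct flowop {
        struct flowop *fo_threadnext;     /* next flowop on a thread's list  */
        /* name, attributes, behavior pointer, statistics, ...               */
    } flowop_t;

    typedef struct threadflow {
        flowop_t *tf_ops;                 /* head of this thread's flowop list */
    } threadflow_t;

    /* Stand-in for flowop_define_common(): clone one master flowop. */
    static flowop_t *clone_flowop(const flowop_t *master)
    {
        flowop_t *copy = malloc(sizeof(*copy));
        if (copy != NULL)
            *copy = *master;              /* inherit name, attributes, methods */
        return copy;
    }

    static void inherit_flowops(threadflow_t *worker)
    {
        const flowop_t *master = worker->tf_ops;  /* inherited master list   */
        flowop_t **tailp = &worker->tf_ops;

        worker->tf_ops = NULL;            /* sever link to the master list   */
        for (; master != NULL; master = master->fo_threadnext) {
            flowop_t *copy = clone_flowop(master);
            if (copy == NULL)
                break;
            copy->fo_threadnext = NULL;
            *tailp = copy;                /* append, keeping definition order */
            tailp = &copy->fo_threadnext;
        }
    }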
Load Generation
Once the worker thread's list of flowops is created, the thread goes into a continuous loop, executing its flowop list in order of flowop definition. When it reaches the end of the list, it starts over again from the beginning. This continues until the benchmark run concludes, at which point the loop exits and the thread terminates. Meanwhile, the process's original thread will be waiting for all of its child threads to terminate; when they do, it will set its pf_running field to zero, then terminate the process.
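A minimal sketch of this per-thread loop, with illustrative names and none of the statistics or error handling of the real code in flowop.c, looks like the following.

    #include <stdbool.h>

    /* Minimal sketch of the per-thread execution loop described above. */
    typedef struct flowop     flowop_t;
    typedef struct threadflow threadflow_t;

    struct flowop {
        flowop_t *fo_threadnext;                 /* next flowop for the thread */
        int     (*fo_func)(threadflow_t *, flowop_t *); /* flowop's behavior   */
    };

    struct threadflow {
        flowop_t *tf_ops;                        /* this thread's flowop list  */
    };

    static volatile bool run_complete;           /* set when the run ends      */

    static void run_flowops(threadflow_t *tf)
    {
        while (!run_complete) {
            flowop_t *fo;

            /* execute the list in definition order, then wrap around */
            for (fo = tf->tf_ops; fo != NULL && !run_complete;
                 fo = fo->fo_threadnext)
                (void)fo->fo_func(tf, fo);       /* execute one flowop        */
        }
        /* on exit the thread terminates; the process's first thread then
         * clears pf_running and terminates the process, as described above. */
    }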