Writing Workload Models

From Filebench
Revision as of 23:11, 31 May 2011 by Vass-vass

For a complete definition of the language, see the Workload Model Language page.

Basic Structure of a Flow Language (.f) file

FileBench is essentially an interpreter for the I/O workload flow modeling language, whose files have a “.f” suffix. A typical workload model has four main parts: setting defaults, defining files or filesets, defining processes (together with their threads and the flowop sequences those threads execute), and finally providing workload usage information. You can also include comments by beginning a line with a pound sign (#).

Setting Defaults

FileBench's set command is used to set (and, the first time a variable is encountered, create) the value of user-defined variables. Variable names must be preceded by a dollar sign. A typical set command is as follows:

set $dir=/tmp

which creates a new variable named $dir and sets its initial value to the string /tmp. The variables can be referenced by name later during definition of files, processes, threads and flowops. If referenced before being set, they default to zero.

Defining files and filesets

Next, the typical workload model will define files or filesets (or both) to be used as destinations for the I/O that FileBench will be generating when the model is run. Files are FileBench's abstraction of single file system files, while filesets are sets of files sharing a common root directory and overall name. Both have attributes such as filesize associated with them, but filesets have additional attributes governing how the set of files (and associated directory tree) are built. A typical file is defined by:

define file name=bigfile1, path=$dir, size=$filesize, prealloc, reuse

and fileset by:

define fileset name=datafiles, path=$dir, size=$filesize, filesizegamma=0, entries=$nfiles, dirwidth=1024, prealloc=100, reuse

The name and path attributes are required for both types, and filesets must also have at least the entries and dirwidth attributes. For other attributes, defaults apply: no pre-allocation, no reuse, a file size of zero, and file size and directory width distribution gammas of 1500.

Both files and filesets are referenced by I/O flowops using their name attributes. A file corresponds to a single filesystem file with a pathname that is the concatenation of the path and name attribute. A fileset corresponds to a tree of directories and files, whose root is a directory with a pathname that is the concatenation of the path and name attributes. The rest of the directories and the files in a fileset are given numeric names, starting with 00000001 at each level. At a minimum, the fileset's root directory will contain a subdirectory named 00000001 which, in turn, contains a file named 00000001. So, using example values from above, the full path name of the first file would be:

/tmp/datafiles/00000001/00000001

Defining processes, threads and flowops

FileBench follows the modern operating system practice of defining processes which consist of one or more threads executing instructions. In FileBench's case, operating system processes and threads correspond to the workload language's processes and threads respectively, while the role of instructions is played by flowops. The define process command specifies a set of process-specific attributes and includes one or more thread definitions within a pair of braces. For example:

define process name=rand-read, instances=1
{
    thread ...
}

defines a process whose name is rand-read, which will cause one instance of a file system process to be generated. The example also contains a single thread definition, which would itself include a name, some attributes, and a list of brace enclosed flowops to execute, such as:

    thread name=rand-thread,memsize=5m,instances=$nthreads
    {
      flowop ...
      ...
    }

which would cause $nthreads number of operating system threads to be created in the process, each running the brace enclosed list of flowops, and each with a shared memory region of five megabytes. The created threads independently cycle through the list of flowops, repeating the list until the workload terminates. A typical flowop definition looks like this:

      flowop read name=rand-read1, filename=largefile1, iosize=$iosize, random

which specifies a read type flowop named rand-read1, which accesses the file largefile1 with read I/Os of $iosize length to random file offsets.

A workload can consist of several processes, each generating multiple OS process instances at runtime and each consisting of multiple thread definitions, which in turn may each generate multiple OS thread instances. Each thread definition also includes the list of flowops that all of its instances will execute.

Usage strings

FileBench has an extensible help facility consisting of built-in usage messages that provide help with the flowop language syntax, and a usage command that adds workload-specific help messages. An example usage command, specifying how to modify the file size used, is:

usage " set \$filesize=<size> defaults to $filesize"

A token beginning with “$” is interpreted as a variable name, and a string representation of its value is substituted for the variable name. A backslash (“\”) can be used to suppress that interpretation. Thus, the above line will cause the following to be included in the help message:

set $filesize=<size> defaults to 1048576

assuming that $filesize had been previously set to 1m (one megabyte).

Putting it all together

All of the standard workload files supplied with FileBench follow the same format: an initial comment block, a variable default-setting block, a file and fileset definition block, a process definition block and, finally, a usage string block. Grouping the workload file into blocks as suggested here is not required by FileBench, but it does satisfy the requirements that variables be set before being used by file, fileset, process, thread or flowop definitions, and that files and filesets be defined before being used by flowops, and it is the preferred style.
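As a sketch, a complete minimal workload file following this block order might look like the following (the names, sizes and comment text here are illustrative, not taken from a shipped workload):

    # Minimal example workload (illustrative sketch).
    set $dir=/tmp
    set $iosize=8k
    set $filesize=1m
    set $nthreads=1

    define file name=testfile, path=$dir, size=$filesize, prealloc, reuse

    define process name=seq-read, instances=1
    {
      thread name=seq-read-thread, memsize=5m, instances=$nthreads
      {
        flowop read name=seq-read1, filename=testfile, iosize=$iosize
      }
    }

    echo "Example workload successfully loaded"

    usage "set \$dir=<dir>          defaults to $dir"
    usage "set \$iosize=<size>      defaults to $iosize"
    usage "set \$filesize=<size>    defaults to $filesize"
    usage "set \$nthreads=<value>   defaults to $nthreads"
    usage "run runtime (e.g. run 60)"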

Examples: Understanding Typical Workload files

This section will examine two workload files in detail, a relatively simple workload, randomread.f, and a more sophisticated workload, oltp.f.

randomread.f

This workload does reads with random offsets into a single, large file. It runs until stopped externally, usually by the main FileBench process when a specified number of seconds have elapsed. The randomread.f workload can be viewed here.

In this example, the comment block at the beginning includes a CDDL license, Sun copyright and a version number. The next section creates variables and sets default values for them. This workload's defaults are:

  • $dir – a default directory for the created files, set to /tmp
  • $nthreads – the number of thread instances to create for the single thread definition, set to 1
  • $iosize – the number of bytes in each I/O, set to 8192.
  • $filesize – the size in bytes of the test file, set to 1048576
  • $workingset – the subregion of the file actually used. Setting to 0 as done here means use the whole file.
  • $directio – bypass the file system or not. Setting to 0 as done here means do not bypass.

Next comes the define file statement, which for randomread.f creates a single file. The attributes set are:

  • name – the name of the file, in this case largefile1
  • path – the pathname of the directory in which this largefile1 will be created.
  • size – the size of the file to be created, set to the latest value of $filesize at run time.
  • prealloc – directs FileBench to create and fill the file to the amount specified by the size attribute at the beginning of the run. If missing, a file will be created with size of zero.
  • reuse – directs FileBench to reuse the file if it already exists and is at least as large as specified by the size attribute. If the existing file is too large, it will be truncated to the correct size.
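Putting those attributes together, the define file statement in randomread.f reads essentially as follows (reconstructed from the attribute list above):

    define file name=largefile1, path=$dir, size=$filesize, prealloc, reuse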

Once the variables have been set and the file defined, the next step is to define one or more processes. In this example a single process is defined and passed two attributes, its name (rand-read) and the number of instances to create (1). Each defined process also needs at least one thread, defined within the braces following the define process statement by thread statements. In this example, there is only one thread which has three attributes:

  • name – its name (rand-thread).
  • memsize – the size of its thread memory (5 Mbytes)
  • instances – the number of operating system threads to create ($nthreads).
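Combining these attributes, the process and thread definitions take roughly this shape (the thread name follows the earlier rand-thread snippet):

    define process name=rand-read, instances=1
    {
      thread name=rand-thread, memsize=5m, instances=$nthreads
      {
        # flowops go here
      }
    }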

The final step in the process definition section is to define the flowops that each thread will cyclically execute. These are enclosed in a set of braces following the thread definition. In the case of our rand-read1 thread, there are only two flowops, one to perform file reading, and one to set an upper limit on the number of times per second the flowops are executed.

  • flowop read – read from a file.
    • name – The name of the flowop (rand-read1)
    • filename – The file to read from (largefile1)
    • iosize – The size of each read I/O, set from var $iosize.
    • random – Select each file offset randomly, within the working set.
    • workingset – Use only the first $workingset bytes of the file. If workingset is set to 0, as $workingset defaults to in this example, the working set becomes the entire file.
    • directio – If this attribute is used without a value, it enables direct I/O. If it is set to a value, as here where it is set to the value of $directio, a value of 0 (the default for $directio in this example) disables direct I/O, while any other value enables it.
  • flowop eventlimit – Limit the execution rate of the flowop sequence in this thread to one pass per event. A separate command is used to establish the rate, which defaults to 0. If the rate is 0, the event subsystem is disabled, and eventlimit becomes a no-op.
    • name – The name of the flowop (rand-rate)
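Written out in the workload language, those two flowops look roughly like this (reconstructed from the attribute lists above):

      flowop read name=rand-read1, filename=largefile1, iosize=$iosize, random, workingset=$workingset, directio=$directio
      flowop eventlimit name=rand-rate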

Since FileBench entities for each file, fileset, process, thread, and flowop are placed on global lists for each type of entity, it is advisable to use unique names for them. Those names are used to locate the flowops when necessary, so duplicate names will make one of the duplicates inaccessible. Currently there are only a few cases where individual flowops are referenced, but more such cases may appear in the future.

In the randomread.f example, the next statement is an echo command, which prints the quote enclosed string on the console. In this case, it informs the user that the workload model was successfully loaded, as any error encountered when parsing the earlier statements would have ended the loading of the workload file before reaching this point.

The workload file concludes with a set of usage statements, which are both printed out to the console as they are encountered, and saved in the help string. Here is where the workload's author can provide information to workload users about running the workloads. As in this example, current practice is to limit this to a list of user configurable parameters and their default settings, but the facility can certainly support inclusion of more information if desired.

For the randomread.f example, the set of usage statements produces the following help message:

Usage: set $dir=<dir>
       set $filesize=<size>   defaults to 1048576
       set $iosize=<value>    defaults to 8192
       set $nthreads=<value>  defaults to 1
       set $workingset=<value>  defaults to 0
       set $directio=<bool>   defaults to 0
       run runtime (e.g. run 60)

oltp.f

The randomread.f workload illustrates the basic structure of a workload model. The oltp.f workload follows the same basic structure, but adds filesets and multiple processes, and uses several additional flowop types, including those that support asynchronous I/O and inter process synchronization. This section will focus on items not already covered in randomread.f that are used by the oltp workload model.

The oltp.f workload defines two filesets: datafiles and logfile. These definitions illustrate several attributes not used by the randomread.f example, some of which are required because they are filesets:

  • filesizegamma – The gamma parameter to the gamma distribution used to determine random file sizes and subdirectory widths. It is set to 0, which specifies that all files will be allocated to exactly filesize number of bytes, and only one, top level, directory will be created.
  • entries – The number of files to be created by the datafiles and logfile filesets, whose value is taken from the $nfiles and the $nlogfiles variables respectively.
  • dirwidth – The mean width of each subdirectory, unless filesizegamma is set to 0, as in this case, in which case it becomes the maximum width of each subdirectory.
  • prealloc – The attribute which determines whether files are pre-allocated or not, and for filesets can also be supplied with a number from 0 to 100 which determines the percentage of files which will be pre-allocated. In this example, 100% of the files will be pre-allocated.
  • cached – A boolean whose value is taken from the $cached variable and determines whether the file system caches will be flushed at the start of the run.

In the oltp.f workload, the variables $nfiles and $nlogfiles are initially set to 10 and 1 respectively, which means that by default the datafiles fileset will be created with 10 files, and the logfile fileset will be created with 1. The variable $cached is set to 0, meaning that by default the file system's cache will be flushed at the start of each run. These, and other, variables may be reset to other values before use, altering the configuration for a given run.

The oltp.f workload also defines three processes, and specifies multiple instances of two of them. The three processes are the log writer (lgwr), database writer (dbwr) and database reader (shadow), which have 1, $ndbwriters and $nshadows instances, respectively, created at run time. The $ndbwriters variable is set to 10, and the $nshadows variable is set to 200 in the workload file, so the default is ten dbwr process instances and two hundred shadow process instances.

Each of the three process definitions contains a single thread, and a list of flowops whose execution generates its portion of the OLTP workload. The lists include several flowops not found in the randomread workload, as well as a new attribute to the read flowop which further illustrate the power of the workload modeling language. In particular, the dbwr process and thread illustrates the use of asynchronous writes with the aiowrite and aiowait flowops, cpu cycle consumption with the hog flowop, and semaphore operation with the semblock flowop, while the shadow process and thread illustrates cycling through the files in a fileset using the opennext attribute, and additional aspects of semaphore operation with the sempost flowop.

The dbwr thread has an aiowrite and an aiowait flowop bracketing a hog and a semblock flowop. The aiowrite flowop takes the same attributes as other read and write operations; it issues an asynchronous write to the filesystem and then continues to the next flowop, which is the hog flowop. The hog flowop performs a byte-write loop for the number of iterations given by its value attribute, thus consuming a roughly fixed amount of cpu cycles per invocation. It is followed by the semblock flowop, which synchronizes the dbwr threads with the shadow threads by blocking on its semaphore until sufficient posts from the sempost flowop in the shadow threads have occurred. When the thread's flowop execution continues, it invokes the aiowait flowop, which pauses until one or more outstanding asynchronous writes complete. By using the aiowrite and aiowait operations, the workload model emulates the overlapping of I/O with cpu processing that occurs in typical oltp software. If the I/O completes quickly, the loop's progress will be limited by the hog and semblock flowops, while if the I/O is slow, the aiowait flowop will limit the loop's progress.
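As a sketch, the dbwr thread's flowop sequence just described could be written as follows (the flowop names other than dbwr-block, the memsize, the I/O attributes and the hog iteration count are illustrative assumptions):

    thread name=dbwr-thread, memsize=10m, instances=1
    {
      # Issue an asynchronous write, then continue immediately.
      flowop aiowrite name=dbwr-write, filesetname=datafiles, iosize=$iosize, random, opennext
      # Consume a roughly fixed amount of cpu per pass.
      flowop hog name=dbwr-hog, value=10000
      # Wait for enough sempost operations from the shadow threads.
      flowop semblock name=dbwr-block, value=1000, highwater=2000
      # Pause until outstanding asynchronous writes complete.
      flowop aiowait name=dbwr-aiowait
    }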

While each shadow process has only one thread instance, there are 200 process instances in the default configuration, so 200 threads in total. The first flowop in the shadow thread loop is a read, followed by a hog flowop, two sempost flowops and an eventlimit flowop. The hog and eventlimit flowops have been discussed previously, but the sempost and read flowops add some new attributes to those already defined:

  • read flowop
    • filesetname – The name of the fileset, datafiles in this case. The filesetname and filename attributes can actually be used interchangeably.
    • opennext – When specified, causes each subsequent invocation to access a different file of the fileset. For the default number of files in this workload, the fileset will contain ten files, which will be accessed in rotation by each shadow thread.
  • sempost flowop
    • value – Amount to add to the semaphore count with each post; in this workload, both are set to one.
    • target – Name of the semblock flowop whose semaphore this post will act on. One flowop targets the semblock flowop in lgwr, named lg-block, the other targets the semblock flowop in dbwr, named dbwr-block.
    • blocking – When specified, indicates that the thread containing this sempost flowop must block if it gets too far ahead of the thread containing the target semblock flowop.

The maximum execution rate of the shadow thread's flowop list is limited by the eventlimit flowop, as in the randomread workload. Note, however, that there is only one source of events, so each of the 200 shadow processes' threads shares that one event generator and on average loops at 1/200 of the specified rate, which, of course, still results in a total number of loops per second equal to the event generator rate. Also, the read, hog and sempost flowops may further limit the execution rate. For instance, if the combination of read flowop access delay and hog flowop cpu delay exceeded 200 times the event period, they would become the limiting factor. As will be described next, the sempost flowops can also limit the execution rate of the shadow threads.

In the oltp.f workload, counting semaphores are used to limit the rate of execution of the two writing processes to specific fractions of the rate of the reading process. This is done through appropriate settings of values that are added and subtracted from the semaphores with each operation. Both sempost flowops in the shadow thread will add one to their respective counting semaphores on each invocation, meaning that a total equal to the event generator's rate will be added each second. More interesting is the amount that is subtracted from the semaphores by each invocation of semblock in the other two threads. The semblock in the lgwr thread will block unless the semaphore count is at least 3200, at which point it subtracts 3200 and continues. Similarly the semblock in the dbwr thread will block unless the semaphore count is at least 1000, at which point it will continue, but subtract 1000. The overall effect is that lgwr does one pass through its flowop list for every 3200 passes by shadow, and the dbwr does one pass for every 1000 passes by shadow.

While normally semaphores are thought of as a mechanism to keep a trailing process from overtaking the leading one (for example, a consumer overtaking a producer), the FileBench semaphore flowops also prevent the leading process from getting too far ahead of the trailing process. This is done internally by creating a second operating system semaphore whose count is initialized to the highwater value and whose post and block operations are swapped, so that sempost actually does a block operation on the second semaphore, and whose semblock actually does a post operation. Thus, the sempost flowop will block the leading process if the semaphore's count becomes smaller than the sempost value attribute and the semblock flowop will post its value to the count each time it executes. Since both sempost flowops are configured with values of one, the highwater setting of 1000 for lgwr will allow shadow to get 1000 flowop loop iterations ahead of it before blocking, and the highwater setting of 2000 for dbwr will allow shadow to get 2000 flowop loop iterations ahead of it before blocking. Thus, with one set of flowops, the workload language is able to model the normal case that producers can't really get too far ahead of consumers, due to other dependencies or resource constraints.
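The lgwr pairing described above can thus be reduced to two flowops (the value and highwater figures come from the text; a highwater attribute on semblock is assumed here to be how the limit is expressed):

    # In the shadow thread: add 1 per loop; the blocking attribute
    # stops shadow once it gets too far ahead of lgwr.
    flowop sempost name=shadow-post-lg, value=1, target=lg-block, blocking

    # In the lgwr thread: block until 3200 posts have accumulated,
    # then subtract 3200 and continue; highwater=1000 lets shadow
    # run at most 1000 iterations ahead.
    flowop semblock name=lg-block, value=3200, highwater=1000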

The two workload models detailed here are typical of those developed for many other workloads. There are other flowops, attributes and commands supported by FileBench that are not included in these two examples, so for a complete specification of the workload modeling language, see the Filebench Workload Language page. Follow the general approach illustrated by the above example workloads, add and adjust the flowops to suit your application, and you too can create a FileBench model of your favorite workload.