
Rejit

Building and usage

All scripts and utilities should include a help message available via $ <script> --help.

First build rejit.

$ cd <rejit>
$ scons

Then include <rejit>/include/rejit.h in your program.

#include <string>
#include <vector>

#include <rejit.h>

using namespace std;
using namespace rejit;

int main() {
  string text = "";  // The text to search.
  vector<Match> matches;
  // Find the matches of the regular expression in `text`.
  rejit::MatchAll("regexp", text, &matches);
  for (vector<Match>::iterator it = matches.begin(); it != matches.end(); ++it) {
    // Do something.
  }
  return 0;
}

Finally, compile and link your program with the rejit library:

$ g++ -o myprg myprg.cc -I<rejit>/include -L<rejit>/build/latest -lrejit

Documentation for the functions offered by rejit is available as comments in include/rejit.h. You can also find examples in the sample programs provided in sample/. Compilation options are detailed in the help message from scons --help.

Sample programs

A few sample programs using rejit are included in the sample/ folder, among them the regexdna and jrep samples. Compile them with:

$ scons sample/basic
$ scons sample/jrep
$ scons sample/regexdna
$ scons sample/regexdna-multithread

Use $ sample/<sample> --help for details.

Running the benchmarks

Rejit benchmark suite

You can run the rejit benchmark suite and open the results in a browser with:

$ <rejit>/tools/benchmarks/run.py
$ <browser> <rejit>/tools/benchmarks/benchmarks_results.html

As usual, the --help switch lists the available options.

Benchmark engines can be built separately:

$ scons pcre_engine
$ scons re2_engine
$ scons rejit_engine
$ scons v8_engine

The build script will take care of cloning the appropriate repositories and building everything. Files are located in tools/benchmarks/engines/<engine>.

Note that compilation of re2 currently fails on OS X 10.9 (at least). Here is a simple fix.

Grep benchmarks

Grep benchmarks are currently still run manually. The rejit-powered grep utility (jrep) is part of the sample programs.

You can benchmark with commands such as

$ CMD='jrep -R regexp linux-3.10.6/'; $CMD > /dev/null && time $CMD > /dev/null
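The one-liner above runs the command once to warm up the file cache and then times a second run. Because $CMD is expanded unquoted, the string is split on whitespace and any quotes inside it remain literal characters. The sketch below illustrates that pitfall and one way around it, keeping the command in the positional parameters instead. count_words and bench_twice are hypothetical helpers, grep stands in for jrep, and input.txt is a placeholder name. Note that time is a reserved word in bash; other shells may resolve it to an external program.

```shell
#!/bin/sh
# Sketch only: count_words and bench_twice are illustrative helpers,
# not part of rejit. grep stands in for jrep; input.txt is a placeholder.

count_words() {
  # Print how many separate arguments were received.
  echo "$#"
}

# A command stored in a single string is split on whitespace when expanded
# unquoted. The single quotes inside it are not re-parsed by the shell.
CMD="grep -E 'a b' input.txt"
count_words $CMD        # prints 5: grep, -E, 'a, b', input.txt

# Positional parameters keep every argument intact.
set -- grep -E 'a b' input.txt
count_words "$@"        # prints 4: grep, -E, "a b", input.txt

bench_twice() {
  # The first run warms the file cache; the second run is the one measured.
  "$@" > /dev/null && time "$@" > /dev/null
}
```

With this helper, a run would look like bench_twice jrep -R 'regexp' linux-3.10.6/, with the regexp quoted once on the command line.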

Be very careful to escape shell special characters. Also beware of the syntax difference between jrep and grep: jrep uses the Extended Regular Expression (ERE) syntax, whereas grep defaults to the Basic Regular Expression (BRE) syntax. Alternatively, pass an option switch (-E with GNU grep) to make grep use ERE.
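To make the syntax difference concrete, here is a small sketch using only the standard grep utility (jrep is not invoked). The sample text and the helper names count_bre and count_ere are made up for illustration. The same pattern text selects different lines depending on whether it is read as BRE or ERE.

```shell
#!/bin/sh
# Sketch only: count_bre and count_ere are illustrative helpers.
# -c prints the number of matching lines, -x requires a whole-line match.

sample='ab
abab
a+b
(ab)+'

count_bre() {
  # Default grep syntax: Basic Regular Expressions.
  printf '%s\n' "$sample" | grep -c -x -e "$1"
}

count_ere() {
  # -E switches grep to Extended Regular Expressions.
  printf '%s\n' "$sample" | grep -E -c -x -e "$1"
}

count_bre '(ab)+'          # prints 1: ( ) and + are literal, only "(ab)+" matches
count_ere '(ab)+'          # prints 2: one or more "ab", so "ab" and "abab" match
count_bre '\(ab\)\{1,\}'   # prints 2: the BRE spelling of the same repetition
```

When comparing timings between jrep and grep, checking that both commands report the same number of matches is a cheap way to confirm that they were given equivalent patterns.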

DNA matching benchmarks

This benchmark is detailed on its original site. It is currently run manually. The rejit-powered implementations are part of the sample programs. They are simply run with

$ ./sample/regexdna < input.file

See the --help option for information on how to generate the input files.

Syntax

Rejit currently follows the POSIX Extended Regular Expression (ERE) syntax (see the Wikipedia article on regular expressions). Reintroducing the Basic Regular Expression (BRE) syntax is one of the planned future tasks.

Rejit supports the following elements of the ERE:

Testing

Rejit includes a test suite that can be run with tools/tests/run.py.

Compilation Information

The compinfo tool can be used to show some information about the compilation of a regexp.

$ scons compinfo
$ ./tools/analysis/compinfo --help

For example:

$ ./tools/analysis/compinfo --print_ff_elements=1 "[0-9]abcd[0-9]"
Fast forward elements ----------------------{{{
MultipleChar [abcd] {1, 2}
}}}--------------- End of fast forward elements

Disassembly of the generated code

compinfo can also be used to dump the generated code so it can be disassembled.

$ scons compinfo
$ ./tools/analysis/compinfo --dump_code=1 regexp

Then use ndisasm to disassemble the blob.

$ ndisasm -b 64 dump.1 > disasm.regexp
$ cat disasm.regexp
[...]
0000009D  66490F6F0E        movdqa xmm1,[r14]
000000A2  660F3A63C10C      pcmpistri xmm0,xmm1,0xc
000000A8  0F821C000000      jc qword 0xca
000000AE  66490F6F5610      movdqa xmm2,[r14+0x10]
000000B4  660F3A63C20C      pcmpistri xmm0,xmm2,0xc
000000BA  0F8206000000      jc qword 0xc6
000000C0  4983C620          add r14,byte +0x20
000000C4  EBCE              jmp short 0x94
[...]