The gplex scanner generator (http://plas.fit.qut.edu.au/gplex/) targets the C# programming language. Thanks to the Mono runtime environment (http://www.mono-project.com/), C# code can run on Win32, Linux and Mac OS X. As our wc sample run below suggests, gplex-generated scanner code is slower than JIT-compiled Java code but faster than interpreted Java code. The Mono application, however, requires much less memory than the Java one.
| Tool/Queries | 1 | 10 | 100 | 1000 | 10000 |
|---|---|---|---|---|---|
| wc-flex | 0.005 | 0.011 | 0.087 | 0.912 | 10.426 |
| wc-java | 0.197 | 0.248 | 0.367 | 1.549 | 14.707 |
| wc-java016 | 0.216 | 0.263 | 0.346 | 1.533 | 14.806 |
| wc-java064 | 0.209 | 0.243 | 0.364 | 1.522 | 14.649 |
| wc-java512 | 0.216 | 0.247 | 0.351 | 1.542 | 16.486 |
| wc-javaip | 0.193 | 0.422 | 2.808 | 28.353 | 331.753 |
| wc-mono1.2 | 0.168 | 0.282 | 1.580 | 15.650 | 175.863 |
| wc-mono2.0 | 0.147 | 0.267 | 1.676 | 15.870 | 187.633 |
| wc-mono2.0-static | 0.150 | 0.272 | 1.670 | 16.265 | 189.411 |
| wc-re2c | 0.004 | 0.006 | 0.038 | 0.382 | 4.301 |
| wc-unix | 0.010 | 0.044 | 0.453 | 4.782 | 54.704 |
wc-mono2.0-static denotes a standalone executable with the Mono runtime linked in. The Mono framework versions tested were 1.2 and 2.0.1. Several Java heap settings were tested as well: wc-java016 means that Java was run with the -Xmx16m option (wc-java064 and wc-java512 correspondingly with -Xmx64m and -Xmx512m).
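To check which heap limit a given -Xmx setting actually produces, a JVM can report its own maximum heap size via `Runtime.getRuntime().maxMemory()`. The following is a minimal sketch; the class name `MaxHeap` is illustrative and not part of the benchmark. Note that the reported value may be somewhat below the -Xmx value, depending on the garbage collector.

```java
// Minimal sketch: report the maximum heap size this JVM was started with.
// Run e.g. as "java -Xmx16m MaxHeap" and compare with -Xmx64m or -Xmx512m.
public class MaxHeap {
    // Maximum heap in MiB as reported by the running JVM
    // (Long.MAX_VALUE / 1 MiB if the JVM reports no limit).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```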
| Mode/Lines | 1 | 10 | 100 | 1000 | 10000 | 100000 | 250000 | Memory (MB) |
|---|---|---|---|---|---|---|---|---|
| flex | 0.001 | 0.001 | 0.001 | 0.004 | 0.015 | 0.182 | 0.313 | 2. |
| java15 | 0.132 | 0.135 | 0.132 | 0.242 | 0.623 | 1.273 | 2.130 | 79 |
| java15_x64 | 0.137 | 0.133 | 0.133 | 0.233 | 0.581 | 1.240 | 2.288 | |
| java16 | 0.131 | 0.134 | 0.131 | 0.205 | 0.441 | 1.194 | 1.963 | 800 |
| perl | 0.031 | 0.031 | 0.031 | 0.032 | 0.066 | 0.384 | 0.961 | 7 |
| plex2.0_32 | 0.001 | 0.001 | 0.002 | 0.011 | 0.109 | 1.011 | 2.765 | 0.8 |
| plex2.2_32 | 0.001 | 0.001 | 0.003 | 0.012 | 0.123 | 1.136 | 3.136 | 1 |
| plex2.2_64 | 0.002 | 0.001 | 0.003 | 0.016 | 0.155 | 1.479 | 3.808 | 1 |
| tcl | 0.090 | 0.089 | 0.092 | 0.123 | 0.439 | 3.392 | 8.724 | 6 |
Again, the same set of BLAST files was used to test a word-counting scanner. Flex- and re2c-based scanners again performed best.
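For orientation, a word-counting scanner essentially runs a two-state machine over the input: it is either inside a word or outside one. The sketch below is a hypothetical hand-written Java counter, not one of the benchmarked scanners; it assumes that a word is a maximal run of non-whitespace characters, as with Unix `wc -w`, which may differ from the patterns used in the generated scanners.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-written word counter (illustrative only).
// Assumption: a word is a maximal run of non-whitespace characters.
public class WordCount {
    // Scans the input character by character, counting each
    // transition from "outside a word" to "inside a word".
    static long countWords(Reader in) {
        long words = 0;
        boolean inWord = false;
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    inWord = false;      // whitespace ends the current word
                } else if (!inWord) {
                    inWord = true;       // first character of a new word
                    words++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Read the file to be counted (e.g. a BLAST report) from standard input.
        Reader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        System.out.println(countWords(in));
    }
}
```

Usage: `java WordCount < report.blast` (the file name is a placeholder). A generated scanner derives a comparable automaton from regular expressions instead of hard-coding the two states.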
| Mode | 1 | 10 | 100 | 1000 | 10000 |
|---|---|---|---|---|---|
| wc-flex | 0.003 | 0.011 | 0.102 | 1.083 | 12.459 |
| wc-flexpp | 0.026 | 0.169 | 1.940 | 21.193 | 244.294 |
| wc-gcj-exe | 0.097 | 0.123 | 0.441 | 3.934 | 42.928 |
| wc-gcj | 0.087 | 0.307 | 2.875 | 30.163 | nd |
| wc-java14 | 0.153 | 0.259 | 0.481 | 1.748 | 15.965 |
| wc-java | 0.176 | 0.257 | 0.444 | 1.704 | 15.682 |
| wc-javaip14 | 0.122 | 0.345 | 2.774 | 28.982 | 329.265 |
| wc-javaip | 0.120 | 0.345 | 2.771 | 28.769 | 335.123 |
| wc-perl-hand | 0.006 | 0.018 | 0.155 | 1.590 | 18.132 |
| wc-perl-lex | 0.164 | 0.872 | 9.108 | 97.561 | nd |
| wc-plex64 | 0.008 | 0.044 | 0.476 | 5.106 | 58.264 |
| wc-plex | 0.006 | 0.043 | 0.433 | 4.589 | 55.773 |
| wc-re2c | 0.002 | 0.005 | 0.035 | 0.346 | 3.975 |
| wc-tcl8532 | 0.257 | 1.625 | 17.440 | 190.076 | nd |
| wc-tcl8564 | 0.183 | 1.071 | 11.550 | 126.854 | nd |
| wc-tcl | 0.401 | 2.327 | 25.212 | 274.777 | nd |
| wc-unix | 0.006 | 0.026 | 0.285 | 2.937 | 33.792 |
Sample: BLAST file with 1 to 10,000 result items
| Mode | 1 | 10 | 100 | 1000 | 10000 |
|---|---|---|---|---|---|
| flex | 0.003 | 0.015 | 0.199 | 1.979 | 25.548 |
| flex-tcl | 0.005 | 0.018 | 0.167 | 1.775 | 20.701 |
| gcj | 0.099 | 0.149 | 0.784 | 7.536 | 83.421 |
| gij | 0.109 | 0.482 | 4.917 | 51.939 | nd |
| java | 0.228 | 0.348 | 0.753 | 3.048 | 27.806 |
| javaip | 0.180 | 0.517 | 4.715 | 49.852 | nd |
| plex | 0.011 | 0.082 | 0.856 | 9.565 | 107.987 |
| perl | 0.031 | 0.050 | 0.235 | 2.280 | 23.916 |
| re2c | 0.004 | 0.012 | 0.076 | 0.765 | 8.438 |
| tcl | 1.702 | 12.524 | 140.249 | nd | nd |
The Re2c-based scanner is the fastest, but setting it up and writing the scanner code is more complicated than for the other scanners.
Flex-based scanners, whether or not they embed a Tcl interpreter for better string handling (flex-tcl), are 2-3 times slower than Re2c-based scanners, as is JFlex code (java) executed with the Sun Java HotSpot virtual machine (1.5). JFlex code compiled to machine code (gcj) and Plex-based scanners (sbs-plex = Pascal lex) are about 5 to 10 times slower than Re2c-based scanners. Interpreted Java code, executed either with the Sun interpreter (javaip = “java -Xint”) or with the GNU interpreter (gij), is about 50 times slower than Re2c code. The Tcl-based scanner is about 1000 times slower than the Re2c-based one. The Perl scanner is line-based and is therefore unable to do complicated scanning with more than two states or with several patterns on the same line.
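The slowdown factors can be recomputed directly from the BLAST sample table. This sketch (class and method names are illustrative) divides each scanner's time for 1000 result items by the re2c time for the same file; tcl is left out because its 1000-item entry is nd.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Slowdown relative to re2c, using the 1000-result-item column
// of the BLAST sample table (tcl omitted: "nd").
public class Slowdown {
    // re2c time for 1000 result items
    static final double RE2C_1000 = 0.765;

    // Times for 1000 result items, copied from the table.
    static Map<String, Double> times1000() {
        Map<String, Double> t = new LinkedHashMap<>();
        t.put("flex", 1.979);
        t.put("flex-tcl", 1.775);
        t.put("gcj", 7.536);
        t.put("gij", 51.939);
        t.put("java", 3.048);
        t.put("javaip", 49.852);
        t.put("plex", 9.565);
        t.put("perl", 2.280);
        return t;
    }

    // How many times slower a scanner is than re2c on the same file.
    static double slowdown(double time) {
        return time / RE2C_1000;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> e : times1000().entrySet()) {
            System.out.printf("%-9s %6.1fx%n", e.getKey(), slowdown(e.getValue()));
        }
    }
}
```

The factors shift somewhat with file size (compare the 100- and 10000-item columns), so the figures in the text are rough averages rather than exact values.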
Some new documentation about the aim of the bioscanners project is now online.
Project Aim
Write parsers for biological data based on scanner generators such as Flex (C), Re2c (C), JFlex (Java) and Ifickle (Tcl). Compared with hand-written scanners, these scanner generators make maintenance and development easier and provide higher speed.
