
The gplex scanner generator (http://plas.fit.qut.edu.au/gplex/) targets the C# programming language. Thanks to the Mono runtime environment (http://www.mono-project.com/), C# code can run on Win32, Linux and Mac OS X platforms. As the wc sample run below suggests, scanner code generated by gplex is slower than JIT-compiled Java code but faster than interpreted Java code. The Mono application also requires much less memory than the Java one.

Tool/Queries 1 10 100 1000 10000
wc-flex 0.005 0.011 0.087 0.912 10.426
wc-java 0.197 0.248 0.367 1.549 14.707
wc-java016 0.216 0.263 0.346 1.533 14.806
wc-java064 0.209 0.243 0.364 1.522 14.649
wc-java512 0.216 0.247 0.351 1.542 16.486
wc-javaip 0.193 0.422 2.808 28.353 331.753
wc-mono1.2 0.168 0.282 1.580 15.650 175.863
wc-mono2.0 0.147 0.267 1.676 15.870 187.633
wc-mono2.0-static 0.150 0.272 1.670 16.265 189.411
wc-re2c 0.004 0.006 0.038 0.382 4.301
wc-unix 0.010 0.044 0.453 4.782 54.704

wc-mono2.0-static means that an executable with the Mono runtime linked in was tested. The Mono framework versions were 1.2 and 2.0.1. Different Java memory settings were tested as well: java016, for example, means that Java was run with the -Xmx16m option.

Mode/Lines 1 10 100 1000 10000 100000 250000 Memory (M
flex 0.001 0.001 0.001 0.004 0.015 0.182 0.313 2.
java15 0.132 0.135 0.132 0.242 0.623 1.273 2.130 79
java15_x64 0.137 0.133 0.133 0.233 0.581 1.240 2.288
java16 0.131 0.134 0.131 0.205 0.441 1.194 1.963 800
perl 0.031 0.031 0.031 0.032 0.066 0.384 0.961 7
plex2.0_32 0.001 0.001 0.002 0.011 0.109 1.011 2.765 0.8
plex2.2_32 0.001 0.001 0.003 0.012 0.123 1.136 3.136 1
plex2.2_64 0.002 0.001 0.003 0.016 0.155 1.479 3.808 1
tcl 0.090 0.089 0.092 0.123 0.439 3.392 8.724 6

The same set of BLAST files was again used to test a word counting scanner. The Flex and re2c based scanners again performed best.

Mode 1 10 100 1000 10000
wc-flex 0.003 0.011 0.102 1.083 12.459
wc-flexpp 0.026 0.169 1.940 21.193 244.294
wc-gcj-exe 0.097 0.123 0.441 3.934 42.928
wc-gcj 0.087 0.307 2.875 30.163 nd
wc-java14 0.153 0.259 0.481 1.748 15.965
wc-java 0.176 0.257 0.444 1.704 15.682
wc-javaip14 0.122 0.345 2.774 28.982 329.265
wc-javaip 0.120 0.345 2.771 28.769 335.123
wc-perl-hand 0.006 0.018 0.155 1.590 18.132
wc-perl-lex 0.164 0.872 9.108 97.561 nd
wc-plex64 0.008 0.044 0.476 5.106 58.264
wc-plex 0.006 0.043 0.433 4.589 55.773
wc-re2c 0.002 0.005 0.035 0.346 3.975
wc-tcl8532 0.257 1.625 17.440 190.076 nd
wc-tcl8564 0.183 1.071 11.550 126.854 nd
wc-tcl 0.401 2.327 25.212 274.777 nd
wc-unix 0.006 0.026 0.285 2.937 33.792
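
The benchmarked wc scanners themselves are not shown in this post. As a rough illustration of the task every tool in the table has to solve, here is a minimal Python sketch of a word counter written as a two-state scan (inside a word / outside a word). The function name `count` and the choice of Python are assumptions made for illustration only; this is not the code that was timed.

```python
def count(text):
    """Return (lines, words, chars) for text, similar to the Unix wc tool.

    Hypothetical helper for illustration; not one of the benchmarked scanners.
    """
    lines = words = chars = 0
    in_word = False  # scanner state: are we currently inside a word?
    for ch in text:
        chars += 1
        if ch == "\n":
            lines += 1
        if ch.isspace():
            # Whitespace ends the current word, if there was one.
            in_word = False
        elif not in_word:
            # First non-whitespace character after whitespace starts a new word.
            in_word = True
            words += 1
    return lines, words, chars
```

A generated scanner does the same job with a table-driven or directly coded state machine, which is where the speed differences in the table come from.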

Sample: BLAST file with 1 to 10,000 result items

Mode 1 10 100 1000 10000
flex 0.003 0.015 0.199 1.979 25.548
flex-tcl 0.005 0.018 0.167 1.775 20.701
gcj 0.099 0.149 0.784 7.536 83.421
gij 0.109 0.482 4.917 51.939 nd
java 0.228 0.348 0.753 3.048 27.806
javaip 0.180 0.517 4.715 49.852 nd
plex 0.011 0.082 0.856 9.565 107.987
perl 0.031 0.050 0.235 2.280 23.916
re2c 0.004 0.012 0.076 0.765 8.438
tcl 1.702 12.524 140.249 nd nd

The Re2c based scanner is the fastest, but its setup and coding are more complicated than for the other scanners.
Flex-based scanners are 2-3 times slower than Re2c based scanners, regardless of whether an embedded Tcl interpreter is used for better string handling (flex-tcl). Jflex code executed with the Sun Java Hotspot virtual machine 1.5 (java), Jflex code compiled to machine code (java-gcj) and Plex based scanners (sbs-plex = Pascal lex) are between about 5 and 10 times slower than Re2c based scanners. Interpreted Java code, executed either with the Sun interpreter (java-ip = “java -Xint”) or with the GNU interpreter (java-gij), is about 50 times slower than Re2c code. The Tcl based scanner is about 1000 times slower than the Re2c based one. The Perl scanner is a line-based scanner and is therefore not able to do complicated scanning with more than two states or patterns on the same line.
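
The slowdown factors quoted here can be checked against the table above. The following is a small Python sketch that does the arithmetic; the timings are copied from that table, while the names `TIMINGS` and `slowdown` are assumptions introduced for this example. For tools with "nd" entries it falls back to the largest input size that both the tool and Re2c completed.

```python
# Timings copied from the table above; columns are 1, 10, 100, 1000 and
# 10000 result items. None stands for an "nd" entry.
TIMINGS = {
    "flex":     [0.003, 0.015, 0.199, 1.979, 25.548],
    "flex-tcl": [0.005, 0.018, 0.167, 1.775, 20.701],
    "gcj":      [0.099, 0.149, 0.784, 7.536, 83.421],
    "gij":      [0.109, 0.482, 4.917, 51.939, None],
    "java":     [0.228, 0.348, 0.753, 3.048, 27.806],
    "javaip":   [0.180, 0.517, 4.715, 49.852, None],
    "plex":     [0.011, 0.082, 0.856, 9.565, 107.987],
    "perl":     [0.031, 0.050, 0.235, 2.280, 23.916],
    "re2c":     [0.004, 0.012, 0.076, 0.765, 8.438],
    "tcl":      [1.702, 12.524, 140.249, None, None],
}


def slowdown(tool, baseline="re2c"):
    """Return (column index, ratio) at the largest input both tools finished.

    Hypothetical helper: ratio = time(tool) / time(baseline).
    """
    tool_times = TIMINGS[tool]
    base_times = TIMINGS[baseline]
    for i in range(len(tool_times) - 1, -1, -1):
        if tool_times[i] is not None and base_times[i] is not None:
            return i, tool_times[i] / base_times[i]
    raise ValueError("no common measurement for " + tool)


if __name__ == "__main__":
    for name in sorted(TIMINGS):
        column, ratio = slowdown(name)
        print(f"{name:9s} column {column}: {ratio:8.1f}x slower than re2c")
```

Because of the fixed startup cost of the Java and Tcl runtimes, the ratios differ between columns, so the factors in the text are best read as rough orders of magnitude for the larger inputs.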

Some new documentation about the aim of the bioscanners project is now online.

Project Aim 

Write parsers for biological data based on scanner generators like Flex (C), Re2c (C), Jflex (Java) and Ifickle (Tcl). Scanners built with these generators are easier to develop and maintain, and run faster, than hand-written scanners.
