During the WS Challenge we had a discussion about Web Service Composition benchmarks. I think this is also a result of a recently published journal article on WSBen. WSBen is a web service composition benchmark that was developed as a general composition benchmark for the AI community. The history of WSBen goes back to a Web Service Challenge at the EEE/ICEBE conference in 2005. I don't know further details, or how it is related to the current Web Services Challenge, but it seems that at least one participant from Penn State University is connected to this project or to people working on it. The test sets of WSBen are based on simplified WSDL documents and only support syntactic service discovery. The focus of WSBen is applying a network model to generate clusters of Web Services, which it can also visualize with a GUI. So much for WSBen.
Let me make this clear from the beginning: the current Web Services Challenge software (Generator + Client + Checker) is a Semantic Web Service Composition benchmark. If we also consider that the source code is available and that we apply modern design patterns to make it open and extensible, then it can also be called a framework. It is as easy as that. The test sets are generated purely at random, without any structure. Their difficulty ranges from test sets suitable for testing your system up to problems that even the winning system cannot solve within minutes or hours. Because the test sets have no structure, an algorithm well suited to one test set may be inadequate for another. The difference between a fixed benchmark and our toolset is that we have not yet published a set of test sets that can be used to evaluate your system. We have not done this so far because we did not see any use in it. Now, with several fast composition systems at hand and two years of experience with these tools, we could develop such a benchmark.
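To illustrate what "purely random, without any structure" means, here is a toy sketch of such a generator. It is not the actual WSC Generator: the service/concept names and the simple input/output sets are illustrative assumptions, chosen only to show that each service's interface is drawn independently from a flat concept pool.

```python
import random

def generate_test_set(num_services=10, num_concepts=20, seed=42):
    """Toy sketch: draw each service's inputs and outputs at random
    from a flat pool of concepts, with no imposed structure.
    Names and sizes are illustrative, not the real WSC format."""
    rng = random.Random(seed)
    concepts = [f"Concept{i}" for i in range(num_concepts)]
    services = []
    for i in range(num_services):
        services.append({
            "name": f"Service{i}",
            "inputs": rng.sample(concepts, rng.randint(1, 3)),
            "outputs": rng.sample(concepts, rng.randint(1, 3)),
        })
    return services

services = generate_test_set()
print(len(services))  # 10
```

Because nothing links one service's outputs to another's inputs by design, solution chains emerge only by chance, which is why difficulty varies so widely between generated sets.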
The Web Services Challenge has over five years of experience in measuring the performance of composition systems. We provide tools, test sets, rules, performance results, and now also composition systems. I have about 70 generated test sets from the last challenge, and we will upload them shortly. Anybody who is still interested in a composition benchmark should download these test sets and document their results: test system, initialization time, response time, the result file, and the checking file. The evaluation should be done on a more or less ordinary desktop computer. The results will be collected and used to generate new test sets. On the other hand, if we find test sets where the algorithms produce different results, then we are able to analyze them. The resulting package of test sets, collected results, and measurements will be the best benchmark available. Benchmarking is not the work of an individual but a community effort.
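For recording the two timing figures mentioned above, a harness along these lines could be used. This is only a hedged sketch: the two callables stand in for whatever commands your composition system uses to load a test set and to answer a query, which are not specified here.

```python
import time

def benchmark(initialize, query):
    """Measure the two figures to report per test set:
    initialization time (loading/parsing the test set) and
    response time (solving one composition request).
    `initialize` and `query` are placeholders for your system's steps."""
    t0 = time.perf_counter()
    initialize()                        # e.g. parse WSDL and ontology files
    init_time = time.perf_counter() - t0

    t1 = time.perf_counter()
    result = query()                    # e.g. compute one composition
    response_time = time.perf_counter() - t1

    return init_time, response_time, result
```

The returned result would then be written out as the result file and run through the Checker to produce the checking file, so that all four items can be reported together.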