The OAI-PMH protocol defines several verbs which can be used in requests to an OAI-PMH data provider. For harvesting, the most obvious are:
- ListIdentifiers, which returns a list of record identifiers, in combination with
- GetRecord, which returns the record for the specified identifier
- ListRecords, which returns a set of records
Some harvesting solutions do a ListIdentifiers and then a GetRecord for each identifier. Others harvest with the ListRecords verb. Although both methods lead to the same content (discarding the OAI envelope header and focusing on the record), the number of HTTP requests obviously differs: the first method needs one request per record on top of the requests for the identifier lists, while the second needs only one request per page of records. But does this have an impact on performance?
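To make the difference in request counts concrete, here is a minimal Python sketch of both methods. It is not the Perl code used for the tests; the function names, the `fetch` parameter and the default `oai_dc` prefix are assumptions made for illustration. Each function returns the harvested records together with the number of HTTP requests it needed.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 XML namespace


def http_fetch(base_url, params):
    """Default fetcher: one HTTP GET per call, returns the response body."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_pages(base_url, verb, prefix, fetch):
    """Yield one parsed XML tree per request, following resumptionTokens."""
    params = {"verb": verb, "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(fetch(base_url, params))
        yield root
        token = root.find(f"{OAI}{verb}/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one: the list is complete
        # Follow-up requests carry only the verb and the token.
        params = {"verb": verb, "resumptionToken": token.text.strip()}


def harvest_list_records(base_url, prefix="oai_dc", fetch=http_fetch):
    """ListRecords method: one request per page of records."""
    records, requests = [], 0
    for root in list_pages(base_url, "ListRecords", prefix, fetch):
        requests += 1
        records.extend(root.iter(f"{OAI}record"))
    return records, requests


def harvest_get_record(base_url, prefix="oai_dc", fetch=http_fetch):
    """ListIdentifiers + GetRecord method: one extra request per record."""
    records, requests = [], 0
    for root in list_pages(base_url, "ListIdentifiers", prefix, fetch):
        requests += 1
        for ident in root.iter(f"{OAI}identifier"):
            params = {"verb": "GetRecord", "metadataPrefix": prefix,
                      "identifier": ident.text}
            reply = ET.fromstring(fetch(base_url, params))
            requests += 1
            records.extend(reply.iter(f"{OAI}record"))
    return records, requests
```

Passing your own `fetch` function makes it possible to count requests against a canned response without touching a real data provider. How many records a provider puts in one page is up to the provider, so the exact ratio between the two request counts will differ per repository.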
The complete results as well as the graph are available in a Google Spreadsheet. For the connection type, WebPageTest was used to determine whether keep-alive and gzip were supported. The tests were carried out with 2 Perl scripts and a Bash file which, together with the resulting output, can be downloaded here.
- Clearly, you get better performance (= a higher number of records per minute in a harvest) when you use the ListRecords method. So fewer HTTP requests result in faster harvests (about 2.5 – 11 times faster!)
- The use of keep-alive and gzip varies per OAI-PMH data provider. In general: if a data provider supports keep-alive and/or gzip, you’d better use it, as it improves performance! Your mileage may vary per data provider, so test what the best solution is.
- Although this test was conceived to show the difference in verb usage and connection type, it also shows that some data providers perform better than others. Room for improvement…
- Those who have inspected the Perl scripts might wonder why the “Beeld en Geluid” and “Open Beelden” data providers receive different parameters. Well, these two data providers do not work when you use the metadataPrefix and resumptionToken together. Strictly speaking, that is what the OAI-PMH version 2.0 standard prescribes: the metadataPrefix is required when doing a ListIdentifiers or ListRecords, except when a resumptionToken is used, because resumptionToken is an exclusive argument that may only be combined with the verb. The other data providers apparently tolerate the extra parameter…
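The points above about keep-alive, gzip and the resumptionToken parameters can be combined in one small client. The following Python sketch is only an illustration under stated assumptions: `Harvester`, `build_query` and `decode_body` are hypothetical names, not part of the Perl scripts, and the host and path in the usage note are placeholders. It reuses one connection for all requests, asks for gzip but copes with providers that ignore the request, and sends a resumptionToken without metadataPrefix, which is how I read the OAI-PMH 2.0 specification.

```python
import gzip
import http.client
import urllib.parse


def build_query(verb, metadata_prefix=None, resumption_token=None):
    """Build the query string for a ListIdentifiers or ListRecords request."""
    if resumption_token:
        # resumptionToken is an exclusive argument in OAI-PMH 2.0:
        # it travels with the verb only, without metadataPrefix.
        params = {"verb": verb, "resumptionToken": resumption_token}
    else:
        params = {"verb": verb, "metadataPrefix": metadata_prefix}
    return urllib.parse.urlencode(params)


def decode_body(body, content_encoding):
    """Decompress the body only if the provider actually gzipped it."""
    if (content_encoding or "").lower() == "gzip":
        return gzip.decompress(body)
    return body


class Harvester:
    """Hypothetical client that sends every request over one connection."""

    def __init__(self, host, path):
        self.path = path
        # HTTP/1.1 connections are persistent unless the server closes them.
        self.conn = http.client.HTTPConnection(host, timeout=60)

    def request(self, verb, metadata_prefix=None, resumption_token=None):
        query = build_query(verb, metadata_prefix, resumption_token)
        headers = {"Accept-Encoding": "gzip", "Connection": "keep-alive"}
        self.conn.request("GET", f"{self.path}?{query}", headers=headers)
        response = self.conn.getresponse()
        # Read the full body so the connection is free for the next request.
        body = response.read()
        return decode_body(body, response.getheader("Content-Encoding"))

    def close(self):
        self.conn.close()
```

A harvest would then look like `h = Harvester("example.org", "/oai")`, followed by `h.request("ListRecords", metadata_prefix="oai_dc")` for the first page and `h.request("ListRecords", resumption_token=token)` for every following page. Whether keep-alive and gzip actually help depends on the data provider, so it is worth timing a harvest with and without them.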