Changed methodology for benchmarks
- pytest-benchmark is no longer used, since it runs tests sequentially, which takes a very long time to complete; the original Tavern tests are run instead
- Timing is measured via pytest's `--durations` option, which also covers the Tavern tests (see the run sketch after this list)
- During benchmarking, validate_response.py is disabled via a newly introduced environment variable (a sketch of the gate follows the list)
- Each run produces a JUnit XML file with per-test timing data
- After all runs, the data from these files are combined and a report file is generated (see the report sketch below)
- Tests whose time exceeds the threshold are highlighted in red
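
A minimal sketch of the benchmark run loop, assuming the Tavern tests live under `tests/tavern`, the number of runs, file names, and the environment variable name are placeholders rather than the project's actual values:

```python
import os
import subprocess

N_RUNS = 5  # hypothetical number of benchmark runs

for run in range(N_RUNS):
    # Hypothetical variable name; disables validate_response.py during benchmarking
    env = dict(os.environ, DISABLE_RESPONSE_VALIDATION="1")
    subprocess.run(
        [
            "pytest",
            "tests/tavern",                          # hypothetical test path
            "--durations=0",                         # report all test durations
            f"--junitxml=benchmark_run_{run}.xml",   # one JUnit XML file per run
        ],
        env=env,
        check=False,  # a failing test should not abort the remaining runs
    )
```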
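
A possible shape of the gate inside validate_response.py; the variable name and the function signature are assumptions for illustration, not necessarily what the change introduces:

```python
import os

def validate_response(response, expected):
    # Skip schema/content checks while benchmarking (hypothetical variable name)
    if os.environ.get("DISABLE_RESPONSE_VALIDATION") == "1":
        return
    # ... normal validation logic runs here when the variable is not set ...
```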
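
A sketch of combining the per-run JUnit XML files into a single report with slow tests highlighted in red; the file names, threshold value, and HTML output format are assumptions for illustration:

```python
import glob
import xml.etree.ElementTree as ET
from collections import defaultdict

THRESHOLD_SECONDS = 1.0  # hypothetical threshold

# Collect per-test durations from every run's JUnit XML file
durations = defaultdict(list)
for path in glob.glob("benchmark_run_*.xml"):
    for case in ET.parse(path).getroot().iter("testcase"):
        durations[case.get("name")].append(float(case.get("time", 0)))

# Build one table row per test, averaging over runs; slow tests are shown in red
rows = []
for name, times in sorted(durations.items()):
    avg = sum(times) / len(times)
    style = ' style="color:red"' if avg > THRESHOLD_SECONDS else ""
    rows.append(f"<tr{style}><td>{name}</td><td>{avg:.3f}</td></tr>")

with open("benchmark_report.html", "w") as report:
    report.write("<table><tr><th>test</th><th>avg time, s</th></tr>")
    report.write("".join(rows))
    report.write("</table>")
```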