I have recently been running some benchmarks on Azure Virtual Machines... Lots of benchmarks!
Update: Script published on GitHub here! This is not a finished product and is a bit "hacky" - you have been warned! :)
In fact, I've written a script which will power up each VM type available to me on an MSDN subscription and run Cinebench R15 on it. My MSDN subscription has the default cores-per-region limit, so the largest machine I was able to test was the D15_v2 at 20 cores.
The script ran the benchmark 3 times to try to account for time-based variance such as background processes. I also tried to minimise background activity by disabling Windows Update and Windows Defender on each machine.
Running through all of the machine types available to me serially, with three Cinebench runs each, took less time than I initially expected - about 24 hours for a full pass. Because of this, I'm open to running other benchmarks that can be driven from the command line and output results in some standard format. I might do a strawpoll if anyone is interested.
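If you're curious how that sort of loop hangs together, here's a minimal Python sketch of the idea that drives the Azure CLI (`az vm create`, `az vm run-command invoke`, `az vm delete`). To be clear, this is not the published script (that's linked at the bottom); the resource group, image, credentials, VM sizes and the Cinebench executable path/switch are all placeholder assumptions.

```python
import json
import subprocess

# Placeholder values - substitute your own resource group, sizes and credentials.
RESOURCE_GROUP = "cinebench-rg"
VM_SIZES = ["Standard_F4", "Standard_D4_v2", "Standard_H8"]  # illustrative subset
RUNS_PER_VM = 3

def az(*args):
    """Run an Azure CLI command and return its parsed JSON output (if any)."""
    out = subprocess.run(["az", *args, "--output", "json"],
                         check=True, capture_output=True, text=True)
    return json.loads(out.stdout) if out.stdout.strip() else None

for size in VM_SIZES:
    name = f"bench-{size.lower().replace('_', '-')}"

    # Create the VM for this size (serially, so the core quota isn't exceeded).
    az("vm", "create", "--resource-group", RESOURCE_GROUP, "--name", name,
       "--image", "Win2016Datacenter", "--size", size,
       "--admin-username", "benchadmin", "--admin-password", "<choose-a-strong-password>")

    for run in range(RUNS_PER_VM):
        # Kick off Cinebench inside the guest. The executable path and the
        # -cb_cpux switch are assumptions, not taken from the real script;
        # the score would need to be parsed out of the returned message text.
        az("vm", "run-command", "invoke", "--resource-group", RESOURCE_GROUP,
           "--name", name, "--command-id", "RunPowerShellScript",
           "--scripts", r"& 'C:\Cinebench\CINEBENCH Windows 64 Bit.exe' -cb_cpux")

    # Tear the VM down before moving on to the next size.
    az("vm", "delete", "--resource-group", RESOURCE_GROUP, "--name", name, "--yes")
```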
Keep in mind that Cinebench only tests CPU performance, so these results won't be relevant for other workloads such as GPU or disk I/O.
Price to Performance
Price to performance for each VM series and CPU type was calculated by dividing the Cinebench score (the average of 3 runs) by the monthly price of the virtual machine. The chart shows the average of these values across all the VM types in each series.
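To make the calculation concrete, here's roughly how it looks in Python. The scores and prices below are made-up placeholders, not my measured results - those are in the Google Sheet linked at the bottom.

```python
from collections import defaultdict
from statistics import mean

# Placeholder scores and prices for illustration only - not my measured results.
vms = [
    ("F", "F4", 430, 120.0),   # (series, type, Cinebench average, price per month in GBP)
    ("F", "F8", 850, 240.0),
    ("G", "G3", 820, 510.0),
]

# Cinebench points per pound per month for each VM, then the mean per series.
per_series = defaultdict(list)
for series, vm_type, score, price in vms:
    per_series[series].append(score / price)

for series, values in sorted(per_series.items()):
    print(f"{series} series: {mean(values):.2f} points per GBP/month")
```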
From my results, you can see that the F series is by far the best performer per pound spent. The defunct G series is the most expensive for CPU.
Real-world Cinebench R15 results vs Microsoft 'Azure Compute Unit' figures
I have normalised the Cinebench per-core and per-thread scores and the ACU figure for each VM type, and then averaged the normalised values within each VM series.
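Concretely, the normalisation looks something like this - again with placeholder figures rather than my real numbers, which live in the sheet:

```python
from collections import defaultdict
from statistics import mean

# Placeholder figures for illustration only.
# Each row: (series, vm_type, per_core_score, per_thread_score, acu)
rows = [
    ("H",   "H8",    152, 152, 290),
    ("F",   "F4",    140, 140, 210),
    ("Dv3", "D4_v3", 165,  83, 160),
]

def normalise(values):
    """Scale a column of scores so its best value becomes 1.0."""
    top = max(values)
    return [v / top for v in values]

# Normalise each metric across all VM types.
per_core   = normalise([r[2] for r in rows])
per_thread = normalise([r[3] for r in rows])
acu        = normalise([r[4] for r in rows])

# Average the normalised values within each VM series, per metric.
series_vals = defaultdict(lambda: defaultdict(list))
for i, (series, *_rest) in enumerate(rows):
    series_vals[series]["per-core"].append(per_core[i])
    series_vals[series]["per-thread"].append(per_thread[i])
    series_vals[series]["ACU"].append(acu[i])

for series, metrics in series_vals.items():
    summary = ", ".join(f"{name}={mean(vals):.2f}" for name, vals in metrics.items())
    print(f"{series}: {summary}")
```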
The best-performing virtual machine type per core is the H series, using the Intel Xeon E5-2667 v3 (Haswell) at 3.2 GHz.
Looking at the results shows that almost all of the CPUs (relative to the H series) perform in line with their respective ACU scores, which means that ACU is a good benchmark for gauging relative CPU performance.
The only outlier on this chart is the Dv3 series when looking at per-core values. Since this VM uses Hyper-Threading, there are two threads per core, so it performs significantly better per core than the other, non-SMT virtual machine types. This is, however, reflected in the VM's pricing, so the H series is still top dog on CPU price to performance.
A final note
I have looked at the variance in per-core and per-thread scores (the multi-core score divided by the number of cores or threads), and the variance across all the VM types in a series is very low. The only noticeable exception is the G series, where the Cinebench R15 multi-core score does not scale linearly: per-core scores go down as the VM size increases. It's probably worth running single-core benchmarks on these machines to see whether there is some artificial limiting happening or whether this is due to Intel Turbo Boost (which raises clock speeds when fewer cores are active) kicking in on the smaller sizes.
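For reference, the per-core figures and their spread were derived along these lines - the numbers below are placeholders, not real results:

```python
from statistics import mean, pstdev

# Placeholder multi-core scores for a single series - not real results.
series = [
    ("G1",  130,  2),   # (vm_type, multi-core Cinebench score, cores)
    ("G2",  250,  4),
    ("G3",  470,  8),
    ("G4",  880, 16),
    ("G5", 1650, 32),
]

# Per-core score = multi-core score divided by the number of cores.
per_core = [score / cores for _, score, cores in series]
print("per-core scores:", [round(s, 1) for s in per_core])
print("mean:", round(mean(per_core), 1), "stdev:", round(pstdev(per_core), 1))
```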
Let me know if you can think of a good benchmark to run across the entire range of Azure VMs.
Here is a Google Sheet with all of my results.
Here is the script.
Dave.