According to Microsoft, local SSD performance depends on the VM size:
"The cache is subject to separate IOPS and throughput limits at the VM level, based on the VM size. DS-series VMs have roughly 4,000 IOPS and 33 MB/s throughput per core for cache and local SSD I/Os. GS-series VMs have a limit of 5,000 IOPS and 50 MB/s throughput per core for cache and local SSD I/Os."
https://docs.microsoft.com/en-us/azure/storage/storage-premium-storage
I did a couple of IOMeter tests to check sequential and random performance of the drive. I used 4 workers with 1 outstanding IO each on a 20 GiB test file. I tested both the L4 and L16 instance sizes.
L4, Sequential Read: roughly 200 MiB per second (32 KiB, 100% read, 0% random). Writing out the test file ran at just over 200 MB/sec.
L4, Random Read: gets roughly 20k IOPS
L4, Random Write: also gets roughly 20k IOPS
The performance figures are very predictable, which suggests the throughput is being throttled. It looks like the throttle for these VMs is set the same as for the GS-series VMs, at 5,000 IOPS or 50 MB/sec per core.
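To see which of the two limits is the one being hit in each test, it helps to convert between IOPS and throughput for a given block size. The sketch below is only illustrative: the helper names are mine, it uses decimal megabytes (1 MB = 1,000,000 bytes), and it assumes the L4 random tests used 4 KiB blocks, which is stated for the L16 tests but not for the L4 ones.

```python
def throughput_mb_s(iops, block_bytes):
    """Throughput in MB/s (decimal) produced by a given IOPS rate and block size."""
    return iops * block_bytes / 1_000_000


def iops_at(throughput_mb, block_bytes):
    """IOPS needed to sustain a given throughput in MB/s at a given block size."""
    return throughput_mb * 1_000_000 / block_bytes


# Random test on the L4: about 20,000 IOPS, assumed 4 KiB blocks.
print(round(throughput_mb_s(20_000, 4 * 1024), 1), "MB/s")

# Sequential test on the L4: about 200 MB/s with 32 KiB blocks.
print(round(iops_at(200, 32 * 1024)), "IOPS")
```

Under those assumptions the random test moves only about 82 MB/s, well below 200 MB/s, so it is the IOPS limit that caps it. The sequential test needs only about 6,100 IOPS, well below 20,000, so there it is the throughput limit that caps it.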
I also ran the same tests on an L16 server, which has 16 cores and 2.8 TiB of local SSD, to see if it is throttled at the same level. I used the same tests as on the L4 server, except with 16 workers. Using only 4 workers on this VM resulted in significantly worse performance than using 16 workers. Perhaps the throughput throttling is actually working at the core level?
In theory these results should be 800 MB/sec sequential reads or 80,000 random IOPS.
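The expected figures come from multiplying the per-core limits by the core count. Here is a minimal sketch of that arithmetic, using the DS and GS numbers from the Microsoft quote above and assuming, as the results suggest, that the L-series follows the GS limits and that the L4 has 4 cores. The function and dictionary names are just for illustration.

```python
# Per-core cache / local SSD limits as quoted from the Microsoft documentation.
PER_CORE_LIMITS = {
    "DS": {"iops": 4000, "mb_s": 33},
    "GS": {"iops": 5000, "mb_s": 50},
}


def expected_limits(cores, series="GS"):
    """Return (IOPS, MB/s) for a VM, assuming the limits scale linearly per core."""
    per_core = PER_CORE_LIMITS[series]
    return cores * per_core["iops"], cores * per_core["mb_s"]


# Assumption: L-series VMs are throttled like the GS series.
for name, cores in [("L4", 4), ("L16", 16)]:
    iops, mb_s = expected_limits(cores)
    print(f"{name}: {iops} IOPS, {mb_s} MB/s")
```

This gives 20,000 IOPS and 200 MB/s for the L4, and 80,000 IOPS and 800 MB/s for the L16.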
Writing the test file out ran at around 620 MB per second (using 4 workers). Since this is a bit worse than expected, I've also included a sequential write test with 16 workers.
L16, Sequential Read
L16, Sequential Write
L16, 4k Random Read
L16, 4k Random Write
L16, 4k Random Write using only 4 workers
Using only 4 workers results in significantly worse performance, although performance doesn't scale linearly with worker count: we are still seeing 45,000 IOPS with this worker configuration.
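One way to reason about why the worker count matters is Little's law: with a fixed number of outstanding I/Os, the IOPS rate is bounded by the number of I/Os in flight divided by the average latency per I/O. The sketch below applies that to the measured figures (each worker has 1 outstanding I/O). The helper names are mine, and the last calculation assumes latency stays the same as more I/Os are queued, which may not hold in practice.

```python
def implied_latency_ms(outstanding_ios, iops):
    """Average latency per I/O implied by Little's law: in-flight I/Os / IOPS."""
    return outstanding_ios / iops * 1000.0


def outstanding_needed(target_iops, latency_ms):
    """Outstanding I/Os needed to reach target_iops at a given average latency."""
    return target_iops * latency_ms / 1000.0


# L16 with 4 workers: about 45,000 IOPS measured.
l16_latency = implied_latency_ms(4, 45_000)
print(round(l16_latency, 3), "ms")

# L4 with 4 workers: about 20,000 IOPS measured.
print(round(implied_latency_ms(4, 20_000), 3), "ms")

# Outstanding I/Os needed for 80,000 IOPS on the L16, assuming latency is unchanged.
print(round(outstanding_needed(80_000, l16_latency), 1))
```

With 4 outstanding I/Os the L16 works out to roughly 0.09 ms per I/O, so reaching 80,000 IOPS at that latency would need around 7 I/Os in flight. That is more than 4 workers with 1 outstanding I/O each can provide.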
Charts
Conclusions
From the tests we can see that performance scales linearly with core count. It is also worth noting that the worker count is very important for getting full performance from the drive. In practice this would translate to setting the correct thread count in SQL Server or similar.
The L-series local SSD and cache appear to perform the same as those of the GS-series VMs. Microsoft appears to be delivering on the stated IO figures very well, with low average latency of less than 1 ms.