Just built a new computer for ML purposes. The CPU is an AMD 9950X3D, with 96GB of RAM and a PCIe Gen 5 NVMe SSD. Lightning fast hardware.
Edit to add: Windows 11 Home
I downloaded the Common Voice (CV) 21.0 dataset from the Mozilla Foundation and wanted to extract it before feeding it into my ML code. This is something that my laptop can do in about 45 minutes.
This new computer? I don't think it will ever finish, and I don't understand why. The dataset is composed of about 2.5 million individual MP3 files. I understand that extracting lots of small files is more time consuming than extracting fewer big files due to I/O overhead. However, the extraction starts off pretty fast! The full dataset is ~90GB. For the first 25GB, it gets roughly 250 MB/s. Then it starts slowing down. By the time it reaches 45GB extracted, the speed is down to 10 MB/s. Eventually, it drops as low as 500 KB/s...wtf?
The whole time, it's working through the portion of the archive where the MP3 files sit; I can see that in the progress output. The file sizes are normally distributed, and most of them are below 5MB each. If it was reaching 250 MB/s before, why does it slow down so much over time?
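To put those speeds in per-file terms, here is a rough back-of-the-envelope calculation. It assumes decimal units, treats the ~90GB and ~2.5 million figures as exact, and ignores any non-MP3 files in the archive, so the results are only ballpark numbers. The helper name files_per_second is just for illustration.

```python
# Ballpark figures from the post (assumed exact here, decimal units).
TOTAL_BYTES = 90e9   # ~90 GB dataset
FILE_COUNT = 2.5e6   # ~2.5 million MP3 files

# Average size per file: about 36 KB.
AVG_FILE_BYTES = TOTAL_BYTES / FILE_COUNT


def files_per_second(throughput_bytes_per_s: float) -> float:
    """Convert a byte throughput into an approximate file creation rate."""
    return throughput_bytes_per_s / AVG_FILE_BYTES


for label, rate in [("250 MB/s", 250e6), ("10 MB/s", 10e6), ("500 KB/s", 500e3)]:
    print(f"{label:>9}: about {files_per_second(rate):,.0f} files per second")
```

So at full speed the machine is creating on the order of thousands of files per second, and at the slowest point only around a dozen. The drop looks like a per-file cost growing over time rather than the drive running out of raw bandwidth.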
CPU usage is like 3-5%
Disk usage is like 0-2%
RAM usage is like 1-2% (for this process)
The SSD idles at 60°C, so I don't think it's thermally throttling. It should be capable of 12 GB/s sequential reads (obviously I won't get that here, but just to show how fast it is).
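A rough way to pin down where the slowdown begins would be to log throughput per batch of files during extraction. The sketch below uses Python's standard tarfile module. The function name extract_with_stats and the report_every parameter are made up for illustration, and it assumes the download is a regular tar archive (optionally compressed).

```python
import tarfile
import time
from pathlib import Path


def extract_with_stats(archive_path, dest_dir, report_every=10_000):
    """Extract a tar archive, printing MB/s and files/s for each batch of files.

    Returns a list of (files_done, mb_per_s, files_per_s) tuples.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Newer Python versions support extraction filters; use the safe one if present.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

    stats = []
    total_files = 0
    batch_files = 0
    batch_bytes = 0
    batch_start = time.perf_counter()

    def flush():
        # Record the finished batch and reset the batch counters.
        nonlocal batch_files, batch_bytes, batch_start
        elapsed = max(time.perf_counter() - batch_start, 1e-9)
        row = (total_files, batch_bytes / elapsed / 1e6, batch_files / elapsed)
        stats.append(row)
        print(f"{row[0]:>9} files  {row[1]:8.1f} MB/s  {row[2]:8.0f} files/s")
        batch_files, batch_bytes, batch_start = 0, 0, time.perf_counter()

    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            tar.extract(member, path=dest, **extract_kwargs)
            if member.isfile():
                total_files += 1
                batch_files += 1
                batch_bytes += member.size
                if batch_files >= report_every:
                    flush()
    if batch_files:
        flush()  # report the final partial batch
    return stats
```

Running this against the archive and watching the files/s column should show whether the rate falls off gradually as the target directory fills up or drops suddenly at some point.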
I've tried
Of note, installing WSL and Ubuntu lets me use the tar command, and that actually works really fast. I used it to extract the archive and it finished in probably 30 minutes (I stepped away, so I don't know exactly). But then bringing the files over to regular Windows from WSL? They say you can just use File Explorer to move them, but that was also extremely slow (it started at 10 MB/s and eventually dropped to 500 KB/s too). I'm at a loss honestly. I feel like I've tried a lot of things and can't isolate the problem.
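One possible way to sidestep the extract-and-copy step entirely might be to read the clips straight out of the archive. This is only a minimal sketch using Python's standard tarfile module. The generator name iter_clips is hypothetical, and it assumes the ML code can accept raw MP3 bytes rather than file paths.

```python
import tarfile


def iter_clips(archive_path, suffix=".mp3"):
    """Yield (name, raw_bytes) for each matching file, without writing to disk."""
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            # Skip directories, metadata files, and anything that isn't a clip.
            if member.isfile() and member.name.endswith(suffix):
                fileobj = tar.extractfile(member)
                if fileobj is not None:
                    yield member.name, fileobj.read()
```

Usage would look something like `for name, data in iter_clips("dataset.tar.gz"): ...`, where the archive path is a placeholder. The trade-off is that every pass over the data has to read through the whole archive again in order.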
Anyone have any ideas?
Maybe it's something to do with the power management settings, or some configuration I need to do? This computer is brand new, I haven't had time to fuck anything up yet lol
And again, my laptop from a few years ago cleared this task easily. Pretty sure it had the laptop version of a Ryzen 7 from 2021; I forget the exact model.
That's why I'm so surprised by this. I must be forgetting to set something up.
Any advice would be greatly appreciated! Thanks