Low disk performance after reaching 75% of disk space

by mk6072107 . Updated 3 years, 10 months ago

We have a system that needs to store around 1 billion images of ~120KB each. Our setup is two Ubuntu 20 VMs in a PCS cluster with iSCSI NetApp E-Series storage. The storage is mapped directly inside the VM over the network with multipath. Disk size is 100TB, FS is ext4. To receive the files we use nginx with WebDAV. The usual rate is ~45MB/s. VM config: 4 CPUs, 8GB RAM. The directory tree looks like this: a directory named by date, then around 1500 subdirs, each of which contains ~10k files.
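For anyone who wants to sanity-check the scale, here is a rough back-of-the-envelope sketch using only the figures above. It assumes decimal units (1KB = 1000 bytes, 1TB = 10^12 bytes) and that every image is exactly 120KB, which is a simplification.

```python
# Back-of-the-envelope sizing from the figures in the post.
# Assumption: decimal units and a uniform image size of 120 KB.
KB, MB, TB = 10**3, 10**6, 10**12

image_size = 120 * KB            # ~120 KB per image
total_images = 1_000_000_000     # ~1 billion images
ingest_rate = 45 * MB            # ~45 MB/s usual rate
subdirs_per_date_dir = 1500      # ~1500 subdirs per date directory
files_per_subdir = 10_000        # ~10k files per subdir

total_bytes = total_images * image_size
files_per_date_dir = subdirs_per_date_dir * files_per_subdir
bytes_per_date_dir = files_per_date_dir * image_size
files_per_second = ingest_rate / image_size

print(f"all images:         {total_bytes / TB:.0f} TB")
print(f"files per date dir: {files_per_date_dir:,}")
print(f"data per date dir:  {bytes_per_date_dir / TB:.1f} TB")
print(f"ingest rate:        {files_per_second:.0f} files/s")
```

With these inputs the full set works out to roughly 120TB, one date directory to about 15 million files (around 1.8TB), and the ingest rate to a few hundred new files per second.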

Everything runs great until we reach a strange threshold. When we get to around 74TB of used space, I start to see a rising number of read I/Os. They are small: around 0.5MB/s at 30-100 IOPS. Then, when we reach ~77-78TB, performance drops drastically. Disk utilization goes up to 100% and CPU iowait is 70-80%. In atop I see a process, kworker/u16:0+flush, that eats one CPU at 100%, and perf top shows ext4_mb_good_group in the top position.
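Since ext4_mb_good_group belongs to the ext4 multiblock allocator, which scans block groups for usable free space, it may help to look at how fragmented the free space is. Below is a minimal sketch that summarizes the contents of /proc/fs/ext4/<device>/mb_groups. The line layout in the comment is an assumption about that file's format and should be checked against the running kernel. The function name `summarize_mb_groups` and the device name `sdX` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line layout of /proc/fs/ext4/<device>/mb_groups:
#   #<group>: <free> <frags> <first> [ <free extents of 2^0 blocks> <2^1> ... ]
# The header line ("#group: free frags first [ ... ]") does not match and is skipped.
LINE = re.compile(r"#(\d+)\s*:\s*(\d+)\s+(\d+)\s+(\d+)\s+\[([^\]]*)\]")

def summarize_mb_groups(text):
    """Sum free blocks, fragments, and free-extent counts per power-of-two order."""
    groups = 0
    free_blocks = 0
    fragments = 0
    by_order = Counter()
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # header or unrecognized line
        groups += 1
        free_blocks += int(m.group(2))
        fragments += int(m.group(3))
        for order, count in enumerate(m.group(5).split()):
            by_order[order] += int(count)
    return {
        "groups": groups,
        "free_blocks": free_blocks,
        "fragments": fragments,
        "free_extents_by_order": dict(sorted(by_order.items())),
    }

# Example (hypothetical device name, usually needs root):
#   with open("/proc/fs/ext4/sdX/mb_groups") as f:
#       print(summarize_mb_groups(f.read()))
```

If most of the free extents sit in the low orders and few large ones remain, the allocator has to look through many groups before it finds a suitable one, which would be consistent with the symptoms above. This is a diagnostic aid only, not a confirmed explanation.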

Inode usage is at 30%, and the FS is mounted without any options.

The problem goes away after we clean up some old files, but when usage rises again, it comes back.
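The cleanup can be thought of as "drop the oldest date directories until usage is back under a safe level". Here is a small dry-run sketch of that idea. It only plans which directories to remove and deletes nothing. The 70% target is an assumed value chosen to stay below the ~74% point where trouble starts. The function name `plan_cleanup` and the YYYY-MM-DD directory names are hypothetical.

```python
def plan_cleanup(date_dirs, used_bytes, total_bytes, target_percent=70):
    """Pick the oldest date directories to remove until usage is at or below target.

    date_dirs: list of (name, size_bytes) pairs. Names are assumed to sort
    chronologically (for example YYYY-MM-DD). Returns names, oldest first.
    """
    # Integer math avoids floating-point rounding at the threshold.
    target_bytes = total_bytes * target_percent // 100
    to_remove = []
    for name, size in sorted(date_dirs):
        if used_bytes <= target_bytes:
            break
        to_remove.append(name)
        used_bytes -= size
    return to_remove

# Example with made-up sizes: a 100 TB volume at 74 TB used.
TB = 10**12
dirs = [("2021-03-02", 2 * TB), ("2021-03-01", 2 * TB), ("2021-03-03", 2 * TB)]
print(plan_cleanup(dirs, used_bytes=74 * TB, total_bytes=100 * TB))
```

In this made-up example the two oldest directories are selected, which brings usage down to the 70% target.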

Can anybody help me with this?

