Sunday, August 28, 2016

Weird USB disk slowdown with multiple readers

I have a bunch of files and hard drives scattered about (as I'm sure many of you do, too). I wanted to consolidate everything onto one drive, so to start off I wrote a Python program that walks all files on a drive, calculates the SHA-256 of each and writes the results to a file. Here is the core of the code, which does the hashing in parallel.

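import os
from concurrent.futures import ProcessPoolExecutor

def scan(scanroot, ofilename):
    # scan_file (not shown here) returns the text to write for one file.
    with open(ofilename, 'w') as ofile, ProcessPoolExecutor() as pe:
        futures = []
        # Queue one hashing job per file found under scanroot.
        for root, dirs, files in os.walk(scanroot):
            for f in files:
                fname = os.path.join(root, f)
                futures.append(pe.submit(scan_file, fname))
        # Collect results in submission order; report failures and carry on.
        for f in futures:
            try:
                ofile.write(f.result())
            except Exception as e:
                print('ERROR:', str(e))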
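
The scan_file worker is not part of the snippet above. A minimal sketch of what such a worker could look like follows. The output layout (hex digest, two spaces, path, newline, as sha256sum prints it) and the chunk_size parameter are assumptions made for illustration, not details of the original program.

```python
import hashlib

def scan_file(fname, chunk_size=1024 * 1024):
    """Hash one file and return a single output line for it.

    Assumed line layout: '<sha256 hex digest>  <path>\n'.
    """
    h = hashlib.sha256()
    with open(fname, 'rb') as f:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return '{}  {}\n'.format(h.hexdigest(), fname)
```

Reading in chunks keeps memory use flat regardless of file size, which matters when several worker processes hash large files at once.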


Very simple, and it works fine. But when I ran this on a USB 3 disk on Linux (Ubuntu Wily), something weird happened. With just one process, the disk transfers data at 70 MB/s, which is a fraction slower than an internal hard disk. With 8 simultaneous jobs, the total transfer rate drops to 4 MB/s, which is almost 20 times slower.
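
One way to reproduce this kind of comparison is to time plain reads with different numbers of worker processes. The sketch below is a rough illustration, not the measurement method used for the figures above; the names read_file, measure_throughput and compare_worker_counts are made up for this example. Note that Linux caches file contents in memory, so a second pass over the same files can look much faster than the disk really is unless the page cache is dropped (or different files are used) between runs.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

def read_file(fname, chunk_size=1024 * 1024):
    """Read a file to the end and return the number of bytes read."""
    total = 0
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total

def measure_throughput(fnames, workers):
    """Read all files using `workers` processes; return the rate in MB/s."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pe:
        total = sum(pe.map(read_file, fnames))
    elapsed = time.perf_counter() - start
    return total / elapsed / 1e6

def compare_worker_counts(scanroot, counts=(1, 8)):
    """Return {worker count: MB/s} for every file found under scanroot."""
    fnames = [os.path.join(root, f)
              for root, dirs, files in os.walk(scanroot)
              for f in files]
    return {n: measure_throughput(fnames, n) for n in counts}
```

To try it, call compare_worker_counts with the mount point of the drive from under an `if __name__ == '__main__':` guard, which process pools need on platforms that start workers by re-importing the main module.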

I have no idea what could be causing this, but it seems to be specific to USB: internal hard drives handle multiple readers effortlessly.