Sunday, August 28, 2016

Weird USB disk slowdown with multiple readers

I have a bunch of files and hard drives scattered about (as I'm sure many of you do, too). I wanted to consolidate all of them onto one drive, so to start off I wrote a Python program that walks through all files on a drive, calculates the SHA-256 of each and writes the results to a file. Here is the core of the code, which does the hashing in parallel.

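import os
from concurrent.futures import ProcessPoolExecutor

def scan(scanroot, ofilename):
    # The with block closes the output file even if something goes wrong.
    with open(ofilename, 'w') as ofile, ProcessPoolExecutor() as pe:
        futures = []
        # Submit one hashing job per file found under scanroot.
        for root, dirs, files in os.walk(scanroot):
            for f in files:
                fname = os.path.join(root, f)
                futures.append(pe.submit(scan_file, fname))
        # Collect results in submission order; report failures and carry on.
        for f in futures:
            try:
                ofile.write(f.result())
            except Exception as e:
                print('ERROR:', str(e))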
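
The scan_file function is not shown above. As a rough idea of what such a helper might look like, here is a minimal sketch. The output format (hex digest, two spaces, path, newline) and the chunk_size parameter are my assumptions, not necessarily what the real program does.

```python
import hashlib

def scan_file(fname, chunk_size=1024 * 1024):
    """Hypothetical helper: hash one file and return a line of output."""
    h = hashlib.sha256()
    with open(fname, 'rb') as f:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    # Assumed line format: "<hex digest>  <path>\n"
    return '{}  {}\n'.format(h.hexdigest(), fname)
```

Returning a plain string keeps the result cheap to send back from the worker process, and the parent can write it to the output file as-is.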


Very simple and works fine. But when I ran this on a USB 3 disk on Linux (Ubuntu Wily), something weird happened. With just one process, the disk transfers data at 70 MB/s, which is a fraction slower than an internal hard disk. With 8 simultaneous jobs, the total transfer rate drops to 4 MB/s, which is more than 17 times slower.
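
If you want to try reproducing the comparison, something along these lines might do. This is only a sketch: it uses threads rather than processes to keep it short (plain file reads release the GIL, so the reads still overlap), and the names read_all and throughput are made up for this example. Note that the OS page cache can inflate the numbers if the same files are read twice.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def read_all(path, chunk_size=1024 * 1024):
    """Read a file to the end and return the number of bytes read."""
    total = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total

def throughput(paths, workers):
    """Return the aggregate read rate in MB/s with `workers` concurrent readers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as ex:
        total = sum(ex.map(read_all, paths))
    # Guard against a zero interval on tiny inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / elapsed / 1e6
```

Calling throughput(paths, 1) and throughput(paths, 8) on two different sets of large files from the same disk should give a rough idea of whether the slowdown appears on your hardware.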

I have no idea what could be causing this, but it seems to be specific to USB: internal hard drives handle multiple readers effortlessly.

1 comment:

  1. This is indeed because most USB hardware controllers are not built for reads and writes from multiple processes. They use a polling method, which causes large delays when there is contention on the bus to a device (or devices). This shows up a lot in embedded hardware, where someone tries to use a Raspberry Pi with a disk and network and then finds that, because both are on the same USB bus, the speed is 4 MB/s instead of 70 MB/s.
