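import os
from concurrent.futures import ProcessPoolExecutor

def scan(scanroot, ofilename):
    # scan_file(fname) is assumed to be defined elsewhere and to return a string.
    ofile = open(ofilename, 'w')
    with ProcessPoolExecutor() as pe:
        futures = []
        # Submit one job per file found under scanroot.
        for root, dirs, files in os.walk(scanroot):
            for f in files:
                fname = os.path.join(root, f)
                futures.append(pe.submit(scan_file, fname))
        # Collect results in submission order; a failed job is reported and skipped.
        for f in futures:
            try:
                ofile.write(f.result())
            except Exception as e:
                print('ERROR:', str(e))
    ofile.close()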
Very simple, and it works fine. But when I ran this on a USB 3 disk on Linux (Ubuntu Wily), something weird happened. With just one process, the disk transfers data at about 70 MB/s, which is slightly slower than an internal hard disk. With 8 simultaneous jobs, the total transfer rate drops to 4 MB/s, which is almost 20 times slower.
I have no idea what could be causing this, but it seems to be specific to USB: internal hard drives handle multiple readers effortlessly.
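To see how the transfer rate changes with the number of simultaneous jobs, a small timing sketch along the following lines could be used. This is not the scanner above: `read_size` and `measure` are made-up helper names that only read every file to the end and time the whole run, and the `executor` parameter is an assumption added so that a thread pool can be swapped in for the process pool.

```python
import os
import sys
import time
from concurrent.futures import ProcessPoolExecutor


def read_size(fname, blocksize=1 << 20):
    """Read a file to the end and return the number of bytes read."""
    total = 0
    with open(fname, 'rb') as f:
        while True:
            block = f.read(blocksize)
            if not block:
                break
            total += len(block)
    return total


def measure(scanroot, workers, executor=ProcessPoolExecutor):
    """Read all files under scanroot with `workers` jobs.

    Returns (total bytes read, elapsed seconds). An unreadable file
    raises instead of being skipped, to keep the sketch short.
    """
    fnames = [os.path.join(root, f)
              for root, dirs, files in os.walk(scanroot)
              for f in files]
    start = time.perf_counter()
    with executor(max_workers=workers) as pe:
        total = sum(pe.map(read_size, fnames))
    return total, time.perf_counter() - start


if __name__ == '__main__' and len(sys.argv) > 1:
    # Usage: python measure.py /path/on/the/disk
    for workers in (1, 2, 4, 8):
        nbytes, secs = measure(sys.argv[1], workers)
        print(workers, 'jobs:', round(nbytes / secs / 1e6, 1), 'MB/s')
```

Keep in mind that Linux caches file contents in RAM, so a second pass over the same files may be served from memory rather than from the disk. The numbers are only comparable if each run starts with a cold cache or reads different data.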
This is indeed because most USB hardware controllers are not built for reads and writes from multiple processes. They use a polling method, which causes large delays when there is contention on the bus to a device (or devices). This shows up a lot in embedded hardware, where someone tries to use a Raspberry Pi with a disk and a network adapter and then finds that, because both are on the same USB bus, the speed is 4 MB/s instead of 70 MB/s.