small-steps integration of multithreading #929

@ThomasWaldmann

There is a "one big step" multithreading branch, and it is a pain to keep it up to date with changes from master.

While thinking about the issues there (ordering, race conditions, crypto), the idea of "sequential threading" came to mind: a chain of stages connected by queue.Queue. It intentionally does not parallelize within a single phase of processing, so there is only 1 thread per stage:

finder -q- reader -q- id-hasher -q- compressor -q- encryptor -q- writer

finder: just discovers pathnames to back up (obeying includes, excludes, --one-file-system, etc.)

reader: reads and chunks a file

id-hasher: computes the id-hash of a chunk so we can check whether we already have it

compressor: compresses a chunk

encryptor: encrypts a chunk

writer: writes stuff to the repo
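
A minimal sketch of how the stages listed above could be wired together with queue.Queue, one thread per stage. Everything here is an assumption made for illustration and not code from the multithreading branch: the `FAKE_FS` dict stands in for the filesystem, the reader cuts fixed 4-byte chunks, the "encryption" is a placeholder XOR, and the writer simply fills a dict.

```python
import hashlib
import queue
import threading
import zlib

DONE = object()  # sentinel: tells the next stage that the stream has ended

# Hypothetical in-memory "filesystem" so the sketch is self-contained.
FAKE_FS = {"a.txt": b"abcdefgh", "b.txt": b"abcd"}


def finder(paths, outq):
    """Discover pathnames (here: just iterate over a given list)."""
    for path in paths:
        outq.put(path)
    outq.put(DONE)


def reader(path):
    """Read a file and cut it into fixed-size 4-byte chunks."""
    data = FAKE_FS[path]
    for off in range(0, len(data), 4):
        yield {"path": path, "data": data[off:off + 4]}


def id_hasher(chunk):
    chunk["id"] = hashlib.sha256(chunk["data"]).hexdigest()
    yield chunk


def compressor(chunk):
    chunk["data"] = zlib.compress(chunk["data"])
    yield chunk


def encryptor(chunk):
    # Placeholder XOR, NOT real encryption.
    chunk["data"] = bytes(b ^ 0x5A for b in chunk["data"])
    yield chunk


def worker(func, inq, outq):
    """Generic stage: apply func to each item, forward results and the sentinel."""
    while True:
        item = inq.get()
        if item is DONE:
            outq.put(DONE)
            return
        for result in func(item):
            outq.put(result)


def run_pipeline(paths):
    funcs = [reader, id_hasher, compressor, encryptor]
    # Bounded queues keep a fast stage from running far ahead of a slow one.
    queues = [queue.Queue(maxsize=8) for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=finder, args=(paths, queues[0]))]
    for i, func in enumerate(funcs):
        threads.append(threading.Thread(
            target=worker, args=(func, queues[i], queues[i + 1])))
    for t in threads:
        t.start()
    # writer: runs in the calling thread and stores chunks in a dict "repo".
    repo, order = {}, []
    while True:
        chunk = queues[-1].get()
        if chunk is DONE:
            break
        repo[chunk["id"]] = chunk["data"]
        order.append((chunk["path"], chunk["id"]))
    for t in threads:
        t.join()
    return repo, order
```

Because each stage has exactly one thread and the queues are FIFO, chunks reach the writer in the order the reader produced them, so the ordering issue does not arise. Error handling is left out: if a stage raises, the sentinel never arrives and the pipeline would hang.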

A side effect of this staged, worker-based approach is that the code gets untangled: the stages are clearly separated and communicate via well-defined data structures passed over the queues.

The full-blown implementation does not need to be done in one go; we can start with fewer stages, e.g.:

finder/reader -q- hasher/compressor/encryptor -q- writer
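
A rough sketch of that reduced three-stage layout, assuming two bounded queues and one thread per fused stage. The function names (`io_in`, `cpu`, `io_out`, `backup`) and the dict-based filesystem and repo are hypothetical, and encryption is omitted to keep the example short.

```python
import hashlib
import queue
import threading
import zlib

DONE = object()  # sentinel marking the end of the stream


def io_in(fs, outq):
    """finder/reader: walk the (fake) filesystem and read each file."""
    for path in sorted(fs):
        outq.put((path, fs[path]))
    outq.put(DONE)


def cpu(inq, outq):
    """hasher/compressor/encryptor fused into one CPU-bound stage."""
    while (item := inq.get()) is not DONE:
        _path, data = item
        chunk_id = hashlib.sha256(data).hexdigest()
        payload = zlib.compress(data)  # encryption omitted in this sketch
        outq.put((chunk_id, payload))
    outq.put(DONE)


def io_out(inq, repo):
    """writer: store each chunk once, keyed by its id."""
    while (item := inq.get()) is not DONE:
        chunk_id, payload = item
        repo.setdefault(chunk_id, payload)


def backup(fs):
    q1, q2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    repo = {}
    threads = [
        threading.Thread(target=io_in, args=(fs, q1)),
        threading.Thread(target=cpu, args=(q1, q2)),
        threading.Thread(target=io_out, args=(q2, repo)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return repo
```

Splitting the fused `cpu` stage into separate hasher, compressor and encryptor stages later would only mean adding queues and threads between `io_in` and `io_out`; the two I/O stages would stay as they are.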

This can solve: the CPU sitting more or less idle while waiting for I/O to complete (read/seek time, write/sync time), and I/O sitting idle while waiting for CPU-bound work to complete.

This cannot (and should not) solve: very slow compression algorithms that need same-stage parallelism.
