I've just started using the Joblib module and I'm trying to understand how the Parallel function works. Below is an example where parallelizing leads to longer runtimes, but I don't understand why. My assumption was that running the loop in parallel would copy lists a and b to each processor, then dispatch item_n to one CPU and item_n+1 to the other CPU, execute the function, and write the results back to a list (in order). Is this a poor example or use of joblib? Did I simply structure the code wrong?

# Check if one line segment contains another.
res = Parallel(n_jobs=2)(delayed(check_paths)(Path(points), a) for points in b)

In short: I cannot reproduce your problem. If you are on Windows, you should use a protector for your main loop: see the documentation of joblib.Parallel. The only problem I see is a lot of data-copying overhead, but your numbers seem too large to be caused by that alone.

In long, here are my timings with your code. On my i7 3770k (4 cores, 8 threads) I get the following results for different n_jobs:

For-loop: Finished in 33.8521318436 sec

So there is a gain in using multiple processes. However, although I have four cores, the gain already saturates at three processes, so I guess the execution time is actually limited by memory access rather than processor time.

You should notice that the arguments for each single loop entry are copied to the process executing it. This means you copy a for each element in b. Passing a as an argument is not necessary: where Parallel forks the process, all global variables are copied to the newly spawned processes, so a is accessible there as a global. This gives me the following code (with timing and the main loop guard that the joblib documentation recommends):

import numpy as np
res = Parallel(n_jobs=int(sys.argv[1]))(delayed(check_paths)(Path(points)) for points in b)
print("Finished in", time.time() - now, "sec")

The saturation has now moved slightly, to n_jobs=4, which is the value to be expected.

check_paths does several redundant calculations that can easily be eliminated. Firstly, for all elements in other_paths=a, the line Path(...) is executed in every call. Secondly, the string res='no cross' is written in each loop turn, although it may only change once (followed by a break and return). Then the code looks like this:

import numpy as np

With timings:

n_jobs=1: Finished in 5.33742594719 sec

A side note on your code, although I haven't really followed its purpose as this was unrelated to your question: contains_path will only return True if this path completely contains the given path. Therefore your function will basically always return no cross given the random input.

In addition to the above answer, and for future reference, there are two aspects to this question, and joblib's recent evolutions help with both.

Parallel pool creation overhead: The problem here is that creating a parallel pool is costly. It was especially costly here, as the code not protected by the "main" guard was run in each job at creation of the Parallel object. In the latest joblib (still beta), Parallel can be used as a context manager to limit the number of times a pool is created, and thus the impact of this overhead.

Dispatching overhead: It is important to keep in mind that dispatching an item of the for loop has an overhead (much bigger than iterating a for loop without parallel). Thus, if these individual computation items are very fast, this overhead will dominate the computation. In the latest joblib, joblib will trace the execution time of each job and start bunching them if they are very fast.
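The advice above about the main guard and about reusing one pool through the Parallel context manager can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's code: the helper name sum_of_roots and its chunks argument are hypothetical, and math.sqrt stands in for check_paths so that the sketch needs nothing beyond joblib.

```python
import math

from joblib import Parallel, delayed


def sum_of_roots(chunks, n_jobs=2):
    """Return one sum of square roots per chunk (hypothetical helper)."""
    totals = []
    # Entering the context manager sets up the workers once; every call to
    # `parallel` inside the block reuses them instead of creating a new pool.
    with Parallel(n_jobs=n_jobs) as parallel:
        for chunk in chunks:
            # math.sqrt is a stand-in for the real per-item function.
            roots = parallel(delayed(math.sqrt)(x) for x in chunk)
            totals.append(sum(roots))
    return totals


if __name__ == "__main__":
    # The guard keeps this block from running again if a worker process
    # re-imports the script, which matters on Windows in particular.
    print(sum_of_roots([[1, 4, 9], [16, 25]]))  # prints [6.0, 9.0]
```

A function as cheap as math.sqrt is exactly the case where dispatching dominates, so this sketch shows the structure rather than a speedup. Parallel also takes a batch_size argument, whose default of 'auto' corresponds to the automatic bunching of fast jobs described above.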