[mcstas-users] MPI questions/bugs
Peter Kjær Willendrup
pkwi at fysik.dtu.dk
Wed Oct 10 11:06:50 CEST 2012
Hi Jean-Francois,
On Oct 9, 2012, at 13:54, Jean-Francois.Moulin at hzg.de wrote:
Here are some questions/bug reports which I put together as they might be related to each other.
1) I am running into some trouble with longish (1e8 and above) simulations: the trace gives an ETA which is coherent (i.e. it scales with the number of trajectories). The progress bar follows the expected timing and then stays stuck forever at 90 % or so.
I am using MPI with 4 cores, and they stay fully busy indefinitely.
This seems to be a nasty problem: it does not always appear for the exact same setup. 1.5e8 trajectories seems to be a fuzzy threshold for triggering the effect on my machine.
I am afraid we have indeed seen this kind of behaviour before. I thought I had ironed that one out (in McStas 1.12c) by making our ncount neutron counter a long long int instead of a double…
Could you send me the instrument file and the needed parameters so that I can try to reproduce this?
2) Moreover, a small bug: the progress bar seems to have a unit conversion problem for long runs
(tested here with 1 core, but behaviour is similar with 4)
1e7 ETA 31 [s]
1e8 ETA 5.41667 [min] = ca 325 sec, OK
1e9 ETA 54.581 [min] OK
1e10 ETA 32457 [h] would be right if the value were in seconds (ca 541 min, i.e. about 9 h)
I am surely not the first one to notice though…
Yes, correct - simply a wrong printf statement. And "known" already, fixed in the development tree...
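To illustrate the kind of fix involved, here is a minimal sketch of an ETA formatter that picks the unit label together with the conversion. It is not the McStas code: the function name `format_eta` is made up for this example, and it assumes the ETA is held internally in seconds, which is what the 1e10 figure quoted above suggests (32457 s is about 9 h).

```python
def format_eta(seconds):
    """Return an ETA string whose unit label matches the converted value.

    Hypothetical helper, assuming the ETA is tracked in seconds.
    """
    if seconds < 60:
        return f"{seconds:g} [s]"
    if seconds < 3600:
        return f"{seconds / 60:g} [min]"
    return f"{seconds / 3600:g} [h]"


# ETAs in seconds, roughly matching the 1e7 ... 1e10 runs quoted above
for eta in (31, 325, 3275, 32457):
    print(format_eta(eta))
```

The point of the sketch is only that the division and the unit label live in the same branch, so they cannot get out of step.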
3) While trying to understand (1) I looked at the performance of the multiprocessor use:
Using MPI I got the following timings for 1E8 trajectories
ncores   time (min)
1        5.333
2        2.95
3        2.9166
4        2.3333
A system monitor shows that the requested number of cores is running at 100% for the whole run.
I do not have a large disk I/O activity.
From what I read in the manual I was expecting very good scalability…
My impression is still that it normally scales very well… No other intensive tasks running at that time I guess - and your processor cores are all proper cores? I.e. no hyperthreading etc.?
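To put numbers on the scaling, the timings quoted above can be turned into speedup and parallel efficiency. This is a minimal sketch; `scaling` is an illustrative helper and not part of McStas.

```python
# Wall-clock times in minutes for 1e8 trajectories, as quoted above
timings_min = {1: 5.333, 2: 2.95, 3: 2.9166, 4: 2.3333}


def scaling(timings):
    """Map core count -> (speedup, efficiency) relative to the 1-core run."""
    t1 = timings[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in timings.items()}


for n, (speedup, efficiency) in scaling(timings_min).items():
    print(f"{n} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

With these figures the efficiency is about 90% on 2 cores but drops to roughly 60% on 3 and 4 cores. That would be consistent with only two independent physical cores being available, although other causes cannot be ruled out from the timings alone.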
4) More a question than a bug: when using MPI, wavelength vs tof plots show a wrong pattern (when running on a single processor the pattern is ok). I noticed the warning concerning the use of the auto scale parameter together with MPI, but I remember a discussion with Emmanuel Farhi where he mentioned that the errors in the outputs should be negligible... I have attached two images to illustrate the difference I get. Both simulations are the same, the first on a single machine, the second on 4 procs.
Is this kind of behaviour really expected?
I have not yet tried to use fixed parameters as this is rather impractical…
Hmm. This looks very strange indeed! Please send me the instrument file and relevant parameters… You are right that "auto limits" are problematic when using MPI - this is very likely the reason! We are considering doing this differently by keeping an event list and only doing the binning at the end of the simulation on the master node...
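A toy example of why per-node auto limits and MPI do not mix well: if each node chooses its own histogram range from the events it happens to see, bin i does not cover the same interval on every node, so adding the histograms bin by bin mixes different ranges. The sketch below uses made-up event values and a hypothetical `histogram` helper; it says nothing about how McStas actually implements this, it only contrasts naive merging with binning a merged event list once.

```python
def histogram(values, lo, hi, nbins):
    """Count values into nbins equal-width bins on [lo, hi] (requires hi > lo)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v <= hi:
            # The upper edge belongs to the last bin
            i = min(int((v - lo) / width), nbins - 1)
            counts[i] += 1
    return counts


# Made-up events recorded on two nodes
node_a = [1.0, 2.0, 3.0, 4.0]
node_b = [3.0, 5.0, 7.0, 9.0]
nbins = 4

# Each node picks its own limits, then the histograms are summed bin by bin
h_a = histogram(node_a, min(node_a), max(node_a), nbins)
h_b = histogram(node_b, min(node_b), max(node_b), nbins)
naive = [a + b for a, b in zip(h_a, h_b)]

# Event-list approach: gather all events, choose limits once, bin once
events = node_a + node_b
merged = histogram(events, min(events), max(events), nbins)

print("naive :", naive)   # [2, 2, 2, 2]
print("merged:", merged)  # [2, 3, 1, 2]
```

Both results contain all 8 events, but the naive sum looks flat while the single binning does not, which is the kind of distortion that could show up as a wrong pattern in a 2D monitor.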
5) Last and least: minor but annoying for automatic analysis of the simulation files...
mcstas.sim (sometimes) has a wrong date field, set to the Unix epoch:
Date: Simulation started (0) Thu Jan 1 01:00:00 1970
Only sometimes? Strange… I would have guessed that problem to be system-dependent and "either or"… Please let me know if you find any further indications about when this happens. :-)
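Until the cause is found, an analysis script can at least detect the bad field. The "(0)" in the Date line appears to be the raw timestamp, and a timestamp of 0 rendered in a UTC+1 timezone gives exactly 01:00:00 on 1 January 1970. The sketch below assumes that reading; `parse_started` is a hypothetical helper, not part of any McStas tool.

```python
from datetime import datetime, timedelta, timezone


def parse_started(epoch_seconds):
    """Return a UTC datetime, or None if the timestamp looks unset (<= 0)."""
    if epoch_seconds <= 0:
        return None
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# A timestamp of 0 shown in UTC+1 reproduces the date seen in mcstas.sim
utc_plus_1 = timezone(timedelta(hours=1))
unset = datetime.fromtimestamp(0, tz=utc_plus_1)
print(unset.strftime("%a %b %d %H:%M:%S %Y"))

print(parse_started(0))           # None: treat as "date missing"
print(parse_started(1349859910))  # an arbitrary timestamp from October 2012
```

Treating a zero timestamp as missing rather than as a real date keeps automatic analysis from silently sorting such runs into 1970.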
I am running McStas 1.12c under Mint 13 using Open MPI.
Thanks a lot for reading down to this point and possibly answering ;0)
No problem, it's what we're here for! :-)
Best,
Peter
Peter Kjær Willendrup
Development engineer
DTU Physics
Technical University of Denmark
Department of Physics
Fysikvej
Building 307
DK-2800 Kongens Lyngby
Direct +45 2125 4612
Mobile +45 2125 4612
Fax +45 4593 2399
pkwi at fysik.dtu.dk