[mcstas-users] McStas/MPI OFF file problem. iFit execution/scans.
Петр Коник
104pet104 at gmail.com
Tue Feb 12 12:12:57 CET 2019
Thank you very much for such rapid help! I will test it in a couple
of days and provide feedback.
As for the last question: yes, of course you may use it freely (one note:
PIK is usually written with an I).
Best regards,
Peter Konik
Tue, 12 Feb 2019 at 14:05, Emmanuel FARHI <farhi at ill.fr>:
>
> Hi Peter,
>
> I had a look at the 'stack smashing detected' error you reported (below) when using:
>
> instrument screw_n
> Guide_anyshape component with large OFF (nice twisted guide made of triangles)
> mpi with 24 cores
>
> I can reproduce the bug. It makes no difference whether the run is controlled from iFit or from the command line with mcrun: the bug occurs in both cases, even with very few neutron counts (1e4), in the middle of the TRACE (not at the final data merge).
>
> The reported error is:
>
> *** stack smashing detected ***: <unknown> terminated
>
> and it mostly takes place at the 'screw' component which is the Guide_anyshape(OFF). It triggers a SIGTERM or SIGABRT which usually saves results but stops further neutron events (partial computation). But it also involves the Octo_sm monitor (Monitor_nD).
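To see at a glance which component each MPI rank was in when it received a signal, the per-rank reports in a log like the one quoted further down can be tallied. This is a minimal Python sketch, assuming the report layout shown in this thread (a "Signal N detected [proc P]" line followed by a "Breakpoint:" line). The function name parse_crash_reports and the abridged sample text are illustrative and are not part of McStas or iFit.

```python
import re
from collections import Counter

# Patterns follow the report layout quoted in this thread (an assumption
# about the McStas 2.5 log format; other versions may differ).
SIGNAL_RE = re.compile(r"\[pid (\d+)\] Signal (\d+) detected \[proc (\d+)\]")
BREAKPOINT_RE = re.compile(r"Breakpoint:\s+(\S+)\s+\((\w+)\)")


def parse_crash_reports(log_text):
    """Return one dict per signal report: pid, signal, proc, component, section."""
    reports = []
    current = None
    for raw in log_text.splitlines():
        # Tolerate e-mail quoting ("> ") in front of each log line.
        line = raw.lstrip("> ").strip()
        match = SIGNAL_RE.search(line)
        if match:
            current = {
                "pid": int(match.group(1)),
                "signal": int(match.group(2)),
                "proc": int(match.group(3)),
            }
            continue
        match = BREAKPOINT_RE.search(line)
        if match and current is not None:
            current["component"] = match.group(1)
            current["section"] = match.group(2)
            reports.append(current)
            current = None
    return reports


# Abridged sample: two of the reports from the log in this thread,
# with the temporary directory path shortened.
SAMPLE_LOG = """\
# McStas 2.5 - Dec. 12, 2018: [pid 13264] Signal 6 detected [proc 7]
SIGABRT (Abort)
# Simulation: H3 (screw_n.instr)
# Breakpoint: Octo_sm (Trace) 6.47 % ( 8089.0/ 125000.0)
# McStas 2.5 - Dec. 12, 2018: [pid 13258] Signal 15 detected [proc 6]
SIGTERM (Termination)
# Simulation: H3 (screw_n.instr)
# Breakpoint: screw (Trace) 20.96 % ( 26205.0/ 125000.0)
"""

reports = parse_crash_reports(SAMPLE_LOG)
by_signal = Counter((r["signal"], r["component"]) for r in reports)
for (signal, component), count in sorted(by_signal.items()):
    print(f"signal {signal:2d} in {component}: {count} rank(s)")
```

Grouping by signal number separates the rank that aborted (signal 6) from the ranks that were merely terminated afterwards (signal 15), which is the distinction discussed above.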
>
> McStas 2.5 test (Ubuntu 18.04)
>
> When executed from iFit, the error also shows as:
>
> # Fatal : unrecoverable loop ! Suicide (naughty boy).
> [warn] Epoll ADD(4) on fd 37 failed. Old events were 0; read change was 0 (none); write change was 1 (add): Bad file descriptor
> [farhimacpro:05429] pmix_usock_msg_send_bytes: write failed: Broken pipe (32) [sd = 112]
>
> This is clearly related to openMPI, and more specifically to libevent.
>
> I have executed the same with a serial computation, and it all goes OK, even when compiled with MPI, and executed with --mpi=1.
>
> I have no idea why this happens, as Guide_anyshape has no I/O routine except at the start (read the OFF), but the issue comes in the middle of the TRACE, in connection with Octo_sm...
>
> Conclusion
>
> I have no idea what the error is, as of today.
>
> Fallback solutions
>
> In all cases, make sure the 'L' value is larger than the Guide_anyshape length (OFF 'width').
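One way to check this before launching a run is to read the vertex coordinates from the OFF file and compare the longest bounding-box extent with L. This is a minimal Python sketch, assuming a plain ASCII OFF file with exactly three coordinates per vertex, and assuming the guide's long axis is the longest extent (as in the 15.off log further down, where the largest dimension is 15). The function names, the value of L, and the tiny sample geometry are illustrative only.

```python
def off_bounding_box(off_text):
    """Return ((xmin, ymin, zmin), (xmax, ymax, zmax)) for plain OFF content."""
    # Drop '#' comments and blank lines, then flatten everything into tokens.
    tokens = []
    for line in off_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            tokens.extend(line.split())
    if not tokens or tokens[0] != "OFF":
        raise ValueError("not a plain OFF file (missing 'OFF' header)")
    n_vertices = int(tokens[1])  # tokens[2] and tokens[3]: face and edge counts
    coords = [float(t) for t in tokens[4:4 + 3 * n_vertices]]
    if n_vertices == 0 or len(coords) != 3 * n_vertices:
        raise ValueError("vertex block is empty or truncated")
    xs, ys, zs = coords[0::3], coords[1::3], coords[2::3]
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def longest_extent(off_text):
    """Largest bounding-box dimension, taken here as the guide length."""
    lower, upper = off_bounding_box(off_text)
    return max(hi - lo for lo, hi in zip(lower, upper))


# Tiny stand-in geometry (NOT the real 15.off): two triangles, 15 units long.
SAMPLE_OFF = """\
OFF
# illustrative sample only
4 2 0
0 0 0
0.15 0 0
0.15 15 0.15
0 15 0.15
3 0 1 2
3 0 2 3
"""

L = 15.04  # hypothetical value of the instrument parameter L
extent = longest_extent(SAMPLE_OFF)
print(f"longest extent = {extent}, L = {L}, L larger: {L > extent}")
```

For a real file, pass the content of e.g. data/length/extra_fine/15.off read with open(...).read() instead of SAMPLE_OFF.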
>
> Solution 1: Removing the Octo_sm component seems to solve the issue. So this is a possibility.
>
> Solution 2 (recommended): You can also replace it with, e.g.:
>
> COMPONENT Octo_sm = Divergence_monitor(xwidth = h, yheight = w,
> nh=100, nv=100, maxdiv_h=10, maxdiv_v=0.1, filename="Octo_sm.dat")
> AT (0, 0, L+0.04) RELATIVE arm_geks
>
> Solution 3: Another solution is serial execution, which works as far as I have seen. You can force iFit/mccode to work in serial with:
>
> m=mccode('screw_n.instr','mpi=1'); % at creation
>
> or
>
> m=mccode('screw_n.instr');
> m.UserData.options.mpi=1; % after creation
>
> Then you can run parameter scans and optimisations. It just takes longer.
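The same serial fallback can be scripted outside iFit. This is a minimal Python sketch that only builds mcrun command lines for a scan over L with --mpi=1; it does not run them. The scan values, the output directory layout, the default neutron count, and the helper name serial_mcrun_command are assumptions for illustration.

```python
def serial_mcrun_command(L, guide_m=6, lam=2, ncount=10000,
                         instrument="screw_n.instr", outdir_root="scan"):
    """Build the argument list for one serial (--mpi=1) mcrun invocation."""
    return [
        "mcrun",
        "--mpi=1",                     # serial execution, as suggested above
        f"--ncount={ncount}",
        f"--dir={outdir_root}/L_{L}",  # hypothetical per-point output directory
        instrument,
        f"L={L}",
        f"guide_m={guide_m}",
        f"lambda={lam}",
    ]


# Hypothetical scan points for the guide length parameter L.
for L in (10, 15, 20):
    print(" ".join(serial_mcrun_command(L)))
```

Each list can be passed to subprocess.run(cmd, check=True) to execute the points one after another. Note that mcrun refuses to write into an output directory that already exists.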
>
> Final question
>
> Would you agree to my adding one of your OFF twisted guides to McStas/Data, as well as the PYK reactor source parameters to the Source_gen doc?
>
> Emmanuel.
>
> On 2/11/19 6:53 PM, farhi wrote:
>
> Hello Peter,
>
> Would you be so kind as to send me your H3 McStas model, as well as the 15.off file, so that I can nail down the error?
>
> Emmanuel.
>
> On 2019-02-11 17:29, Петр Коник wrote:
>
> Dear all!
>
> We are currently trying to use a quite large .off file to finely represent a
> guide with complex geometry. While McStas itself works well (if executed
> from mcgui), the iFit script that we use to scan parameters
> crashes irregularly. Here is the error; it looks like some
> trouble with the allocated array sizes. Any ideas how to fix that?
>
> Sorry for the long text - we really don't know which part is important.
>
> Best regards,
> Peter Konik
>
> mpirun -n 8 /tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.out
> --ncount=1000000 --dir=/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/sim
> L=15 guide_m=6 lambda=2
> mpirun -n 8 /tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.out
> --ncount=1000000 --dir=/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/sim
> L=15 guide_m=6 lambda=2: Quit
> Simulation 'H3'
> (/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.instr): running
> on 8 nodes (master is 'konik-N46JV', MPI version 3.1).
> [H3] Initialize
> [H3] Initialize
> [H3] Initialize
> [H3] Initialize
> Loading geometry file (OFF/PLY): data/length/extra_fine/15.off
> Number of vertices: 5694
> [H3] Initialize
> [H3] Initialize
> [H3] Initialize
> [H3] Initialize
> Number of polygons: 11366
> Warning: Neither xwidth, yheight or zdepth are defined.
> The file-defined (non-scaled) geometry the OFF geometry
> data/length/extra_fine/15.off will be applied!
> Bounding box dimensions for geometry data/length/extra_fine/15.off:
> Length=0.150000 (100.000%)
> Width= 15.000000 (100.000%)
> Depth= 0.150000 (100.000%)
> *** stack smashing detected ***: <unknown> terminated
>
> # McStas 2.5 - Dec. 12, 2018: [pid 13264] Signal 6 detected [proc 7]
> SIGABRT (Abort)
> # Simulation: H3 (/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.instr)
> # Breakpoint: Octo_sm (Trace) 6.47 % ( 8089.0/ 125000.0)
> # Date: Mon Feb 11 19:13:44 2019
> # Started: Mon Feb 11 19:13:43 2019
> # Last I/O Error: Function not implemented
> # McStas 2.5 - Dec. 12, 2018: Simulation stop (abort).
>
> # McStas 2.5 - Dec. 12, 2018: [pid 13258] Signal 15 detected [proc 6]
> SIGTERM (Termination)
> # Simulation: H3 (/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.instr)
> # Breakpoint: screw (Trace) 20.96 % ( 26205.0/ 125000.0)
> # Date: Mon Feb 11 19:13:45 2019
> # Started: Mon Feb 11 19:13:43 2019
> # McStas 2.5 - Dec. 12, 2018: Finishing simulation (save results and exit)
>
> # McStas 2.5 - Dec. 12, 2018: [pid 13255] Signal 15 detected [proc 5]
> SIGTERM (Termination)
> # Simulation: H3 (/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.instr)
> # Breakpoint: screw (Trace) 19.39 % ( 24240.0/ 125000.0)
> # Date: Mon Feb 11 19:13:45 2019
> # Started: Mon Feb 11 19:13:43 2019
> # McStas 2.5 - Dec. 12, 2018: Finishing simulation (save results and exit)
>
> Finally [H3: /tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/sim]. Time: 2 [s]
>
> # McStas 2.5 - Dec. 12, 2018: [pid 13248] Signal 15 detected [proc 2]
> SIGTERM (Termination)
> # Simulation: H3 (/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.instr)
> # Breakpoint: screw (Trace) 19.55 % ( 24443.0/ 125000.0)
> # Date: Mon Feb 11 19:13:45 2019
> # Started: Mon Feb 11 19:13:43 2019
> # McStas 2.5 - Dec. 12, 2018: Finishing simulation (save results and exit)
>
> Finally [H3: /tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/sim]. Time: 2 [s]
>
> # McStas 2.5 - Dec. 12, 2018: [pid 13247] Signal 15 detected [proc 1]
> SIGTERM (Termination)
> # Simulation: H3 (/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.instr)
> # Breakpoint: screw (Trace) 20.48 % ( 25600.0/ 125000.0)
> # Date: Mon Feb 11 19:13:45 2019
> # Started: Mon Feb 11 19:13:43 2019
> # McStas 2.5 - Dec. 12, 2018: Finishing simulation (save results and exit)
>
> Finally [H3: /tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/sim]. Time: 2 [s]
>
> # McStas 2.5 - Dec. 12, 2018: [pid 13250] Signal 15 detected [proc 4]
> SIGTERM (Termination)
> # Simulation: H3 (/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.instr)
> # Breakpoint: screw (Trace) 20.61 % ( 25768.0/ 125000.0)
> # Date: Mon Feb 11 19:13:45 2019
> # Started: Mon Feb 11 19:13:43 2019
> # McStas 2.5 - Dec. 12, 2018: Finishing simulation (save results and exit)
>
> # McStas 2.5 - Dec. 12, 2018: [pid 13249] Signal 15 detected [proc 3]
> SIGTERM (Termination)
> # Simulation: H3 (/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.instr)
> # Breakpoint: screw (Trace) 26.14 % ( 32672.0/ 125000.0)
> # Date: Mon Feb 11 19:13:45 2019
> # Started: Mon Feb 11 19:13:43 2019
> # McStas 2.5 - Dec. 12, 2018: Finishing simulation (save results and exit)
>
> Finally [H3: /tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/sim]. Time: 2 [s]
>
> # McStas 2.5 - Dec. 12, 2018: [pid 13246] Signal 15 detected [proc 0]
> SIGTERM (Termination)
> # Simulation: H3 (/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.instr)
> # Breakpoint: screw (Trace) 20.70 % ( 25879.0/ 125000.0)
> # Date: Mon Feb 11 19:13:45 2019
> # Started: Mon Feb 11 19:13:43 2019
> # McStas 2.5 - Dec. 12, 2018: Finishing simulation (save results and exit)
>
> Save [H3]
> [warn] Epoll ADD(4) on fd 49 failed. Old events were 0; read change
> was 0 (none); write change was 1 (add): Bad file descriptor
> [warn] Epoll ADD(4) on fd 43 failed. Old events were 0; read change
> was 0 (none); write change was 1 (add): Bad file descriptor
> --------------------------------------------------------------------------
> mpirun noticed that process rank 7 with PID 0 on node konik-N46JV
> exited on signal 3 (Quit).
> --------------------------------------------------------------------------
>
> Error: Could not evaluate Expression in model screw_n.instr McCode
> [mccode] iF495774
> self = iFunc_McCode (methods,doc,plot,code) 1D model: "screw_n.instr
> McCode [mccode]"
> Expression: UD = this.UserData; options=UD.options;if
> ~isempty(options.dir) && ...
> Description: McCode virtual experiment screw_n.instr
> Set UserData.options.monitor to specify a given monitor file pattern,
> or [] to get the last.
> Monitors are stored in UserData.monitors
> Available monitors:
> * Octo_large
> * Octo_med
> * Octo_sm
> iData iD500632=load(iData,... [100 1] Intensity [n/s/bin](x)
> "mccode.sim McCode sim file I=1.02353e+09 I_err=1.02353e+09 N=1
> X0=0.0131313 dX=0;"
> </tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/sim/Octo_sm_1549901622.vd>
> Axis 1 "x" label is "Vert. Divergence [deg]", range [-0.1:0.1]
> Tag: 'iF495774'
> Date: '11-Feb-2019 19:12:49'
> Name: 'screw_n.instr McCode [mccode]'
> Parameters: {3×1 cell}
> Guess: [10 6 5]
> Constraint: [1×1 struct]
> Dimension: 1
> ParameterValues: [3×1 double]
> UserData: [1×1 struct]
> Duration: 0.7051
> class: 'iFunc_McCode'
>
> Parameters (3):
> p( 1)= L=15
> p( 2)= guide_m=6
> p( 3)= lambda=2
> Other Parameters:
> ''
> 'UD = this.UserData; options=UD.options;'
> 'if ~isempty(options.dir) && isempty(dir(options.dir));'
> 'try; mkdir(options.dir); end;'
> 'end;'
> 'if isempty(options.dir) || isempty(dir(options.dir));'
> 'options.dir=tempname;'
> 'mkdir(options.dir);'
> 'options.use_tmpdir = true;'
> 'else options.use_tmpdir = false;'
>
> ...
> 'ax='x,y,z,t';'
> 'nd=exist('t')+exist('z')+exist('y')+exist('x');'
> 'if min(nd,this.Dimension)>0, ax=eval([ '{'
> ax(1:(2*min(nd,this.Dimension))) '}']); else ax={}; end;'
> 'if ~isempty(ax) && exist('x') && ~isempty(x) &&
> ~all(isnan(x(:))), signal = interp(signal, ax{:});'
> 'else x=getaxis(signal,1); y=getaxis(signal,2);
> z=getaxis(signal,3); t=getaxis(signal,4); end;'
> 'end;'
>
> Name Size Bytes Class Attributes
>
> ME 1x1 3878 MException
> UD 1x1 1029966 struct
> cmd 1x163 326 char
> duration 1x1 8 double
> f 0x1 0 cell
> iFunc_ax 1x14 28 char
> iFunc_dim 1x1 8 double
> iFunc_t0 1x6 48 double
> iFunc_this 1x1 1070328 iFunc_McCode
> index 0x0 0 double
> options 1x1 2461 struct
> p 1x3 24 double
> result 1x4667 9334 char
> signal 0x0 0 double
> status 1x1 8 double
> struct_p 1x1 552 struct
> this 1x1 1070328 iFunc_McCode
> varargin 1x0 0 cell
> x 51x1 408 double
>
> Error using iFunc/feval>iFunc_feval_expr (line 414)
> Model screw_n.instr McCode [mccode] iF495774 failed to execute mpirun
> -n 8 /tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/screw_n.out
> --ncount=1000000 --dir=/tmp/tp538a3bb9_21a0_4166_b4ea_597a15889e55/sim
> L=15 guide_m=6 lambda=2
> iFunc:feval: Saved state in
> /media/konik/a09e43d5-7f1d-47d7-b0ae-2df53071d43b/JOB/Octagon-guide/Octagon-guide/simulation/iFunc_feval_error
> Error using iFunc/feval>iFunc_feval_expr (line 432)
> Failed model evaluation. Saved state in
> /media/konik/a09e43d5-7f1d-47d7-b0ae-2df53071d43b/JOB/Octagon-guide/Octagon-guide/simulation/iFunc_feval_error
>
> Error in iFunc/feval (line 344)
> [signal,ax,p,model,duration] = iFunc_feval_expr(model, varargin{:});
>
> Error in iFunc_McCode/feval (line 139)
> [signal, self, ax, name] = feval@iFunc(self, varargin{:});
>
> Error in iData>iData_iFunc2iData (line 268)
> [signals, this_in, axs, names] = feval(this_in, varargin{:});
>
> Error in iData (line 189)
> [this_out, this_in] = iData_iFunc2iData(this_in, axes_in,
> varargin{2:end});
>
> Error in screw_length_scan (line 42)
> results = iData(model,parameters);
> _______________________________________________
> ifit-users mailing list
> ifit-users at mccode.org
> https://mailman2.mccode.org/mailman/listinfo/ifit-users
>
>
> --
> Emmanuel FARHI, \|/ ____ \|/
> Spectroscopy Group Institut Laue-Langevin (ILL) Grenoble ~@-/ oO \-@~
> 71 av des Martyrs,CS 20156,38042 Grenoble Cedex 9,France /_( \__/ )_\
> Work :Tel (33/0) 4 76 20 71 35. Fax (33/0) 4 76 48 39 06 \__U_/