CCLM failed with error code 1004 – in #9: CCLM

  @rolfzentek in #da0f3fa

Dear Delei,

I have never encountered this kind of variability (4 different outputs) with the same setup on the same computer. I often compare different model versions (when we change something), and from that I know that even after 30 h of simulation the difference of the output fields is zero (at least in ncview), at least when I run them on the same computer.

[ if you want to track down the problem ]
Are the non-NaN values of the 4 simulations the same? (See the sketch after this list for one way to check.)
-> If no, I guess it is a weird chaos effect that causes the crash to happen differently, and I would try to get a stable setup that no longer produces this chaos effect (for example by using the same node of the supercomputer, if the nodes have different configurations).
-> If yes, I have no idea and would start suspecting faulty hardware causing random NaNs.
-> Using only 1 CPU/core (procx=1, procy=1) may also be a setup you could test. As far as I know, this should not change the simulation output, but I had a case (with INT2LM) where a bug occurred only when parallel computing was used.
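
A minimal sketch of how such a comparison could look, assuming the runs write NetCDF output; the file paths and the variable name are hypothetical and need to be adapted to your setup:

import numpy as np
from netCDF4 import Dataset

def compare_runs(file_a, file_b, varname):
    """Report whether the non-NaN values of one variable agree between two runs."""
    with Dataset(file_a) as a, Dataset(file_b) as b:
        # Fill masked/missing points with NaN so both runs use the same convention
        va = np.ma.filled(a.variables[varname][:], np.nan).astype(float)
        vb = np.ma.filled(b.variables[varname][:], np.nan).astype(float)
    # Compare only points that are finite (non-NaN) in both runs
    mask = np.isfinite(va) & np.isfinite(vb)
    same = np.array_equal(va[mask], vb[mask])
    print(f"{varname}: {mask.sum()} comparable points, identical = {same}")
    return same

# Hypothetical output files from two of the 4 runs:
compare_runs("run1/lffd2021010100.nc", "run2/lffd2021010100.nc", "T_2M")

Checking the first output step as well as later ones would also show whether the runs diverge right away or only over time.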

[ if you want a workaround ]
Do you have any working setup right now from which you could slowly deviate towards the setup you need, step by step, always checking at which point the error occurs?

Cheers
Rolf