p*******n (posts: 273) | #1
I parallelized a loop in a serial code with MPI:
for (i = mynode; i < 100; i += numprocs)
{
    ...
    MPI_Barrier(MPI_COMM_WORLD);
    ...
}
MPI_Barrier(MPI_COMM_WORLD);
I found that if 100 is not divisible by numprocs, there is an error at run
time. If I comment out the MPI_Barrier inside the loop, everything is OK.
Does anyone know what is wrong with my usage of MPI_Barrier? Thank you!
| j**u (posts: 6059) | #2
MPI_Barrier is used to make sure all processes have reached the same point
(i.e., to synchronize them). I am not sure what the loop index i is. If it is
the rank of your node, it is wrong, because all processes run the same code.
While the first node waits at MPI_Barrier for the other nodes to reach that
line so it can complete its loop, the other nodes are actually waiting for the
first node to complete its first iteration. Hence the program will hang at
that point forever.
If the loop index i is not the rank, I might need more information.
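| p*******n (posts: 273) | #3
mynode is the node's identifier (rank), and numprocs is the total number of
nodes.
In the original serial program, the loop over i runs 100 times. Now I want to
simply split that loop across the nodes, but there is a load-balance problem:
if 100 is not divisible by the number of nodes, things go wrong.
I still need the barrier inside the loop: in each iteration the master node
reads some data and then broadcasts it (MPI_Bcast) to all the nodes. But it
seems to conflict with the barrier outside the loop and ends up in a deadlock.
I don't know how to solve this.
Thanks a lot!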
| j**u (posts: 6059) | #4
I think I may now see what you are trying to do. It seems you have to repeat
something 100 times in your serial code. In your parallel code, you simply
split the whole loop into strips, each strip containing as many repetitions as
there are processes. If that is the case, you should make sure all nodes make
the same number of calls to MPI_Barrier(). In your original code, the nodes
that handle the leftover iterations make one more call to MPI_Barrier() than
the others. One possible approach could be as follows:
| p*******n (posts: 273) | #5
That makes sense.
Thank you.