r***e Posts: 2000 | 1 Sorry for this stupid question.
A very simple, short MPI program runs on a 4-core
desktop computer. I would expect the speedup to
stop improving after 8 (hyper-threaded) threads, but it keeps rising
in an almost straight line up to 128 threads (the default
system limit).
How is this possible? Thanks. |
j**u Posts: 6059 | 2 If there is not much communication between the cores, ...
|
r***e Posts: 2000 | 3
I still don't get it.
On a 4-core i7 processor with 8 hyper-threads, say an embarrassingly
parallel problem takes 1024 seconds sequentially. Then:
2 ranks take 512 seconds each, S = 2;
4 ranks take 256 seconds each, S = 4;
8 ranks take 128 seconds each, S = 8;
16 ranks take 64 seconds each, but since only 8 can run at a time
and the other 8 have to wait, the job still takes 64 + 64 = 128 seconds, S = 8;
32 ranks, S = 8.
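To put the same arithmetic in code (just a sketch of my expectation; p is the number of MPI ranks, c the number of hardware threads, and I assume no communication):

/* Rough model: with p ranks on c hardware threads, the ranks run in
 * ceil(p/c) back-to-back waves, each rank doing 1/p of the total work. */
double expected_speedup(int p, int c)
{
    int waves = (p + c - 1) / c;   /* e.g. p = 16, c = 8 -> 2 waves */
    return (double)p / waves;      /* p = 16, c = 8 -> speedup 8, not 16 */
}

So beyond 8 ranks I expect the curve to flatten out at about 8.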
But what I actually measure is the near-linear speedup I described in my first post.
|
r***e Posts: 2000 | 4
Could someone help, please? I am really confused.
I pasted the source code here:
http://pastebin.com/5rUN5Vfm
|
x*x Posts: 365 | 5 Your timing method is wrong. Add a Barrier between starting the timer and starting the computation, then check again. |
r***e Posts: 2000 | 6
Thank you for looking into this. I added a Barrier
immediately after starting the clock, and the
result is the same. What I did is roughly this:
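(Only a sketch; rank and do_the_work() stand in for the corresponding pieces of the pastebin code.)

double t0 = MPI_Wtime();
MPI_Barrier(MPI_COMM_WORLD);   /* the Barrier added right after starting the clock */

do_the_work();                 /* placeholder for the actual computation */

double t1 = MPI_Wtime();
if (rank == 0)
    printf("elapsed: %.3f seconds\n", t1 - t0);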
|
x*x Posts: 365 | 7 Your workload distribution is also wrong. No matter what num_nodes is in the
loop, all ranks (0 through world_size-1) perform the same computation, so the total
workload is inversely proportional to num_nodes, and of course you then see linear
speedup. The correct approach is to let only rank 0 through rank num_nodes-1 take
part in the computation, so that the total amount of computation stays constant.
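For example, something like this (only a sketch; f(), local_sum, and total are placeholders, while n, num_nodes, rank, and world_size refer to the variables in your program; how the partial results are combined afterwards is up to you, MPI_Reduce is just one option):

if (rank < num_nodes) {                /* only the first num_nodes ranks do any work */
    long chunk = n / num_nodes;        /* each participating rank's share of the n items */
    long start = (long)rank * chunk;
    for (long i = start; i < start + chunk; ++i)
        local_sum += f(i);             /* f() stands in for the real per-item work */
}
/* ranks >= num_nodes skip the loop, so the total work no longer shrinks as num_nodes grows */
MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
|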
r***e Posts: 2000 | 8
That is not the case: each node's workload is (n/num_nodes).
|
x*x Posts: 365 | 9 But more than num_nodes nodes are doing that work.
|
r***e Posts: 2000 | 10
Got it now, thanks!
|
r***e Posts: 2000 | 11 Old bear, may I bother you with another naïve question?
To isolate a problem, I wrote this short program to
test basic send and receive:
http://pastebin.com/CMp63hkK
It works as expected on one computer running Fedora 22 with OpenMPI,
but it always hangs on another computer running Fedora 22 with MPICH.
I tested on localhost only in both cases, with the same gcc version.
If I use a reduce instead, or if I avoid sending to and receiving from
the same rank (0), then it works.
Is it a rule that a rank cannot send to and receive from itself (rank 0),
or is there a mistake in my code?
Thank you! |
x*x Posts: 365 | 12 MPI_Send has blocking semantics: it may or may not actually block,
depending on the MPI implementation and on how much of the message it buffers internally.
Both OpenMPI and MPICH behaved correctly; the problem is with the program.
The correct fix is to use MPI_Isend instead, for example:
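(Only a sketch; local_val and the tag are placeholders, not your actual code.)

MPI_Request req;
/* every rank, including rank 0 itself, starts a non-blocking send to rank 0 */
MPI_Isend(&local_val, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

if (rank == 0) {
    double incoming;
    for (int src = 0; src < world_size; ++src) {
        MPI_Recv(&incoming, 1, MPI_DOUBLE, src, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* combine the incoming values here */
    }
}
MPI_Wait(&req, MPI_STATUS_IGNORE);   /* complete our own send before moving on */

Because MPI_Isend returns immediately, rank 0 can go on to post its receives, so its own message to itself can always be matched and the program cannot hang the way the blocking MPI_Send version can.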
|
r***e Posts: 2000 | 13
Thanks!
|