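c*******t Posts: 32 | 1 My program needs to quickly read more than ten thousand files from the hard disk
and store their contents in a database.
Opening and reading the files one by one in a loop is too slow. Is there a better way?
Thanks. |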
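xt Posts: 17532 | 2
If that is too slow, then there is nothing to be done. I can't think of a faster way.
[Quoting c*******t] : My program needs to quickly read more than ten thousand files from the hard disk : and store their contents in a database. : Opening and reading the files one by one in a loop is too slow. Is there a better way? : Thanks.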
|
c*******t Posts: 32 | 3 Would using multiple threads make it any faster?
[Quoting xt] : If that is too slow, then there is nothing to be done. I can't think of a faster way.
|
xt Posts: 17532 | 4
Maybe. Hard to say.
[Quoting c*******t] : Would using multiple threads make it any faster?
|
e***g Posts: 158 | 5 Not likely; this is I/O bound.
[Quoting xt] : Maybe. Hard to say.
|
m******t Posts: 2416 | 6 If his "database" doesn't happen to be another file located
on the same hard drive, I think multithreading would improve
performance a lot. It would take some experimenting to find
the optimal number of worker threads, though.
[Quoting e***g] : Not likely; this is I/O bound.
|
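One rough way to run the experiment mentioned above is to time the same set of reads with different thread-pool sizes. This is only a sketch: the helper names (`read_file`, `time_read_all`, `make_sample_files`, `run_experiment`) and the sample file count and size are made up for illustration, and it measures only the read side, not the database writes.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Read one file fully and return its bytes."""
    with open(path, "rb") as f:
        return f.read()


def time_read_all(paths, workers):
    """Read every path using `workers` threads; return (seconds, total_bytes)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(len(data) for data in pool.map(read_file, paths))
    return time.perf_counter() - start, total


def make_sample_files(directory, count=200, size=4096):
    """Create small throwaway files (assumed sizes) so the sketch runs anywhere."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"sample_{i}.dat")
        with open(path, "wb") as f:
            f.write(b"x" * size)
        paths.append(path)
    return paths


def run_experiment(worker_counts=(1, 2, 4, 8)):
    """Time the same reads once per worker count; return {workers: (seconds, bytes)}."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_sample_files(tmp)
        return {w: time_read_all(paths, w) for w in worker_counts}


if __name__ == "__main__":
    for workers, (seconds, total) in run_experiment().items():
        print(f"{workers} worker(s): {total} bytes in {seconds:.4f}s")
```

Freshly written sample files will most likely be served from the OS page cache, so the timings from this toy run say little about a real disk. Point `time_read_all` at the real files (ideally after a cold start) to get numbers that matter.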
r*****s Posts: 985 | 7 The bottleneck here is the hard disk and the file system. Even if you
read the files sequentially, it won't be much different from
the multithreaded solutions, since the job is purely I/O bound. Multithreading
helps only if the job is both I/O and CPU bound.
Therefore, you might need a high-performance file system, such as
IBM GPFS ...
[Quoting m******t] : If his "database" doesn't happen to be another file located : on the same hard drive, I think multithreading would improve : performance a lot. It would take some experimenting to find : the optimal number of worker threads, though.
|
xt Posts: 17532 | 8
A SCSI drive will be good enough to handle that.
[Quoting r*****s] : The bottleneck here is the hard disk and the file system. Even if you : read the files sequentially, it won't be much different from : the multithreaded solutions, since the job is purely I/O bound. Multithreading : helps only if the job is both I/O and CPU bound. : Therefore, you might need a high-performance file system, such as : IBM GPFS ...
|
e***g Posts: 158 | 9 In that case, use a typical producer/consumer design: two threads
with a queue in between should be enough. More threads writing to the database would only add
unnecessary concurrency control to an already busy database.
[Quoting m******t] : If his "database" doesn't happen to be another file located : on the same hard drive, I think multithreading would improve : performance a lot. It would take some experimenting to find : the optimal number of worker threads, though.
|
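The two-thread producer/consumer layout described above could look roughly like this. It is a minimal sketch, not anyone's actual code: SQLite stands in for the unspecified database, and the table name `files`, the queue bound of 64, and the function names are all assumptions.

```python
import queue
import sqlite3
import threading

SENTINEL = None  # placed on the queue to tell the writer that reading is done


def producer(paths, q):
    """Reader thread: load each file and hand (path, bytes) to the queue."""
    try:
        for path in paths:
            with open(path, "rb") as f:
                q.put((path, f.read()))
    finally:
        q.put(SENTINEL)  # always signal the end, even if a read fails


def consumer(db_path, q):
    """Writer thread: the only thread that ever touches the database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, content BLOB)"
    )
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", item)
    conn.commit()  # one commit at the end keeps the inserts in one transaction
    conn.close()


def load_files(paths, db_path, max_queued=64):
    """Run one reader and one writer with a bounded queue in between."""
    q = queue.Queue(maxsize=max_queued)  # bound limits memory if writes lag
    reader = threading.Thread(target=producer, args=(paths, q))
    writer = threading.Thread(target=consumer, args=(db_path, q))
    reader.start()
    writer.start()
    reader.join()
    writer.join()
```

Calling `load_files(list_of_paths, "out.db")` reads and writes concurrently while keeping a single database writer, which is the point of the suggestion: disk reads overlap with inserts, and the database sees no extra write contention.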
m******t Posts: 2416 | 10
It's not the only bottleneck. Another potential bottleneck would
be the DB plus network round trip. A multithreaded design would allow
the application to do DB and local I/O concurrently (again, assuming
the DB is not local).
Also, we don't know the details of the OP's application yet,
and it's not unusual for some processing to happen to the data
once it's read into memory. A multithreaded design would also
let the application improve its CPU utilization in that case.
[Quoting r*****s] : The bottleneck here is the hard disk and the file system. Even if you : read the files sequentially, it won't be much different from : the multithreaded solutions, since the job is purely I/O bound. Multithreading : helps only if the job is both I/O and CPU bound. : Therefore, you might need a high-performance file system, such as : IBM GPFS ...
|
m******t Posts: 2416 | 11
Well, it depends. If the data is written to different tables,
or to different pages of the same table, most modern database
products have very sophisticated concurrency support to avoid resource
contention.
[Quoting e***g] : In that case, use a typical producer/consumer design: two threads : with a queue in between should be enough. More threads writing to the database would only add : unnecessary concurrency control to an already busy database.
|
c********e Posts: 383 | 12
First, on the server side, trust your database and let it do the optimization.
Second, on the client (your) side, if networking really is the bottleneck,
asynchronous handling could be a good measure.
[Quoting m******t] : Well, it depends. If the data is written to different tables, : or to different pages of the same table, most modern database : products have very sophisticated concurrency support to avoid resource : contention.
|
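To illustrate what asynchronous handling on the client side might buy when the network round trip dominates, here is a small sketch using `asyncio`. Everything in it is assumed for the demo: `send_to_db` is a hypothetical stand-in that only sleeps to simulate latency, and the 50 ms round trip and the limit of 10 requests in flight are made-up numbers. A real client would need an async-capable database driver.

```python
import asyncio
import time

ROUND_TRIP_SECONDS = 0.05  # assumed network latency, for illustration only


async def send_to_db(record, stored):
    """Hypothetical stand-in for one insert over the network."""
    await asyncio.sleep(ROUND_TRIP_SECONDS)  # simulate waiting on the round trip
    stored.append(record)


async def send_all(records, max_in_flight=10):
    """Send all records, keeping at most `max_in_flight` requests outstanding."""
    stored = []
    limit = asyncio.Semaphore(max_in_flight)

    async def guarded(record):
        async with limit:
            await send_to_db(record, stored)

    await asyncio.gather(*(guarded(r) for r in records))
    return stored


def run_demo(n=20):
    """Return (stored_records, elapsed_seconds) for n simulated inserts."""
    records = [f"file-{i}" for i in range(n)]
    start = time.perf_counter()
    stored = asyncio.run(send_all(records))
    return stored, time.perf_counter() - start


if __name__ == "__main__":
    stored, elapsed = run_demo()
    print(f"sent {len(stored)} records in {elapsed:.2f}s")
```

Sent one at a time, 20 simulated inserts would wait out 20 round trips back to back. With several requests in flight the waits overlap, so the total time drops well below that, which is the effect the suggestion is after.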