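k****i posts: 128 | 1 HBase, Bigtable, and Cassandra all store data on a single node as memtables and SSTs.
Embedded stores such as LevelDB and RocksDB use memtables and SSTs as well, but LevelDB keeps an
index in memory, while the NoSQL DBs above do not, so a read has to check every memtable and SST
(usually optimized with a Bloom filter). Why can't they maintain an index? |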
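The read path described in post 1 (memtable first, then each SST, with a Bloom filter used to skip files) can be sketched roughly as follows. This is a toy model under stated assumptions, not how any of the named databases is implemented: `BloomFilter`, `SSTable`, and `lsm_get` are hypothetical names, a dict stands in for the on-disk file, and the lookup stops at the newest table that holds the key instead of reconciling versions by timestamp.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable sorted run; a dict stands in for the file on disk."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.disk_reads = 0  # counts simulated disk accesses

    def get(self, key):
        self.disk_reads += 1
        return self.rows.get(key)


def lsm_get(key, memtable, sstables):
    """Check the memtable, then SSTables newest-first, skipping via Bloom filter."""
    if key in memtable:
        return memtable[key]
    for sst in sstables:  # assumed to be ordered newest to oldest
        if not sst.bloom.might_contain(key):
            continue  # definitely absent: no disk read needed
        value = sst.get(key)
        if value is not None:
            return value
    return None


memtable = {"a": 1}
newer = SSTable({"c": 30})
older = SSTable({"b": 2, "c": 3})
print(lsm_get("c", memtable, [newer, older]))
```

Without the Bloom filter every miss would cost one simulated disk read per SST, which is the overhead the original question is asking about.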
w**z posts: 8232 | 2 It takes too much memory.
[In reply to k****i, post 1]
|
k****i posts: 128 | 3 Then why did LevelDB add an index?
[In reply to w**z, post 2]
|
w**z posts: 8232 | 4 I don't know much about LevelDB; I am only familiar with Cassandra. There are
several knobs you can turn in Cassandra:
Key Cache
Row Cache
BloomFilter
index_interval within the SSTable (Cassandra does keep an index within each SSTable to
speed up lookups)
It's a trade-off between memory consumption and read performance. For
Cassandra, you don't want to use more than 8G of heap, to avoid long GC pauses.
Cassandra is built so that horizontal scaling is easy and efficient.
An individual node normally holds less than 1T of data. Of course, it
depends on your use case.
[In reply to k****i, post 3]
|
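The index_interval knob mentioned in post 4 trades memory for lookup work. A minimal sketch of that idea, assuming a sorted list of keys stands in for one SSTable: only every Nth key is held in memory, and a lookup binary-searches the samples and then scans at most one interval. The function names are hypothetical and this is not Cassandra's actual code.

```python
import bisect


def build_sparse_index(sorted_keys, index_interval=128):
    """Keep only every index_interval-th key, paired with its position."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), index_interval)]


def lookup(sorted_keys, sparse_index, key):
    """Binary-search the sampled keys, then scan at most one interval."""
    sample_keys = [k for k, _ in sparse_index]
    slot = bisect.bisect_right(sample_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first key in the table
    start = sparse_index[slot][1]
    if slot + 1 < len(sparse_index):
        end = sparse_index[slot + 1][1]
    else:
        end = len(sorted_keys)
    for pos in range(start, end):  # stands in for a short sequential disk scan
        if sorted_keys[pos] == key:
            return pos
    return None


keys = [f"key{i:06d}" for i in range(1000)]
index = build_sparse_index(keys, index_interval=128)
print(len(keys), "keys,", len(index), "index entries in memory")
print(lookup(keys, index, "key000500"))
```

A smaller interval means more samples in memory and shorter scans; a larger one means the opposite.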
p*****2 posts: 21240 | 5 NoSQL is not one kind of database, so you can't compare them like this.
[In reply to k****i, post 3]
|
w**z posts: 8232 | 6 One more thing to add: if compaction on a Cassandra node is not too far behind,
each read operation should touch fewer than 5, or even 3, SSTables.
It also depends on the compaction strategy. So the Bloom filter plus the SSTable index
are sufficient.
[In reply to w**z, post 4]
|
B*****g posts: 34098 | 7 Daniu (大妞), are you at DataStax?
[In reply to w**z, post 6]
|
k****i posts: 128 | 8 I mean the NoSQL stores that are based on the LSM idea.
[In reply to p*****2, post 5]
|
k****i posts: 128 | 9 Three to five disk reads for each read request can be a huge number.
[In reply to w**z, post 6]
|
k****i posts: 128 | 10 If we have an index for each SSTable, why not combine them into a single index? What is the
difference in memory consumption between these two scenarios?
[In reply to w**z, post 4]
|
w**z posts: 8232 | 11 You can have a key cache, which is typically 10% of your heap. 800M can
store a lot of keys.
For us, the 99th percentile read latency is < 10ms and the 50th percentile is < 3ms. If you are
looking for something sub-millisecond, Cassandra is not for you.
[In reply to k****i, post 9]
|
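The "800M can store a lot of keys" remark in post 11 follows from the 8G heap ceiling and the 10% key cache share mentioned in the thread. A back-of-the-envelope sketch is below; the per-entry sizes are purely hypothetical placeholders, since the thread does not say how large a cached key entry is.

```python
def key_cache_capacity(heap_bytes, cache_percent, bytes_per_entry):
    """Approximate number of entries that fit in the key cache."""
    cache_bytes = heap_bytes * cache_percent // 100
    return cache_bytes // bytes_per_entry


heap_bytes = 8 * 1000**3  # the 8G heap ceiling from the thread
for entry_size in (64, 128, 256):  # hypothetical bytes per cached entry
    capacity = key_cache_capacity(heap_bytes, 10, entry_size)
    print(f"{entry_size} bytes/entry -> about {capacity:,} keys")
```

Under these assumed entry sizes the cache holds on the order of millions of keys.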
w**z posts: 8232 | 12 It doesn't index the whole SSTable; it has an index interval of 128 by default,
which is tunable. A read might access more than one SSTable, which
can be optimized using Bloom filters. Combining the indexes from
different SSTables in memory would make things very complicated. Remember that
SSTables are merged and rewritten during compaction. Maintaining a global index is
not practical.
[In reply to k****i, post 10]
|
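The point in post 12 about compaction can be illustrated with a toy model: a global index maps every key to the SSTable that holds it, so each compaction forces an update for every key that moved, whereas a per-SSTable index is simply discarded with its file. Everything here is a hypothetical sketch (dicts stand in for SSTables, and a higher id is assumed to mean a newer table), not Cassandra's implementation.

```python
def compact(sstables, ids_to_merge, new_id):
    """Merge the chosen SSTables into one new table; newer ids win on conflicts."""
    merged = {}
    for sst_id in sorted(ids_to_merge):  # oldest first, so newer values overwrite
        merged.update(sstables.pop(sst_id))
    sstables[new_id] = merged
    return merged


def repoint_global_index(global_index, merged_keys, new_id):
    """Every key that moved must be repointed at the new SSTable."""
    updates = 0
    for key in merged_keys:
        global_index[key] = new_id
        updates += 1
    return updates


sstables = {1: {"a": 1, "b": 2}, 2: {"b": 20, "c": 3}, 3: {"d": 4}}
global_index = {"a": 1, "b": 2, "c": 2, "d": 3}  # key -> id of the newest SSTable holding it

merged = compact(sstables, [1, 2], new_id=4)
updates = repoint_global_index(global_index, merged, new_id=4)
print(updates, "global index entries rewritten by one compaction")
```

With real data volumes the number of rewritten entries scales with the size of the compacted tables, and the updates would have to stay consistent with concurrent reads.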
w**z posts: 8232 | 13 No. It was just for work: I built a few Cassandra clusters from scratch. Once that was done,
I was forced to maintain them as well.
[In reply to B*****g, post 7]
|
c******o posts: 1277 | 14 I find Cassandra much easier to maintain than MongoDB.
[In reply to w**z, post 13]
|
g*****g posts: 34805 | 15 Anything p2p is relatively easy.
[In reply to c******o, post 14]
|
w**z posts: 8232 | 16 It's OK in most cases. But there were a few dramatic moments when doing
repair.
[In reply to c******o, post 14]
|
p*****2 posts: 21240 | 17 What is troublesome about Mongo?
[In reply to c******o, post 14]
|