Partitions in Cassandra - Programming board

This page is an excerpted archive of the corresponding thread on MITBBS (未名空间).

B********r
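Posts: 397

I've been working with Cassandra lately and came across this limit:
The maximum number of cells (rows x columns) in a single partition is 2 billion
Link: http://wiki.apache.org/cassandra/CassandraLimitations
What confuses me is the concept of a partition. Does it mean each node can hold several partitions?
Can that be configured on the fly, or is it fixed at initial setup? In other words, if I start
with only 1 partition per node, will I run out of room once my cells exceed 2 billion?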

p*****2
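Posts: 21240

A partition should just be a node, I think.

[Quoting B********r's post]

: I've been working with Cassandra lately and came across this limit:
: The maximum number of cells (rows x columns) in a single partition is 2 billion
: Link: http://wiki.apache.org/cassandra/CassandraLimitations
: What confuses me is the concept of a partition. Does it mean each node can hold several partitions?
: Can that be configured on the fly, or is it fixed at initial setup? In other words, if I start
: with only 1 partition per node, will I run out of room once my cells exceed 2 billion?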

B********r
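Posts: 397

So is the number of partitions in a column family equal to the number of nodes in the whole cluster,
or do you set it yourself? In Apache Kafka you can set the partition count yourself; is Cassandra
the same?
Thanks a lot!

[Quoting p*****2's post]

: A partition should just be a node, I think.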

p*****2
Posts: 21240

I think it isn't called a CF anymore; it's called a table now.
My understanding is that there are as many partitions as there are nodes.
All data for a single partition must fit (on disk) on a single machine in
the cluster. Because partition keys alone are used to determine the nodes
responsible for replicating their data, the amount of data associated with a
single key has this upper bound.

[Quoting B********r's post]

: So is the number of partitions in a column family equal to the number of nodes in the whole cluster,
: or do you set it yourself? In Apache Kafka you can set the partition count yourself; is Cassandra
: the same?
: Thanks a lot!

B********r
Posts: 397

Do you mean that one physical machine hosts several nodes, and a table is also spread across several
physical machines, so if the data exceeds node * limit I can just add a node?

[Quoting p*****2's post]

: I think it isn't called a CF anymore; it's called a table now.
: My understanding is that there are as many partitions as there are nodes.
: All data for a single partition must fit (on disk) on a single machine in
: the cluster. Because partition keys alone are used to determine the nodes
: responsible for replicating their data, the amount of data associated with a
: single key has this upper bound.

p*****2
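Posts: 21240

I think that, leaving vnodes aside, one physical machine is one node. If you have too much data,
just add another machine (node).

[Quoting B********r's post]

: Do you mean that one physical machine hosts several nodes, and a table is also spread across several
: physical machines, so if the data exceeds node * limit I can just add a node?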

g*****g
Posts: 34805

partition = nodes / replication factor. Every row can have 2 billion entries.
Storage will never run out; you can always add new nodes.

w**z
Posts: 8232

It just means a single row can hold at most 2 billion columns.

[Quoting B********r's post]
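
As a rough illustration of the limit discussed above, the sketch below counts the cells of one partition as rows x columns and shows how adding a time bucket to the partition key keeps each partition bounded. The 2 billion figure comes from the wiki page quoted in the first post; the workload numbers, the function names, and the one-partition-per-day scheme are assumptions made up for this example, not something from the thread.

```python
from datetime import datetime

MAX_CELLS_PER_PARTITION = 2_000_000_000  # limit quoted from the Cassandra wiki


def cells_per_partition(rows, columns_per_row):
    """Cells in one partition, counted as rows x columns."""
    return rows * columns_per_row


def bucketed_partition_key(sensor_id, ts):
    """Hypothetical scheme: one partition per sensor per calendar day."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))


# Assumed workload: one sensor writing 1,000 rows per second, 5 columns each.
rows_per_day = 1_000 * 86_400
one_year_single_partition = cells_per_partition(rows_per_day * 365, 5)
one_day_bucket = cells_per_partition(rows_per_day, 5)

# Does a whole year under one partition key exceed the limit?
print(one_year_single_partition > MAX_CELLS_PER_PARTITION)
# Does a single daily bucket exceed the limit?
print(one_day_bucket > MAX_CELLS_PER_PARTITION)
print(bucketed_partition_key("sensor-1", datetime(2014, 7, 16)))
```

The point of the bucket is that the limit applies per partition key, so the total size of the table is not capped by it; only the amount of data stored under any single key is.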

w**z
Posts: 8232

In Cassandra, one row has to live on one node. You can replicate the whole row,
but each row has to fit on one physical node.

[Quoting p*****2's post]

: I think that, leaving vnodes aside, one physical machine is one node. If you have too much data,
: just add another machine (node).

w**z
Posts: 8232

One physical node can hold data for multiple partitions. One partition is the one
assigned by the partitioner; the others are replicas. If RF = 1, it is one
node, one partition.

[Quoting B********r's post]
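
To make the partitioner and replica placement described in the posts above more concrete, here is a minimal sketch of a token ring. It assumes one token per node (no vnodes), MD5 hashing in the spirit of RandomPartitioner, and simple clockwise replica placement; the class `Ring`, the node names, and the way node tokens are derived are all hypothetical. In this toy model, each distinct partition key is its own partition, and a node ends up holding many of them.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Hash a partition key to an integer token (MD5, as an assumption)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class Ring:
    """Toy token ring: each node owns exactly one token."""

    def __init__(self, node_names, replication_factor):
        self.rf = replication_factor
        # Assumption: a node's token is derived by hashing its name.
        self.ring = sorted((token_for(name), name) for name in node_names)
        self.tokens = [token for token, _ in self.ring]

    def replicas(self, partition_key):
        """First node at or after the key's token, then the next rf - 1 nodes."""
        start = bisect.bisect_left(self.tokens, token_for(partition_key))
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=2)
placement = {key: ring.replicas(key) for key in ("user:1", "user:2", "user:3")}
for key, nodes in placement.items():
    print(key, "->", nodes)
```

Adding a node to such a ring only changes which keys land where; it does not change how much data a single key may hold, which is why the per-partition limit and the total cluster capacity are separate questions.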


B********r
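Posts: 397

Thanks, everyone. One more question I'd like to discuss: the difference between Thrift and CQL3
after Cassandra 2.0. Is the difference only in the client, or did the server side change too?
The protocol should be unchanged, right?
Previously it was easy to add a column dynamically; with the CLI it was just:
set table[key][columnName] = columnValue
Do I now have to ALTER TABLE, or can I still use that approach?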

p*****2
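Posts: 21240

The protocol did change; it is now asynchronous, which is the most popular style these days.

[Quoting B********r's post]

: Thanks, everyone. One more question I'd like to discuss: the difference between Thrift and CQL3
: after Cassandra 2.0. Is the difference only in the client, or did the server side change too?
: The protocol should be unchanged, right?
: Previously it was easy to add a column dynamically; with the CLI it was just:
: set table[key][columnName] = columnValue
: Do I now have to ALTER TABLE, or can I still use that approach?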

w**z
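Posts: 8232

Thrift is still supported. To use CQL you need to enable the native protocol. But how the server
stores data is the same. I still prefer the CLI; it's clearer. CQL feels a bit unnatural: it is
clearly a column family, yet they insist on calling it a table, so be careful when using it.
Think the schema through first.

[Quoting B********r's post]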

B********r
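Posts: 397

Right. There's another big question: how do you add a column dynamically with CQL3? Say I define
a table with only two columns; then I can't directly do
set keySpace[colFamily][colName] = value
but have to add the column first and then insert? Doesn't that cost an extra round trip?
Also, if I want select top 10 * from table order by id,
it seems I can only get everything back and sort it myself on the client side?

[Quoting w**z's post]

: Thrift is still supported. To use CQL you need to enable the native protocol. But how the server
: stores data is the same. I still prefer the CLI; it's clearer. CQL feels a bit unnatural: it is
: clearly a column family, yet they insist on calling it a table, so be careful when using it.
: Think the schema through first.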

p*****2
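Posts: 21240

Have you studied Cassandra?

[Quoting B********r's post]

: Right. There's another big question: how do you add a column dynamically with CQL3? Say I define
: a table with only two columns; then I can't directly do
: set keySpace[colFamily][colName] = value
: but have to add the column first and then insert? Doesn't that cost an extra round trip?
: Also, if I want select top 10 * from table order by id,
: it seems I can only get everything back and sort it myself on the client side?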

d*******r
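Posts: 3299

Erye (二爷), could you briefly explain how to get through the DataStax courses quickly?
http://www.datastax.com/
It looks like quite a few of them are paid?

[Quoting p*****2's post]

: Have you studied Cassandra?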

p*****2
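Posts: 21240

Take the free one; a weekend at most is enough, and you can get a certificate.
You don't need to do the homework.

[Quoting d*******r's post]

: Erye (二爷), could you briefly explain how to get through the DataStax courses quickly?
: http://www.datastax.com/
: It looks like quite a few of them are paid?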

w**z
Posts: 8232

By adding a column, do you mean changing the table's schema? I don't quite follow. You can't think
about Cassandra in terms of RDBMS columns; adding a column in Cassandra just adds a value and
involves no schema change.
As for ordering: if you use RandomPartitioner, it hashes the key and stores the
hash value, so you can't (naturally) order by key. You can use an ordered
partitioner, but that is not recommended and has a lot of problems. If you
really need to sort, build your own index using another CF.

[Quoting B********r's post]
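
The remark about RandomPartitioner can be sketched in a few lines: once keys are hashed, the order in which a token-range scan would visit them has nothing to do with their natural order. The snippet assumes MD5 as the hash (which is what RandomPartitioner is based on, as far as I know); the key names and the helper `md5_token` are invented for the example.

```python
import hashlib


def md5_token(key):
    """Integer token for a row key, using MD5 as an assumed stand-in."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


keys = [f"id-{n:04d}" for n in range(1, 11)]  # already in natural order
by_token = sorted(keys, key=md5_token)        # order a token-range scan would follow

print(keys)
print(by_token)
print(by_token == keys)
```

Both lists contain the same keys, but the token order is effectively shuffled, which is why an "order by id" across the whole table has to be served some other way, for example from a separately maintained index as suggested in the post.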

B********r
Posts: 397

Maybe I wasn't clear earlier. By adding a column I mean adding a column that isn't in the schema.
For example, if I want to use a timestamp as the column name, is there any way to do that in CQL3
other than changing the schema?

[Quoting w**z's post]

: By adding a column, do you mean changing the table's schema? I don't quite follow. You can't think
: about Cassandra in terms of RDBMS columns; adding a column in Cassandra just adds a value and
: involves no schema change.
: As for ordering: if you use RandomPartitioner, it hashes the key and stores the
: hash value, so you can't (naturally) order by key. You can use an ordered
: partitioner, but that is not recommended and has a lot of problems. If you
: really need to sort, build your own index using another CF.

B********r
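Posts: 397

No, I just spun up a few VMs myself and fiddled around. Do you mean that course? Anything you'd
recommend?

[Quoting p*****2's post]

: Take the free one; a weekend at most is enough, and you can get a certificate.
: You don't need to do the homework.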

p*****2
Posts: 21240

There are collections.

[Quoting B********r's post]

: No, I just spun up a few VMs myself and fiddled around. Do you mean that course? Anything you'd
: recommend?
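
Besides collections, the usual CQL3 counterpart of "timestamp as column name" is, as far as I know, a clustering column: a table declared with something like `PRIMARY KEY (key, ts)` stores one entry per `ts` inside the partition for `key`, and a query of the form `WHERE key = ? ORDER BY ts DESC LIMIT 10` reads the newest ten without a client-side sort. The sketch below is only an in-memory model of that layout, not driver code; the class `WideRowTable`, the column names `key` and `ts`, and the sample data are hypothetical.

```python
class WideRowTable:
    """Toy model of a table with PRIMARY KEY (key, ts)."""

    def __init__(self):
        # partition key -> {clustering value (ts): cell value}
        self.partitions = {}

    def insert(self, key, ts, value):
        # A new ts is just a new entry in the partition; the schema is untouched.
        # Writing the same (key, ts) again overwrites the earlier value.
        self.partitions.setdefault(key, {})[ts] = value

    def latest(self, key, limit):
        """Newest `limit` entries of one partition, newest first.

        This model sorts on read; it only mimics the result of reading a
        partition in descending clustering order.
        """
        rows = self.partitions.get(key, {})
        return [(ts, rows[ts]) for ts in sorted(rows, reverse=True)[:limit]]


table = WideRowTable()
for ts in range(1, 26):
    table.insert("sensor-1", ts, f"reading-{ts}")

top10 = table.latest("sensor-1", 10)
print([ts for ts, _ in top10])
```

Note that the ordering here holds within one partition only, which matches the earlier point in the thread that ordering across partition keys is not available under a hashing partitioner.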

w**z
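Posts: 8232

Cassandra's data is stored sequentially on disk, so changing the schema has a fairly large impact.

[Quoting B********r's post]

: No, I just spun up a few VMs myself and fiddled around. Do you mean that course? Anything you'd
: recommend?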
