由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Java版 - HashMap cache
相关主题
请教一个多线程lock机制的问题今天下午要面一个老印
The utlimate solution to cocurrent cache请问这个面试题,关于synchronize hashmap
发现 synchronized 的一个问题Talk a little more about How to lock a file
synchronization for counters怎么synchronize时间捏
Tomcat Servlet: synchronized vs non-synchronized methodsApply lock on a class.
synchronization 锁住了什么?问一个基础问题
新手问个multi-threading关于synchronized和volatile的问题请教:performance issue
多线程真头疼,但也挺有趣请教:怎么把synchronize的method改成用thread 但thread safe呢?
相关话题的讨论汇总
话题: thread话题: key话题: hashmap话题: stack话题: string
进入Java版参与讨论
1 (共1页)
g*****g
发帖数: 34805
1
I am working on an extremely high load server and trying to
optimize performance.
We have a class for caching which is like
HashMap map;
synchronized getEnry(String key) {
Entry value = map.get(key);
if(value == null) {
//create entry
map.put(key, entry);
}
}
synchronized removeEntry(String key) {
map.remove(key)
}
There's no update operation.
Obviously this is like using Collections.synchronizedMap(map) everything has
to
run on single thread and the performance sucks.
F****n
发帖数: 3271
2
A simple change will improve your performance a lot:
Entry getEntry(String key) {
Entry value = map.getKey(key);
if (value == null) {
synchronized(map) {
value = map.getKey(key);
if (value == null) {
//create entry here;
}
}
}
return value;
}
void removeEntry(String key) {
if (map.containsKey(key) {
synchronized(map) {
map.remove(key);
}
}
}
In your code everybody gets a

【在 g*****g 的大作中提到】
: I am working on an extremely high load server and trying to
: optimize performance.
: We have a class for caching which is like
: HashMap map;
: synchronized getEnry(String key) {
: Entry value = map.get(key);
: if(value == null) {
: //create entry
: map.put(key, entry);
: }

g*****g
发帖数: 34805
3
You are using double check pattern, and while get is not
locking, add/update/remove does and isn't it better to use
ConcurrentHashMap where you can have multiple non-conflicting
update threads?

【在 F****n 的大作中提到】
: A simple change will improve your performance a lot:
: Entry getEntry(String key) {
: Entry value = map.getKey(key);
: if (value == null) {
: synchronized(map) {
: value = map.getKey(key);
: if (value == null) {
: //create entry here;
: }
: }

F****n
发帖数: 3271
4
First, you said you don't have update. If so, ConcurrentHashMap is not good
because the overhead of partitioning. Second, ConcurrentHashMap is not for
this purpose. It's for concurrent update. It does not check existing copy.
You still have to lock it otherwise you will have multiple concurrent
creations. Third, your double check pattern is not right.

【在 g*****g 的大作中提到】
: You are using double check pattern, and while get is not
: locking, add/update/remove does and isn't it better to use
: ConcurrentHashMap where you can have multiple non-conflicting
: update threads?

g*****g
发帖数: 34805
5
I understand the ConcurrentHashMap overhead part, however,
you do want concurrent creations on different keys. Your implementation
blocks that, doesn't it?

good

【在 F****n 的大作中提到】
: First, you said you don't have update. If so, ConcurrentHashMap is not good
: because the overhead of partitioning. Second, ConcurrentHashMap is not for
: this purpose. It's for concurrent update. It does not check existing copy.
: You still have to lock it otherwise you will have multiple concurrent
: creations. Third, your double check pattern is not right.

F****n
发帖数: 3271
6
Then double check on ConcurrentHashMap with key locks.

【在 g*****g 的大作中提到】
: I understand the ConcurrentHashMap overhead part, however,
: you do want concurrent creations on different keys. Your implementation
: blocks that, doesn't it?
:
: good

m******t
发帖数: 2416
7
Is Double Checked Lock no longer a broken pattern and
I somehow missed the news?

【在 g*****g 的大作中提到】
: You are using double check pattern, and while get is not
: locking, add/update/remove does and isn't it better to use
: ConcurrentHashMap where you can have multiple non-conflicting
: update threads?

g*****g
发帖数: 34805
8
It's not since 1.5 and you put a volatile keyword on it,
that's what I've heard.

【在 m******t 的大作中提到】
: Is Double Checked Lock no longer a broken pattern and
: I somehow missed the news?

F****n
发帖数: 3271
9
It has been fixed. It's essentially a JVM problem not the pattern's.

【在 m******t 的大作中提到】
: Is Double Checked Lock no longer a broken pattern and
: I somehow missed the news?

m******t
发帖数: 2416
10

Oh I didn't miss _that_ news, but the fix only fixes (via a hack btw)
it when only the variable itself is involved in race conditions.
The case in discussion is different. The synchronization on
map access is to prevent race conditions on the internal
map data structure, not on the variable holding the lookup
result.

【在 F****n 的大作中提到】
: It has been fixed. It's essentially a JVM problem not the pattern's.
相关主题
synchronization 锁住了什么?今天下午要面一个老印
新手问个multi-threading关于synchronized和volatile的问题请问这个面试题,关于synchronize hashmap
多线程真头疼,但也挺有趣Talk a little more about How to lock a file
进入Java版参与讨论
F****n
发帖数: 3271
11
I think in the discussed case double check will NEVER be a problem on ANY
JVM. The problem JVM used to have is at thread stack. The Map data is in the
heap.

【在 m******t 的大作中提到】
:
: Oh I didn't miss _that_ news, but the fix only fixes (via a hack btw)
: it when only the variable itself is involved in race conditions.
: The case in discussion is different. The synchronization on
: map access is to prevent race conditions on the internal
: map data structure, not on the variable holding the lookup
: result.

g*****g
发帖数: 34805
12
What will be the problem if I do double check with key locks
on HashMap instead?

【在 F****n 的大作中提到】
: Then double check on ConcurrentHashMap with key locks.
m******t
发帖数: 2416
13

the
I wouldn't be so sure. IIUC, the problem arises from a reference being
used before the object it points to is completely initialized. I don't see
how that has to do with where the data is.
I was also just reading the HashTable code from jdk 1.6. Looking
at get() and removeEntryForKey(), which span about 12 and 25 lines of code,
including a while loop, there is no way this doesn't require a strict
synchronization.

【在 F****n 的大作中提到】
: I think in the discussed case double check will NEVER be a problem on ANY
: JVM. The problem JVM used to have is at thread stack. The Map data is in the
: heap.

g*****g
发帖数: 34805
14
ConcurrentHashMap has internal lock to do it, HashMap seems
to need an external lock in my opinion.

【在 m******t 的大作中提到】
:
: the
: I wouldn't be so sure. IIUC, the problem arises from a reference being
: used before the object it points to is completely initialized. I don't see
: how that has to do with where the data is.
: I was also just reading the HashTable code from jdk 1.6. Looking
: at get() and removeEntryForKey(), which span about 12 and 25 lines of code,
: including a while loop, there is no way this doesn't require a strict
: synchronization.

m******t
发帖数: 2416
15

If it's for caching and you are not worried about updating, then
the objects cached can be considered immutable, then two objects
loaded by the same key would be logically equivalent and would stay
that way.
If that's the case, why not just skip synchronization all together.
Worse comes to worst two threads get two equivalent objects, right?

【在 g*****g 的大作中提到】
: I am working on an extremely high load server and trying to
: optimize performance.
: We have a class for caching which is like
: HashMap map;
: synchronized getEnry(String key) {
: Entry value = map.get(key);
: if(value == null) {
: //create entry
: map.put(key, entry);
: }

F****n
发帖数: 3271
16
The core of the double check pattern problem, is that when you assign a
newly created object to a field (which is on stack), the content of that
field on the stack may be out of synch. The volatile keyword fix it because
by declaring a field volatile, the content of that field will never be
cached on the stack.
If the variable is on the heap, there should be no problem.

【在 m******t 的大作中提到】
:
: If it's for caching and you are not worried about updating, then
: the objects cached can be considered immutable, then two objects
: loaded by the same key would be logically equivalent and would stay
: that way.
: If that's the case, why not just skip synchronization all together.
: Worse comes to worst two threads get two equivalent objects, right?

g*****g
发帖数: 34805
17
Can't do that. It's not a simple class, it has references to other
classes. And while I am new on this system, I am afraid that might
lead to double creation of some other informations.

【在 m******t 的大作中提到】
:
: If it's for caching and you are not worried about updating, then
: the objects cached can be considered immutable, then two objects
: loaded by the same key would be logically equivalent and would stay
: that way.
: If that's the case, why not just skip synchronization all together.
: Worse comes to worst two threads get two equivalent objects, right?

F****n
发帖数: 3271
18
That equals to ConcurrentHashMap without double check.

【在 m******t 的大作中提到】
:
: If it's for caching and you are not worried about updating, then
: the objects cached can be considered immutable, then two objects
: loaded by the same key would be logically equivalent and would stay
: that way.
: If that's the case, why not just skip synchronization all together.
: Worse comes to worst two threads get two equivalent objects, right?

F****n
发帖数: 3271
19
What about if two keys have same hash code?

【在 g*****g 的大作中提到】
: What will be the problem if I do double check with key locks
: on HashMap instead?

g*****g
发帖数: 34805
20
How is that related? When I lock on key.intern(),
don't you have to get the monitor on that key in the string pool
before you can access them? 2 different keys can have the same
hash code but how can they have the same monitor?
And two strings can't have the same hashcode right, otherwise
how can string be used as key?

【在 F****n 的大作中提到】
: What about if two keys have same hash code?
相关主题
怎么synchronize时间捏请教:performance issue
Apply lock on a class.请教:怎么把synchronize的method改成用thread 但thread safe呢?
问一个基础问题java的volatile
进入Java版参与讨论
F****n
发帖数: 3271
21
I was not accurate. Same hash codes may not happen but it is possible
that two different hash codes are treated same in a hashmap. If you
look at the implementation of HashMap, it just & the the lower bits of
the hash code.
BTW, off course two strings can have the same hash code. The Hashtable
will use hash code as internal index, and they will do a Object.equals
to check if the key is the same.
For object that qualifies as keys, the only thing that must be assured
is that if two objects pass eq

【在 g*****g 的大作中提到】
: How is that related? When I lock on key.intern(),
: don't you have to get the monitor on that key in the string pool
: before you can access them? 2 different keys can have the same
: hash code but how can they have the same monitor?
: And two strings can't have the same hashcode right, otherwise
: how can string be used as key?

m******t
发帖数: 2416
22

because
OK, one of us is losing it. 8-)
In the basic test case demonstrating how the DCL is broken:
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
The variable 'helper' is a member variable, which has to
be on the heap.

【在 F****n 的大作中提到】
: The core of the double check pattern problem, is that when you assign a
: newly created object to a field (which is on stack), the content of that
: field on the stack may be out of synch. The volatile keyword fix it because
: by declaring a field volatile, the content of that field will never be
: cached on the stack.
: If the variable is on the heap, there should be no problem.

g*****g
发帖数: 34805
23
OK, I get what you are saying, but you still doesn't explain
how could a monitor on key.intern() fail for sync purpose.

【在 F****n 的大作中提到】
: I was not accurate. Same hash codes may not happen but it is possible
: that two different hash codes are treated same in a hashmap. If you
: look at the implementation of HashMap, it just & the the lower bits of
: the hash code.
: BTW, off course two strings can have the same hash code. The Hashtable
: will use hash code as internal index, and they will do a Object.equals
: to check if the key is the same.
: For object that qualifies as keys, the only thing that must be assured
: is that if two objects pass eq

F****n
发帖数: 3271
24
I think a member variable is on the thread's stack, not heap. The JVM's
memory model is like this: The common heap contains block memory and non-
field objects, and a stack per thread that contains member variables (fields
). The problem is that when you initialize a field in an unsynchronized
block, other threads may "see it" before the field is completely initialized
. You don't have that problem in the heap.

【在 m******t 的大作中提到】
:
: because
: OK, one of us is losing it. 8-)
: In the basic test case demonstrating how the DCL is broken:
: http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
: The variable 'helper' is a member variable, which has to
: be on the heap.

g*****g
发帖数: 34805
25
It's not a heap or stack thing. It's that the object construction
may not have finished in one thread and it's seen existing already
for another thread.

【在 m******t 的大作中提到】
:
: because
: OK, one of us is losing it. 8-)
: In the basic test case demonstrating how the DCL is broken:
: http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
: The variable 'helper' is a member variable, which has to
: be on the heap.

m******t
发帖数: 2416
26

Yeah, I think there is an implementation of Map with an external lock.
And it's called HashTable. ;-)

【在 g*****g 的大作中提到】
: ConcurrentHashMap has internal lock to do it, HashMap seems
: to need an external lock in my opinion.

F****n
发帖数: 3271
27
If you have two keys with same hash codes or same internal codes they are
basically unsynchronized, right?

【在 g*****g 的大作中提到】
: OK, I get what you are saying, but you still doesn't explain
: how could a monitor on key.intern() fail for sync purpose.

F****n
发帖数: 3271
28
When you say "initialize" in one thread you are referring to the private
stack that thread holds.

【在 g*****g 的大作中提到】
: It's not a heap or stack thing. It's that the object construction
: may not have finished in one thread and it's seen existing already
: for another thread.

g*****g
发帖数: 34805
29
Not if the performance matters. Just saying. HashTable is thread-safe
but nothing is really CONCURRENT.

【在 m******t 的大作中提到】
:
: Yeah, I think there is an implementation of Map with an external lock.
: And it's called HashTable. ;-)

m******t
发帖数: 2416
30

fields
Could you give me a link on details about this so I can make sure I
really understand what you are saying, because as of now I'm taking it
as you are saying that member variables are the same as thread locals?
initialized

【在 F****n 的大作中提到】
: I think a member variable is on the thread's stack, not heap. The JVM's
: memory model is like this: The common heap contains block memory and non-
: field objects, and a stack per thread that contains member variables (fields
: ). The problem is that when you initialize a field in an unsynchronized
: block, other threads may "see it" before the field is completely initialized
: . You don't have that problem in the heap.

相关主题
Java concurrency的疑惑,难道我理解错了? (转载)The utlimate solution to cocurrent cache
Collection return type and web service发现 synchronized 的一个问题
请教一个多线程lock机制的问题synchronization for counters
进入Java版参与讨论
g*****g
发帖数: 34805
31
What does object monitor synchronization uses have to do with hashcode?
The problem with key.intern() I would think is causing racing in different
places of system if multiple places are using it, but it's certainly
thread-safe.

【在 F****n 的大作中提到】
: If you have two keys with same hash codes or same internal codes they are
: basically unsynchronized, right?

m******t
发帖数: 2416
32

That's how I understand it too, but Foxman is saying that it wouldn't
be a problem in this case because of stack vs. heap. That's how this
has become "a head or stack thing" now. 8-)

【在 g*****g 的大作中提到】
: It's not a heap or stack thing. It's that the object construction
: may not have finished in one thread and it's seen existing already
: for another thread.

F****n
发帖数: 3271
33
No, I said if a field is not declared volatile it will be cached in local.
Basically thread locals are caches of the common heap.

【在 m******t 的大作中提到】
:
: That's how I understand it too, but Foxman is saying that it wouldn't
: be a problem in this case because of stack vs. heap. That's how this
: has become "a head or stack thing" now. 8-)

m******t
发帖数: 2416
34

Well, I'm going to take a big step back and ask you the obvious here:
have you done profiling and made sure that the old school, straightforward,
thread-safe synchronized block is the root cause of your performance
problem?
If you are new to the project, probably a good idea to stay with a safe
but maybe less performant solution.

【在 g*****g 的大作中提到】
: Not if the performance matters. Just saying. HashTable is thread-safe
: but nothing is really CONCURRENT.

g*****g
发帖数: 34805
35
Yes, we take a snapshot on JVM every 30 minutes. And we see
threads are getting blocked
Id:595
Name:J-379
State:BLOCKED on java.lang.Class@478d34 owned by J-382
Stack trace:
......
Of course this may not be the biggest bottleneck for performance per se.
But it's obviously I need to do something about it. If not for real
performance gain, getting it out of picture will prove I am actually
doing something.

【在 m******t 的大作中提到】
:
: Well, I'm going to take a big step back and ask you the obvious here:
: have you done profiling and made sure that the old school, straightforward,
: thread-safe synchronized block is the root cause of your performance
: problem?
: If you are new to the project, probably a good idea to stay with a safe
: but maybe less performant solution.

m******t
发帖数: 2416
36

I think we have some confusion in terminology. When you say
"cached in thread local", you mean the cache of the _processor
core_ that happens to be executing the thread, which, allow
me for insisting with my terminology, is yet another heap,
only smaller.
I took you to mean "thread local" as in:
http://java.sun.com/javase/6/docs/api/java/lang/ThreadLocal.html
Well, back to our original point, I do still believe that the same
situation applies to HashTable, as the internal data structure of
Hash

【在 F****n 的大作中提到】
: No, I said if a field is not declared volatile it will be cached in local.
: Basically thread locals are caches of the common heap.

m******t
发帖数: 2416
37
But what about throughput and average response time _per thread_?
We constantly see a couple of synchronization spots in our
thread dumps as well, but every time it's a different thread holding it.
And because it's on the typical code path, with around 50 user threads
in the vm all the time, there is almost always some thread holding it.
But then from each individual thread's stand point, they almost never
spend a lot of time waiting on the lock, so I don't really consider it
a truly serialized

【在 g*****g 的大作中提到】
: Yes, we take a snapshot on JVM every 30 minutes. And we see
: threads are getting blocked
: Id:595
: Name:J-379
: State:BLOCKED on java.lang.Class@478d34 owned by J-382
: Stack trace:
: ......
: Of course this may not be the biggest bottleneck for performance per se.
: But it's obviously I need to do something about it. If not for real
: performance gain, getting it out of picture will prove I am actually

F****n
发帖数: 3271
38
Sorry for the terminology. It should the thread's local stack. ThreadLocal
is actually a class to access that stack.

【在 m******t 的大作中提到】
: But what about throughput and average response time _per thread_?
: We constantly see a couple of synchronization spots in our
: thread dumps as well, but every time it's a different thread holding it.
: And because it's on the typical code path, with around 50 user threads
: in the vm all the time, there is almost always some thread holding it.
: But then from each individual thread's stand point, they almost never
: spend a lot of time waiting on the lock, so I don't really consider it
: a truly serialized

F****n
发帖数: 3271
39
No it's not. Imagine two key with same hashcodes (or internal address codes
). They are not synchronized. So when you remove one from the bucket the
other one may be appended after the removed one.

【在 g*****g 的大作中提到】
: What does object monitor synchronization uses have to do with hashcode?
: The problem with key.intern() I would think is causing racing in different
: places of system if multiple places are using it, but it's certainly
: thread-safe.

g*****g
发帖数: 34805
40
I believe for this piece of code we usually configure 200
worker threads thread pool, so while blocking is not always
a performance issue for the overall system, we do want to
minimize the synchronized scope and sync on a function is
almost never the right thing to do.

【在 m******t 的大作中提到】
: But what about throughput and average response time _per thread_?
: We constantly see a couple of synchronization spots in our
: thread dumps as well, but every time it's a different thread holding it.
: And because it's on the typical code path, with around 50 user threads
: in the vm all the time, there is almost always some thread holding it.
: But then from each individual thread's stand point, they almost never
: spend a lot of time waiting on the lock, so I don't really consider it
: a truly serialized

相关主题
synchronization for counters新手问个multi-threading关于synchronized和volatile的问题
Tomcat Servlet: synchronized vs non-synchronized methods多线程真头疼,但也挺有趣
synchronization 锁住了什么?今天下午要面一个老印
进入Java版参与讨论
m******t
发帖数: 2416
41

I wouldn't make a technically subtle fix unless I'm sure
what it fixes is going to be worth the risk.

【在 g*****g 的大作中提到】
: I believe for this piece of code we usually configure 200
: worker threads thread pool, so while blocking is not always
: a performance issue for the overall system, we do want to
: minimize the synchronized scope and sync on a function is
: almost never the right thing to do.

m******t
发帖数: 2416
42

codes
Right, not to mention also the bucket itself could be changed
and synch'ing on the hash value itself mostly certainly won't
cover that.

【在 F****n 的大作中提到】
: No it's not. Imagine two key with same hashcodes (or internal address codes
: ). They are not synchronized. So when you remove one from the bucket the
: other one may be appended after the removed one.

g*****g
发帖数: 34805
43
OK, I get that, thanks. I wasn't thinking inside of HashMap
implementation.

【在 m******t 的大作中提到】
:
: codes
: Right, not to mention also the bucket itself could be changed
: and synch'ing on the hash value itself mostly certainly won't
: cover that.

k***r
发帖数: 4260
44
There are other problems. The conclusion so far is it's
still broken, and pretty much unfixable.

【在 g*****g 的大作中提到】
: It's not since 1.5 and you put a volatile keyword on it,
: that's what I've heard.

F****n
发帖数: 3271
45
I don't understand why they say double check is algorithmically incorrect.

【在 k***r 的大作中提到】
: There are other problems. The conclusion so far is it's
: still broken, and pretty much unfixable.

1 (共1页)
进入Java版参与讨论
相关主题
请教:怎么把synchronize的method改成用thread 但thread safe呢?Tomcat Servlet: synchronized vs non-synchronized methods
java的volatilesynchronization 锁住了什么?
Java concurrency的疑惑,难道我理解错了? (转载)新手问个multi-threading关于synchronized和volatile的问题
Collection return type and web service多线程真头疼,但也挺有趣
请教一个多线程lock机制的问题今天下午要面一个老印
The utlimate solution to cocurrent cache请问这个面试题,关于synchronize hashmap
发现 synchronized 的一个问题Talk a little more about How to lock a file
synchronization for counters怎么synchronize时间捏
相关话题的讨论汇总
话题: thread话题: key话题: hashmap话题: stack话题: string