第2页 - 关于hash的讨论汇总 - 话题女王

y********o
发帖数: 2565

来自主题: Database版 - SQL Server 2005: How to hash a column?

Guess what, I googled out one site:
http://md5.benramsey.com/
It did successfully reverse the hash of 'abc123'. But not any of my real pa
sswords, :-)
It looks like the hash reversal engine above has a small dictionary of hashe
d entries. If you just use your name initials plus your birth date as your
password, it won't be able to reverse it.

that
"
for

i********g
发帖数: 41

来自主题: Database版 - 请问怎么用sql实现hash

假设我有一个table, 有attribute A, B，每个A的值对应好几个B的值。现在我想看看
哪些A 有相同的B得值，最简单的办法是不是应该把每个A对应的所有B的值都hash 到一
个value，然后比较hash的值？可是SQL怎么实现这样的hash呢？我的table TB 很大，
有 >100K tuples，所以我不想用jdbc来做，效率太低了。能不能直接在sql里面实现？
多谢指教！

yy
发帖数: 45

来自主题: Programming版 - perl array|hash question

suppose i have a array @a or hash @hash, which are already assigned some
values
I am wondering whether there is a way to reset this array or hash to be empty?
Thanks!

l******e
发帖数: 94

来自主题: Programming版 - Interview questions about hash function

Phone screen question:
You have a billion urls , where each has a huge page, how to detect the
duplicate documents?
I said hashing the document contents, so the interviewer asker do I know
which hash function should I used? I have no clue about what specific
function can hash a large file into a small key that takes relatively less
space.
Anybody give can give me some hint?

b********e
发帖数: 215

来自主题: Programming版 - c++ stl里面有hash table吗？

发现java里有hash table,但是c++ stl好像没有hash table，如果面试的时候写C++程
序要用到hash table怎么办？谢谢。

z****e
发帖数: 2024

来自主题: Programming版 - interview question: (RB tree vs. hash table)

why is STL map implemented by RB-tree, rather than hash table?
can some one explain the cons and pros of RB-tree and hash table, when to
pick which to use?
RB-tree VS. hash table.
Thanks.

r****o
发帖数: 1950

来自主题: Programming版 - 请问关于hash table的大小设定问题。 (转载)

【以下文字转载自 JobHunting 讨论区】
发信人: roufoo (五经勤向窗前读), 信区: JobHunting
标题: 请问关于hash table的大小设定问题。
发信站: BBS 未名空间站 (Mon Jun 7 18:42:24 2010, 美东)
【以下文字转载自 InterviewHackers 俱乐部】
发信人: roufoo (五经勤向窗前读), 信区: InterviewHackers
标题: 请问关于hash table的大小设定问题。
发信站: BBS 未名空间站 (Mon Jun 7 18:42:14 2010, 美东)
请问，假如有n个key的话，hashtable的size一般设为多大比较合适?
这个size的大小跟hash function的选择有关系吗?

C***y
发帖数: 2546

来自主题: Programming版 - Hash virtual memory address

64位机器
我想把heap 上分配的内存块hash到一个大小为512的hash table中. 内存块可能很小，
也可能很大。有什么简单快速的hash function吗？
想用Knuth's Multiplicative Method，乘以一个很大的素数，会不会太慢了？
谢谢！

m******u
发帖数: 12400

来自主题: Programming版 - 怎么做hash运算？

用例子来说明我的问题：
比如生意店铺里顾客购物、饭店里点菜等，我们用hash来记录每个单子（一次消费）。
月底或某段时间之后，汇总。这些hash内，商品名是key，购买数量是value。现在要汇
总，同一样商品的数量累计。问题是怎么用hash来实现这个汇总。谢谢。

m******u
发帖数: 12400

来自主题: Programming版 - 怎么做hash运算？

我的问题是hash运算，这只是举个例子。再说用如果用hash来记录买卖记录，不用商品
本身做key用什么？你会说用条形码，那本来就是一回事。
hashi1={"pants"=> 1, "toothpaste"=>2, "gum" => 4 }
hash2 ={"shirt" =>2, "suit" => 1, "gum => 5", "toothpaste"=> 1}
怎么通过hash运算，说，得到pants：1；toothpaste：3； gum： 9； shirt：1；
suit：1

a*****1
发帖数: 314

来自主题: Programming版 - redis 存储 object的问题：每个object一个hash，这样能行吗？

一个 user表的几个field，每次操作都要判断。
打算放到一个class里面，然后，一个object 一个 hash。这样读写更快。不用反序列
化了。
如果几百million object就要创建几百million hash。可不可以这样做呢？
hash 数量，没限制吧？？
谢谢了

o******6
发帖数: 538

来自主题: Statistics版 - [合集] hash table and large dataset

☆─────────────────────────────────────☆
largetail (largetail) 于 (Mon Mar 16 22:57:50 2009) 提到:
Any one familiar with hash table or handle large dataset (hundreds of
millions of obs) in SAS (sort merge and subset). Any reference and suggest
is greatly appreciated.
☆─────────────────────────────────────☆
qqzj (小车车) 于 (Mon Mar 16 23:46:54 2009) 提到:
hash table is used when you have a huge data base and a small code table.
you can transform the smaller one into a hash table. it is very effi

S******3
发帖数: 66

来自主题: Statistics版 - 怎样用hash table做full join？

SAS hash can do inner join/left join easily.
just for learning purpose, full join is simply a left join + all records in
2nd table that can't be found in the 1st table. By this way, you need load
both table into hash (one full table + one table with key only); and you
need scan both tables once completely, so may not worth much of Hash.

KG
发帖数: 515

来自主题: CellularPlan版 - FreedomPop VoIP password is hashed now

Hashing本来就是one way function，理论上无法还原，除非你brute-force，或者简单
的password，用rainbow table。你为什么要还原？Authentication一般都用hash，不
用plain text。
[在 Hydra (Hydra) 的大作中提到：]
:想把FreedomPop的VoIP username/password弄出来装到其他SIP client上，发现
:password是hashed了，以前iPhone上的的.linphonerc 里的property name是password
,现在是ha1. 放到SIP client上就报错，401-unauthorized
:哪位大侠有新的功略把password 还原出来吗？
:谢谢！
:☆ 发自 iPhone 买买提 1.23.08

S**Y
发帖数: 136

来自主题: JobHunting版 - STL doesn't have hash Table?

I installed ubuntu, and found hash table is not supported.
Can anyone suggest a lib. for hash table?
Thanks a lot.

s*********t
发帖数: 1663

来自主题: JobHunting版 - 将一对整数hash成一个整数，有什么好的hash function吗？

太难了，帮顶
我感觉先组合成字符串，中间放个'.'，这样就唯一了。
然后hash成整数

a*****k
发帖数: 704

来自主题: JobHunting版 - implement hash table

in general, if I want to scale up the hash table, I have to recalculate hash
values of previous elemetns, and put them in the right places, right?

to

t**n
发帖数: 272

来自主题: JobHunting版 - 常见的string hash function

1M个字符用hash并不好，碰撞的可能比较高，如何选hash函数貌似有个公式的，wiki上有
1M个file, 用MD5啊，碰撞几率很低

r****o
发帖数: 1950

来自主题: JobHunting版 - 请问关于hash table的大小设定问题。

【以下文字转载自 InterviewHackers 俱乐部】
发信人: roufoo (五经勤向窗前读), 信区: InterviewHackers
标题: 请问关于hash table的大小设定问题。
发信站: BBS 未名空间站 (Mon Jun 7 18:42:14 2010, 美东)
请问，假如有n个key的话，hashtable的size一般设为多大比较合适?
这个size的大小跟hash function的选择有关系吗?

h*********3
发帖数: 111

来自主题: JobHunting版 - universial hashing 一问

我看了看书，说是每来个key,随即的选一个hash function 来计算hash的位置，是这
样吗？
如果是这样，那么插入时用了一个function f1, 读出时用了另外一个funciton f2,
怎么可能找到需要找的item呢？

b******n
发帖数: 4509

来自主题: JobHunting版 - universial hashing 一问

universal hashing guarantees the collision is smaller than 1/m,
but you need a mechanism to make sure the same key maps to the same slot

,
key
hash
是这
f2,

r**d
发帖数: 316

来自主题: JobHunting版 - how to query in the universal hash table?

在java里hash函数不是随机选择的，而是由class定死的，所以给定key，所映射的
index是一定的。
如果hash函数发生碰撞就需要遍历table[index]链表里的所有元素了。

s********i
发帖数: 17328

来自主题: JobHunting版 - how to query in the universal hash table?

这个题出得不make sense啊，之所以用universal hashing就是遍历hash function的时
间是固定的，从而减少可能search collision的时间。

s********i
发帖数: 17328

来自主题: JobHunting版 - how to query in the universal hash table?

楼上几位，我错了，没学好。楼主的问题误导了我，也就是说universal hashing不存
在iterate hash function的问题。

s******d
发帖数: 61

来自主题: JobHunting版 - How to serialize hash table

如果serialize Binary Tree 可以直接加标记....
serialize hash table的话, hash table 本身是数组,或者seperate chaining数组带
链表
数组的话本身不就可以是serialized的吗?

f*******t
发帖数: 7549

来自主题: JobHunting版 - 问个面试时候hash table的C++实现问题

hash value就是key啊
一般hashmap，K是V算出来的hash值

a********n
发帖数: 1287

来自主题: JobHunting版 - 一个小问题，hash map和map的区别是什么？

你的意思是map的实现是binary search tree，hash map的实现是hash。
？

h****n
发帖数: 1093

来自主题: JobHunting版 - g家面题：hash表能否实现比O(n)好的随机拾取？

suppose the hash is implemented by chaining
Hash size is m, key number is n, the longest length of chain is L
Then we can do:
1. randomly select a bucket, suppose the length of chain is k (1<=k<=L)
2. for the list in bucket k, randomly generate a number l between 1 to L
if l is less than k, then we return the lth item in the list
It is just like a rejection sampling in a 2D array
Time complexity: The expected number per bucket is n/m
The probability that rejection sampling succeeds is n/m/L
The ... 阅读全帖

S******t
发帖数: 151

来自主题: JobHunting版 - 不改变排序的hash算法?

我认为理论上不可能有这样的Hashing Function，如果有的话，那么对这个h(h(x))再
取hash，Iterate若干次之后，如果是为了保序，一定会收敛到一个不动点。
也就是说类似于
if x < y then h(x) < h(y)
那么最终h*(x) < h* (y) for all x and y
这个只有可能h(x) = x自身吧？

w****a
发帖数: 710

来自主题: JobHunting版 - 不改变排序的hash算法?

自己实现计算hashCode,比如原来32位的hash你扩张到64位
后32位还是hash 前32位用于排序

l*******b
发帖数: 2586

来自主题: JobHunting版 - hash table的size为什么最好是个质数？

不互质的话,大部分权重都在最后那一项c[n]上了. hash function大概就是要对这种情
况不敏感, 最好是只要输入数据有一定的随机性就得到比较平均的hash value, 而不是
依赖于一般性的假设

B*****7
发帖数: 137

来自主题: JobHunting版 - universal hashing的问题

如果以数据库为例，也就是说，系统如果受恶意request攻击就换个hashing function
，然后以新的hashing function重新生成index？

w*******r
发帖数: 277

来自主题: JobHunting版 - Intro to algorithm里关于hash table的复杂度

第二版的11.2里对于chaining的search hit复杂度的证明是不是太奇怪了
就是定理11.3的证明，觉得很别扭阿
为什么要假设不是事先给定了一个固定的hash table
而是假设成通过每次search miss之后insert而建立起来的？这样还要把建表的过程考
虑到概率里面去
上面在证明search miss的时候的复杂度时，就没有这么假设，而是假设事先给定的一
个hash table
两种情况采用的标准不一致阿

h*******e
发帖数: 1377

来自主题: JobHunting版 - tinyurl 设计的时候回答需要注意什么，除了hash还有什么。

hash冲突怎么办呢，一个key hash值对应两个column name(url)了就
这里的索引UUID似乎没有用到的阿.

s***5
发帖数: 2136

来自主题: JobHunting版 - 讨论一下给定平面上n点，求在同一直线上最多点问题～～大家用double　/float 类型作hash key么

use hash of hash: unordered_map >

h*******e
发帖数: 1377

来自主题: JobHunting版 - 讨论一下给定平面上n点，求在同一直线上最多点问题～～大家用double　/float 类型作hash key么

你多做些计算几何题就知道了，计算几何差一点就差很多，比如三点共线的标准方法
就是叉乘面积近似为0，近似斜率做就引进边的长度误差，计算几何在严格一点的oj上都
是差之毫厘，失之千里的，过了leetcode的 oj的做法很多不一定就一定无懈可击，往
往面试难得不是原题，而是是要求改一点，比如这道题之前c++ 不支持 unordered_set
的时候就看过这道题，那要高效算法就只能手动写链表hash table实现hash~~ 这道题
glowinglake 的 follow up 也不是那么简单说海量数据情况下应该怎么变我还没细想
。
by the way 关于sqrt 那道题虽然可以用 epsilon但是有不用 epsilon的精确解法，
不知道你能不能想出来。

s***5
发帖数: 2136

来自主题: JobHunting版 - 讨论一下给定平面上n点，求在同一直线上最多点问题～～大家用double　/float 类型作hash key么

use hash of hash: unordered_map >

h*******e
发帖数: 1377

来自主题: JobHunting版 - 讨论一下给定平面上n点，求在同一直线上最多点问题～～大家用double　/float 类型作hash key么

m**********s
发帖数: 518

来自主题: JobHunting版 - 请教大侠们hash table 多线程问题

你还是不要再写code了
什么时候hash table是个single object了？
Key points to a list of nodes for the sake of possible hash collision
你能atomic的copy一个list？

发帖数: 1

来自主题: JobHunting版 - 请教大侠们hash table 多线程问题

我一直强调的是绝对不可以在hashtable的设计(尤其在面试的时候)用copy-on-write，
一定要用lock。
第一点：最基本的常识，只能有一个thread来写，比如hashtable的值用作counter。用
copy-on-write，立马废掉。
第二点: 更进一步的synchronization，是要保证global ordering。比如狗家股票real
time streaming, 绝对不可以11:11:11.002的报价先写入，然后11:11:11.000的报价
后写入。这就有不少的实现方法。这个已经超出了hashtable面试的范围，但是如果你
想exceed expectation，mention this.
很多刚毕业的菜鸟喜欢在面试的时候胡吹一些"advanced technology"以达到shock-and
-owe的效果，use it wisely。比如下面这个就非常的un-wise：
“你知道你一个bucket有100个element，多线程同时写在mutex上要等多久么？不过你
提的这个throughput可能用mutex也能达到”
听着都能让... 阅读全帖

j***u
发帖数: 2

来自主题: CS版 - 各位大牛, 有无可逆的Hash函数?

Hash函数通常是不可逆的. 那么, 存不存在可逆的Hash函数呢? 诚请各位大牛拍砖指教.

j********e
发帖数: 124

来自主题: CS版 - 各位大牛, 有无可逆的Hash函数?

Hash functions are heavily used in Criptography. Though most hash functions in
that purpose are not invertable, I think some are.

.

y****r
发帖数: 17

来自主题: Programming版 - 怎么把hash of hash 存到数据库里？

也就是第一个的value是另一个hash table reference.
对数据库的要求还必须是存到一个db table里

z****e
发帖数: 2024

来自主题: Programming版 - interview question: (RB tree vs. hash table)

你这点说得非常对，map里边有iterator，是可以按顺序遍历的，hash 无法排序，无法
选择（比如第ith smallest）。
我还加一点，就是dynamic hash table. 因为事先不知道元素多少，无法确定table大
小，只能用table被填满一半，就把table大小翻倍，（linear probing 情况下）。这
个操作比较昂贵。用separate chaining 的话，元素越多，就越来效率越低。说是O(1)
的操作，最后都不知道O几了。
所以，不知道大小，就RB-tree，实现动态调整大小，而且，插，删，找，永远log(N)。
不知道这样说对不对，还有没有其他补充。
多谢大家。

j***3
发帖数: 142

来自主题: Programming版 - [Perl] how to create a new hash that is a subset of a exis

how to create a new hash that is a subset of an existing hash?
under some condition e.g. $key >=a and $key <=b.
thanks

C***y
发帖数: 2546

来自主题: Programming版 - Hash virtual memory address

是hash首地址，首地址的pattern跟块大小还是有关的
不管了，现在是直接取后32位乘以一个素数，然后再mod hash table的size
谢谢！

s****A
发帖数: 80

来自主题: Programming版 - hash table的size为什么最好是个质数？ (转载)

【以下文字转载自 JobHunting 讨论区】
发信人: studyA (Algorithm), 信区: JobHunting
标题: hash table的size为什么最好是个质数？
发信站: BBS 未名空间站 (Mon Feb 18 17:32:47 2013, 美东)
而且书上还说，如果来search字符串的话,最好这样计算hash function
( char[0] x s^n + char[1] x s^(n-1)+...+char[n] )% M
这里如果s和M不互质的话，就会导致得到的index分布不uniform
我没想明白为什么，我怎么觉得不管是否互质，只要输入分布uniform得到的index还是
在0到M-1之间uniform分布的啊
谁能给举个例子说明一下s和M不互质的情况下会出什么问题？
最好是例子，一看就明白
谢谢！

y****e
发帖数: 23939

来自主题: Programming版 - 问一个hash code的问题

有这样一个三个字节的字符串，NT0， NbR, etc.，其中第一个是一个字母，第二个是
【B，b, A, a, T】中的一个，第三个是一个字母或数字。怎样设计一个hash function
能够把它用尽可能少的hash code表示出来。其实也是一个数据压缩的问题。

g*****g
发帖数: 34805

来自主题: Programming版 - consistent hashing实际应用

consistent hash 需要重新hash的部分很少，就是1/N。

data

z*********e
发帖数: 10149

来自主题: Programming版 - 怎么找到hash和原文的关系？

我有一个sha256的hash
718ab8de60e431ec64144dd6637596be49688d5e9fcdd6673e204f9a03297ff6
我大概知道跟这个长度11的原文dake5k6uhwp有点关系，估计是加了点什么盐。如果没
办法知道加盐配方，有什么暴力app能通过这个提示帮忙找到这个hash的原文吗？

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

topics

未名新帖统计// 7月16日

历史上的今天