问个google面试题 - JobHunting版

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

JobHunting版 - 问个google面试题

相关主题
● 还有一周onsite，怎么看Hadoop.The.Definitive.Guide效率最高？	● 问一道面试题
● 前员工追忆百度乱象：上下异心狼性压制人性（转载）	● Amazon面试题的疑惑，5个包子答谢！
● Google 的所谓核心技术的部门是哪些	● 一道Google面试题
● 准备跟800题大牛一起啃system design的硬骨头了	● 请教一道面试题
● 不会C++，后果多严重？	● 问两道amazon的面试题
● 据说CS面试有三篇必看paper……	● FB system design client 给 server 传输文件的系统。一个/多个clients <-> 一个／多个 server
● 驳G家的技术不如FLA先进	● 一道面试题求助
● 请教2个 huge file的面试题	● 问道MS的面试题

相关话题的讨论汇总
话题: db话题: friends话题: going话题: friend话题: given

进入JobHunting版参与讨论

(共1页)

B*******1
发帖数: 2454

图论比较弱，觉得应该是用图，请大牛们指点：
Suppose there are 2 persons A and B on FB . A should be able to view the
pictures of B only if either A is friend of B or A and B have at least one
common friend . The interviewer discussed it for nearly 30 minutes . The
discussion mainly included following points -
1. How are you going to store the list of friends for a given user ?
2. File system vs DB
3. Given list of friends of 2 users , how are you going to find common
friends ?
4. If you are going to store the friends in DB then how will the table
look like ?
5. How many servers do you need ?
6. How are you going to allocate work to servers ?
7. How many copies of data will you need ?
8. What problems will you face if you are maintaining multiple copies of
data.

P**********c
发帖数: 3417

G居然也开始考数据库了，以前很少看到数据库相关的问题。这个题目很难啊，如果我
遇到估计也会答的乱七八糟的。

【在 B*******1 的大作中提到】

: 图论比较弱，觉得应该是用图，请大牛们指点：
: Suppose there are 2 persons A and B on FB . A should be able to view the
: pictures of B only if either A is friend of B or A and B have at least one
: common friend . The interviewer discussed it for nearly 30 minutes . The
: discussion mainly included following points -
: 1. How are you going to store the list of friends for a given user ?
: 2. File system vs DB
: 3. Given list of friends of 2 users , how are you going to find common
: friends ?
: 4. If you are going to store the friends in DB then how will the table

B*******1
发帖数: 2454

是阿，我数据库也忘光了。

【在 P**********c 的大作中提到】

: G居然也开始考数据库了，以前很少看到数据库相关的问题。这个题目很难啊，如果我
: 遇到估计也会答的乱七八糟的。

r******n
发帖数: 170

是楼主自己碰到的题吗？
觉得这题挺难，不过也觉得挺好的，覆盖面挺广，希望大牛出来说说

【在 B*******1 的大作中提到】

q****x
发帖数: 7404

these questions have nothing to do with graph ah.
but good questions.

the
one
The
?
common
table

【在 B*******1 的大作中提到】

r******n
发帖数: 170

我随便说说，看是否有大牛出来吧
1. format存成file或者db table,存的内容基本就是user id(unique)的friend关系，
比如file每行 1 2就表示user1和user2是friend
2.file存在io问题，interface上得自己重新建立datastructure来提供常见的function
,同时这也是优势，更灵活，有完全控制权；可以像读triangle mesh一样读进来，然后
建立mesh；db通常有先有工具，只需考虑table schema的设计，有index等手段优化
search等功能；db易于数据同步，备份，建立db cluster, load balance等常见手段
3. 两个数组按照user id升序排列，找相同id,O(m+n)遍历，可以找完
4. tableName:Connection, 两个column friendStart, friendEnd,都用来存放userId,
两个column都是index
select distinct friend from (
select friendEnd from Connection tbl,
(select friendEnd from Connection where friendStart=A) AS directFriend
where tbl.friendStart=directFriend.friendEnd
Union ALL
select friendEnd from Connection where friendStart=A
)
差不多就能把user A的直接好友和1层简洁好友都选出来
5，6：根据userid建立hash (hash key可以考虑user location等),分配到对应的server index上，同时也就是database
server的id,具体多少台server不知道怎么答？而且如何根据user的相似度做cluster,分散数据到各个表也不清楚
7,8: 因为数据量太大，db table肯定得分割，如何数据冗余，做分割就不知道了。必
须得有database denormalization的过程，假如跨server，问题有如何同步数据(一个
transaction进去？)，load balance具体的cluster操作？
觉得有大型sns数据库实际操作的能出来讲讲，给个啥link的也好

【在 B*******1 的大作中提到】

P**********c
发帖数: 3417

楼主面的什么组啊。这种题目如果不是面google+应该不会这么domain specific吧。

function
userId,

【在 r******n 的大作中提到】

: 我随便说说，看是否有大牛出来吧
: 1. format存成file或者db table,存的内容基本就是user id(unique)的friend关系，
: 比如file每行 1 2就表示user1和user2是friend
: 2.file存在io问题，interface上得自己重新建立datastructure来提供常见的function
: ,同时这也是优势，更灵活，有完全控制权；可以像读triangle mesh一样读进来，然后
: 建立mesh；db通常有先有工具，只需考虑table schema的设计，有index等手段优化
: search等功能；db易于数据同步，备份，建立db cluster, load balance等常见手段
: 3. 两个数组按照user id升序排列，找相同id,O(m+n)遍历，可以找完
: 4. tableName:Connection, 两个column friendStart, friendEnd,都用来存放userId,
: 两个column都是index

s*****n
发帖数: 5488

这个open问题本来就应该用nonsql来解。做到sql就是歪了。人家也就是顺着问下去了。

function
userId,

【在 r******n 的大作中提到】

l***i
发帖数: 1309

Why so many FB questions carry the title of google?

s*******f
发帖数: 1114

以菜鸟的经验，读完G三篇（GFS，BIGtable，mapreduce)之后，你的回答能从40分提高
到70分。

function
userId,

【在 r******n 的大作中提到】

相关主题
● 据说CS面试有三篇必看paper……	● 问一道面试题
● 驳G家的技术不如FLA先进	● Amazon面试题的疑惑，5个包子答谢！
● 请教2个 huge file的面试题	● 一道Google面试题
进入JobHunting版参与讨论

w****o
发帖数: 2260

什么是 GFS?

【在 s*******f 的大作中提到】

: 以菜鸟的经验，读完G三篇（GFS，BIGtable，mapreduce)之后，你的回答能从40分提高
: 到70分。
:
: function
: userId,

y**********u
发帖数: 6366

google file system
因为gfs只有single master node，所以google早就打算重写了

提高

【在 w****o 的大作中提到】

: 什么是 GFS?

s*******f
发帖数: 1114

重写后是啥样子？有没有paper？

【在 y**********u 的大作中提到】

: google file system
: 因为gfs只有single master node，所以google早就打算重写了
:
: 提高

g*********e
发帖数: 14401

应该是fully distributed system。没有master node，各个机器平权。

s*******f
发帖数: 1114

u can check Amazon dynamo, p2p distributed system, more reliable, but not as
efficient as gfs, i think.

【在 g*********e 的大作中提到】

: 应该是fully distributed system。没有master node，各个机器平权。

(共1页)

进入JobHunting版参与讨论

相关主题
● 问道MS的面试题	● 不会C++，后果多严重？
● 请教一道面试题	● 据说CS面试有三篇必看paper……
● 面试题求助	● 驳G家的技术不如FLA先进
● 初级SQL问题	● 请教2个 huge file的面试题
● 还有一周onsite，怎么看Hadoop.The.Definitive.Guide效率最高？	● 问一道面试题
● 前员工追忆百度乱象：上下异心狼性压制人性（转载）	● Amazon面试题的疑惑，5个包子答谢！
● Google 的所谓核心技术的部门是哪些	● 一道Google面试题
● 准备跟800题大牛一起啃system design的硬骨头了	● 请教一道面试题

相关话题的讨论汇总
话题: db话题: friends话题: going话题: friend话题: given

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

boards

未名新帖统计// 7月16日

历史上的今天