发个evernote的code challenge - JobHunting版 - 未名存档

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

JobHunting版 - 发个evernote的code challenge

相关主题
● 上午偷闲把TopKFrequentWords写出来了	● 请教关于build heap BIG-O的问题
● 来讨论个uber的电面题	● 一道电面题，分享下，这个题应该用哪几个data structure?
● share int2roman and roman2int java version	● 菜鸟向大家请教个面试题
● 关于Implement hashtable的问题	● 问一个面试经常问的ood，维护前k名的list的问题
● twittier的onsite挂了，来问个常见题	● 问一道A家的面试题
● 问道indeed面试算法题	● leetcode 上 wordladderII 求教
● L家的高频题merge k sorted arrays giving iterators求讨论！	● 不明白leetcode OJ wordladder 2 总是 Time Limit Exceeded
● careercup书上那个maintain median value的题	● 微软有组在招new grad software engineer吗？

相关话题的讨论汇总
话题: string话题: integer话题: linkedlist话题: word话题: count

进入JobHunting版参与讨论

1

(共1页)

D***0 发帖数: 138	1 网上申请的，回复的挺快，安排了code challenge，一道题，不限时，半个小时写完了，发过去，第二天收到了thank you but 88.不知道哪里的问题。 * Write a function that takes two parameters: * (1) a String representing a text document and * (2) an integer providing the number of items to return. * Implement the function such that it returns a list of Strings ordered by word frequency, * the most frequently occurring word first. * Use your best judgement to decide how words are separated. * Your solution should run in O(n) time where n is the number of characters in the document. 这是我的solution，给了两种： public class getTopKFrequentWords { /* static class PQsort implements Comparator>{ @Override public int compare(Entry arg0, Entry arg1) { // TODO Auto-generated method stub return arg1.getValue() - arg0.getValue(); } }/ / this function treats words with the same frequency differently * 1. create a hashmap with , then split the input string into words, scan these words to * fill out the hashmap. O(n) * 2. create a size k min-heap(in java priority queue), iterate the hashmap, and put each * pair to the min-heap, which sorts based on the frequency. O(mlogk) m is the total number of words * 3. fill out a string list with the result min-heap in the above step. O(klogk) * * Total time complexity : O(n) + O(mlogk) + O(klogk) = O(n) / public static LinkedList getTopKFrequentWords(final String doc, final int k) { LinkedList ret = new LinkedList(); if(k <= 0 \|\| "".equals(doc)) return ret; HashMap map = new HashMap(); / // I use white space and as delimiter String[] words = doc.split("(\s\|\p{Punct})+"); for(String word : words) { int count = 0; if(map.containsKey(word)) { count = map.get(word); } count++; map.put(word, count); }/ StringBuilder word = new StringBuilder(); String lowercaseDoc = doc.toLowerCase(); for(int i=0; i { if(isLetter(lowercaseDoc.charAt(i))) { word.append(lowercaseDoc.charAt(i)); } else { // we have a word if(word.length() > 0) { int count = 0; if(map.containsKey(word.toString())) { count = map.get(word.toString()); } count++; map.put(word.toString(), count); word = new StringBuilder(); } } } // the last word if(word.length() > 0) { int count = 0; if(map.containsKey(word)) { count = map.get(word); } count++; map.put(word.toString(), count); word = new StringBuilder(); } PriorityQueue> minHeap = new PriorityQueue>(k, new Comparator , Integer>>(){ @Override public int compare(Map.Entry entry1, Map.Entry< String, Integer> entry2) { return entry2.getValue() - entry1.getValue(); } }); for(Map.Entry entry : map.entrySet()) { System.out.println(entry.getKey() + "-->" + entry.getValue()); minHeap.offer(entry); } int i = 0; while(i < k && minHeap.size() > 0) { //System.out.println(minHeap.peek().getKey() + "-->" + minHeap. peek().getValue()); ret.add(minHeap.poll().getKey()); i++; } return ret; } / this function gets all the words with the same frequency * 1. create a hashmap with , then split the input string into words, scan these words to * fill out the hashmap. O(n) * 2. create a hasmap with >, then scan the above map to fill out this map. O(m) m is the total number of words * 2. create a size k min-heap(in java priority queue), iterate the hashmap, and put each > * pair to the min-heap, which sorts based on the frequency. O(flogk) f is total number of frequencies * 3. fill out a string list with the result min-heap in the above step. O(klogk) * * Total time complexity : O(n) + O(m) + O(flogk) + O(klogk) = O(n) / public static LinkedList getTopKFrequentWords2(final String doc, final int k) { LinkedList ret = new LinkedList(); if(k <= 0 \|\| "".equals(doc)) return ret; HashMap map = new HashMap(); // I use white space and as delimiter / String[] words = doc.split("(\s\|\p{Punct})+"); for(String word : words) { int count = 0; if(map.containsKey(word)) { count = map.get(word); } count++; map.put(word, count); }*/ StringBuilder word = new StringBuilder(); String lowercaseDoc = doc.toLowerCase(); for(int i=0; i { if(isLetter(lowercaseDoc.charAt(i))) { word.append(lowercaseDoc.charAt(i)); } else { // we have a word if(word.length() > 0) { int count = 0; if(map.containsKey(word.toString())) { count = map.get(word.toString()); } count++; map.put(word.toString(), count); word = new StringBuilder(); } } } // the last word if(word.length() > 0) { int count = 0; if(map.containsKey(word.toString())) { count = map.get(word.toString()); } count++; map.put(word.toString(), count); } PriorityQueue>> minHeap = new PriorityQueue>>(k, new Comparator Entry>>(){ @Override public int compare(Map.Entry> entry1 , Map.Entry> entry2) { return entry2.getKey() - entry1.getKey(); } }); HashMap> statMap = new HashMap LinkedList>(); for(Map.Entry entry : map.entrySet()) { LinkedList list = null; if(statMap.containsKey(entry.getValue())) { list = statMap.get(entry.getValue()); } else { list = new LinkedList(); } list.add(entry.getKey()); statMap.put(entry.getValue(), list); } for(Map.Entry> entry : statMap.entrySet( )) { System.out.println(entry.getKey() + "-->" + entry.getValue()); minHeap.offer(entry); } int i = 0; while(i < k && minHeap.size() > 0) { //System.out.println(minHeap.peek().getKey() + "-->" + minHeap. peek().getValue()); ret.addAll(minHeap.poll().getValue()); i++; } return ret; } public static boolean isLetter(final char ch) { return (ch >= 'a' && ch <= 'z') \|\| (ch >= 'A' && ch <= 'Z'); } } 难道不是o(n)？
s******y 发帖数: 936	2 扫了一眼，逻辑对不对不太清楚。但是这种code写法是肯定不行的。 1.边界条件cover 不够 2.完全没有考虑内存，很可能溢出。 3.算法复杂度很大。 4.居然作为一个dev 不写unitest，这一点是最大的问题。 5. 没有足够的注解。 6. 多余的废话改删掉的就要删掉。 7. 随便看一眼就能看出好几个bug 问题太多了，先不说你的code 能不能work（肯定是不work的）。你的态度问题很大，你估计是刚毕业的吧。
D***0 发帖数: 138	3 请给个可能达到要求的呗，学习一下，下次知道哪里应该注意。谢谢。【在 s******y 的大作中提到】 : 扫了一眼，逻辑对不对不太清楚。但是这种code写法是肯定不行的。 : 1.边界条件cover 不够 : 2.完全没有考虑内存，很可能溢出。 : 3.算法复杂度很大。 : 4.居然作为一个dev 不写unitest，这一点是最大的问题。 : 5. 没有足够的注解。 : 6. 多余的废话改删掉的就要删掉。 : 7. 随便看一眼就能看出好几个bug : 问题太多了，先不说你的code 能不能work（肯定是不work的）。你的态度问题很大， : 你估计是刚毕业的吧。
s******y 发帖数: 936	4 先自己写几个test 然后跑一下。还有一点不能一个func 一写到底，要分成几个小模块写【在 D***0 的大作中提到】 : 请给个可能达到要求的呗，学习一下，下次知道哪里应该注意。谢谢。
r****s 发帖数: 1025	5 这应该很简单吧，什么unit test之类的就算了，考算法unit test个啥？ 1. 首先定义separator, (空格，逗号，句号，quote，double quote，period， EOF） 2. loop through the string char by char,加到一个string buffer里面，遇到上面定义的separator就返回一个word，同时reset string buffer。 3. count，可以用hashmap 4. 完了以后loop through hashmap, 找出topk。应该很少的code,大约十几行就行了。我觉得你的问题是shit load of code，简单的程序写得太多，一看就是没有经验 -- 有经验的码工都烦复杂程序，很难debug。
s******y 发帖数: 936	6 你工作中不写unit test？这个题重点不是考算法，算法答案网上可以搜到，考的是老板交给你一个任务你会怎么去完成。【在 r****s 的大作中提到】 : 这应该很简单吧，什么unit test之类的就算了，考算法unit test个啥？ : 1. 首先定义separator, (空格，逗号，句号，quote，double quote，period， EOF） : 2. loop through the string char by char,加到一个string buffer里面，遇到上面 : 定义的separator就返回一个word，同时reset string buffer。 : 3. count，可以用hashmap : 4. 完了以后loop through hashmap, 找出topk。 : 应该很少的code,大约十几行就行了。 : 我觉得你的问题是shit load of code，简单的程序写得太多，一看就是没有经验 -- : 有经验的码工都烦复杂程序，很难debug。
r****s 发帖数: 1025	7 扯淡。【在 s******y 的大作中提到】 : 你工作中不写unit test？这个题重点不是考算法，算法答案网上可以搜到，考的是老 : 板交给你一个任务你会怎么去完成。
z****e 发帖数: 54598	8 嗯，查找效率hashmap最快，set prime = 31，对于26个字母足够用，不会碰撞用priorityqueue反而需要lnk 最后有一个sort，这个时候再用priorityqueue都可以撒或者干脆就弄两个，一个hashmap一个priorityqueue 对于每一个单词，对map取出来，然后＋1之后put进去接收一个参数k，然后存top k在priorityqueue里面【在 r****s 的大作中提到】 : 这应该很简单吧，什么unit test之类的就算了，考算法unit test个啥？ : 1. 首先定义separator, (空格，逗号，句号，quote，double quote，period， EOF） : 2. loop through the string char by char,加到一个string buffer里面，遇到上面 : 定义的separator就返回一个word，同时reset string buffer。 : 3. count，可以用hashmap : 4. 完了以后loop through hashmap, 找出topk。 : 应该很少的code,大约十几行就行了。 : 我觉得你的问题是shit load of code，简单的程序写得太多，一看就是没有经验 -- : 有经验的码工都烦复杂程序，很难debug。
z****e 发帖数: 54598	9 String[] words = doc.split("(\s\|\p{Punct})+"); 这一步直接给你溢出你等于把整个文档全部读入内存你这样搞不是一个big data的搞法 input string的意思是file name a String representing a text document 意思是file name不是整个doc 要求你用input stream 依次读取这个file的character 这里考java io，上课说tf idf老师应该布置过这种作业吧？给你们一个超大的file，然后要求你去统计tf和idf 这个时候肯定不能直接先读取整个文档成string 要用input stream一点一点读，而且最好是buffered input stream 这个题目还是不错的我用python写过，慢得跟什么一样的，激励嘎啦跑了几十分钟我的laptop都快死机了，还是java好用
z****e 发帖数: 54598	10 这个考点罗列一下 1）hashcode，heap查找效率低，hashmap快，快很多 2）priorityqueue或者你说heap，min root heap这些 3）java io，不能全部读入内存，会爆的 4）tf idf，这个只考个tf的定义是常见的top k的变种这是第几次说top k question了？
D***0 发帖数: 138	11 这个我注释掉了。。。。。。。。。。。另外题意说是一个input string，既然这样，肯定内存能装下吧，要不就应该说input 是一个stream了。【在 z****e 的大作中提到】 : String[] words = doc.split("(\s\|\p{Punct})+"); : 这一步直接给你溢出 : 你等于把整个文档全部读入内存 : 你这样搞不是一个big data的搞法 : input string的意思是file name : a String representing a text document : 意思是file name不是整个doc : 要求你用input stream : 依次读取这个file的character : 这里考java io，上课说tf idf老师应该布置过这种作业吧？
D***0 发帖数: 138	12 这里是两个function，再把注释去掉，code也没剩多少了。。。。。【在 r****s 的大作中提到】 : 这应该很简单吧，什么unit test之类的就算了，考算法unit test个啥？ : 1. 首先定义separator, (空格，逗号，句号，quote，double quote，period， EOF） : 2. loop through the string char by char,加到一个string buffer里面，遇到上面 : 定义的separator就返回一个word，同时reset string buffer。 : 3. count，可以用hashmap : 4. 完了以后loop through hashmap, 找出topk。 : 应该很少的code,大约十几行就行了。 : 我觉得你的问题是shit load of code，简单的程序写得太多，一看就是没有经验 -- : 有经验的码工都烦复杂程序，很难debug。
z****e 发帖数: 54598	13 file name就是一个input string啊内存装不下的，你要用buffered input stream 我觉得是你误解了它要说的呢 string大小其实并不大，也就是几千个我记得我记得我用一万字杀进去 string对象就爆了 input 【在 D***0 的大作中提到】 : 这个我注释掉了。。。。。。。。。。。 : 另外题意说是一个input string，既然这样，肯定内存能装下吧，要不就应该说input : 是一个stream了。
z****e 发帖数: 54598	14 哈哈，但是你用了好几个的内部类和匿名类我看了都痛苦呢，这两个最好是避开使用的你单独写一个class在public class下面就好了你按照它说的改，我觉得过没啥问题它说的基本靠谱【在 D***0 的大作中提到】 : 这里是两个function，再把注释去掉，code也没剩多少了。。。。。

1

(共1页)

进入JobHunting版参与讨论

相关主题
● 微软有组在招new grad software engineer吗？	● twittier的onsite挂了，来问个常见题
● Given an int array and an int value. Find all pairs in arr	● 问道indeed面试算法题
● A面经	● L家的高频题merge k sorted arrays giving iterators求讨论！
● leetcode上最搞笑的是这题	● careercup书上那个maintain median value的题
● 上午偷闲把TopKFrequentWords写出来了	● 请教关于build heap BIG-O的问题
● 来讨论个uber的电面题	● 一道电面题，分享下，这个题应该用哪几个data structure?
● share int2roman and roman2int java version	● 菜鸟向大家请教个面试题
● 关于Implement hashtable的问题	● 问一个面试经常问的ood，维护前k名的list的问题

相关话题的讨论汇总
话题: string话题: integer话题: linkedlist话题: word话题: count

未名新帖统计// 7月16日

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

* 这里只显示发帖超过25的版面，努力灌水吧:-)