问题1-1:给定一个数字数组,求出出现频率最高的K个数字;
// 堆排序
public List<Integer> topKFrequent(int[] nums, int k) {
// 时间复杂度:O(n)
Map<Integer, Integer> map = new HashMap<>();
for (int num : nums) {
map.put(num, map.getOrDefault(num, 0) + 1);
}
// 时间复杂度:O(n) * O(logn)
PriorityQueue<Integer> queue = new PriorityQueue<>(nums.length, (o1, o2) -> map.get(o1) - map.get(o2));
for (Integer key : map.keySet()) {
queue.add(key);
if (queue.size() > k) {
queue.poll();
}
}
// 时间复杂度:O(k)
List<Integer> result = new LinkedList<>();
while (!queue.isEmpty()) {
result.add(queue.poll());
}
// 时间复杂度:O(k)
Collections.reverse(result);
return result;
}
// 桶排序
public List<Integer> topKFrequent(int[] nums, int k) {
// 时间复杂度:O(n)
Map<Integer, Integer> map = new HashMap<>();
for (int num : nums) {
map.put(num, map.getOrDefault(num, 0) + 1);
}
// 时间复杂度:O(n)
List<Integer>[] buckets = new List[nums.length + 1];
for (Integer key : map.keySet()) {
int i = map.get(key);
if (buckets[i] == null) {
buckets[i] = new ArrayList<>();
}
buckets[i].add(key);
}
// 时间复杂度:O(n)
List<Integer> result = new LinkedList<>();
for (int i = buckets.length - 1; i > 0; i--) {
if (buckets[i] == null) {
continue;
}
for (Integer value : buckets[i]) {
if (result.size() >= k) {
return result;
}
result.add(value);
}
}
return result;
}
问题1-2:给定一个单词列表,求出单词出现频率K个,相同频率,字母序排序;
// Java 桶排序
public List<String> topKFrequent(String[] words, int k) {
// 时间复杂度:O(n)
Map<String, Integer> map = new HashMap<>();
for (String word : words) {
map.put(word, map.getOrDefault(word, 0) + 1);
}
PriorityQueue<String>[] buckets = new PriorityQueue[words.length + 1];
for (String key : map.keySet()) {
int i = map.get(key);
if (buckets[i] == null) {
buckets[i] = new PriorityQueue();
}
buckets[i].add(key);
}
// 时间复杂度:O(n)
List<String> result = new LinkedList<>();
for (int i = buckets.length - 1; i > 0; i--) {
if (buckets[i] == null) {
continue;
}
while (!buckets[i].isEmpty()) {
if (result.size() >= k) {
return result;
}
result.add(buckets[i].poll());
}
}
return result;
}
// Scala(MapReduce思想,效率不高)
def topKFrequent(words: Array[String], k: Int): List[String] = {
val result = words.map((_, 1))
.groupBy(_._1)
.map(x => (x._1, x._2.size)).toList
.sortWith((x, y) => x._2 > y._2 || (x._2 == y._2 && x._1 < y._1)).take(k)
.map(_._1)
result
}
问题2:从十亿数据中,找出最大的前K个数;
串行版本:
建一个K个数的最小堆,与堆顶比较,大于(等于)堆顶,依次插入堆,超过K个数,踢出堆顶
并行版本:
问题3:给定一个800G的nginx访问日志文件,2G内存,取出访问频率在top100的IP;
1、切分文件处理方式
2、MapReduce
3、Hive