Data Stream Median

Question

Numbers keep coming, return the median of numbers at every time a new number added.

Clarification

What's the definition of Median?

• Median is the number that in the middle of a sorted array. If there are n numbers in a sorted array A, the median is A[(n - 1) / 2]. For example, if A=[1,2,3], median is 2. If A=[1,19], median is 1.

Example

For numbers coming list: [1, 2, 3, 4, 5], return [1, 1, 2, 2, 3].

For numbers coming list: [4, 5, 1, 3, 2, 6, 0], return [4, 4, 4, 3, 3, 3, 3].

For numbers coming list: [2, 20, 100], return [2, 2, 20].

Challenge

Total run time in O(nlogn).

Tags

Related Problems

Hard Sliding Window Median 17 %
Easy Median 22 %
Hard Median of two Sorted Arrays

Problem description on LeetCode:

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

For example,

`[2,3,4]`, the median is`3`

`[2,3]`, the median is`(2 + 3) / 2 = 2.5`

Design a data structure that supports the following two operations:

• void addNum(int num) - Add a integer number from the data stream to the data structure.
• double findMedian() - Return the median of all elements so far.

Example:

``````addNum(1)
findMedian() -> 1.5
findMedian() -> 2
``````

1. If all integer numbers from the stream are between 0 and 100, how would you optimize it?
2. If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?

Analysis

The basic idea is to maintain two heaps: a max-heap and a min-heap. The max heap stores the smaller half of all numbers while the min heap stores the larger half. The sizes of two heaps need to be balanced each time when a new number is inserted so that their size will not be different by more than 1. Therefore each time when findMedian() is called we check if two heaps have the same size. If they do, we should return the average of the two top values of heaps. Otherwise we return the top of the heap which has one more element. -- @hanhanbu

https://discuss.leetcode.com/topic/27506/easy-to-understand-double-heap-solution-in-java

Notice

LeetCode中与LintCode里稍有不同，在于对于Median的定义：

LeetCode要求是如果是even number，就计算中间两个数字的平均值，也就是
`(maxHeap.peek() + (minHeap.peek()))/2`

`maxHeap.peek()`

Solution

Double Heap (minHeap + maxHeap)

``````public class Solution {
PriorityQueue<Integer> maxHeap;//lower half
PriorityQueue<Integer> minHeap;//higher half

/**
* @param nums: A list of integers.
* @return: the median of numbers
*/
public int[] medianII(int[] nums) {

int count = nums.length;
maxHeap = new PriorityQueue<Integer>(count, Collections.reverseOrder());
minHeap = new PriorityQueue<Integer>(count);

int[] ans = new int[count];

for (int i = 0; i < count; ++i) {
ans[i] = findMedian();
}
return ans;
}

// Adds a number into the data structure.
maxHeap.offer(num);
minHeap.offer(maxHeap.poll());

if(maxHeap.size() < minHeap.size()){
maxHeap.offer(minHeap.poll());
}
}

// Returns the median of current data stream
public int findMedian() {
if(maxHeap.size() == minHeap.size()){
return maxHeap.peek(); // Or `(maxHeap.peek() + (minHeap.peek()))/2`
}else{
return maxHeap.peek();
}
}
}
``````