Sunday, June 11, 2017

Distributed Google Code Jam 2017 Round 2

Distributed Google Code Jam 2017 Round 2 screencast by eddy1021

Since I didn't advance to GCJ Round 3, this is my other chance to make it to a final this year.

For this contest, my strategy was to first solve all the small tests, unless the large test was easy enough to code directly. Then, with the idea confirmed correct by the small test, I would work out how to distribute the data among the nodes.

(Let's assume A refers to problem B, B refers to problem C, and so on, anyway (: )

At first, A seemed to meet the condition that the large is easy enough to code. By substituting the original sequence S with S', where S'_i = S_{i+1} - S_i, the problem becomes equivalent to finding the longest segment with the same value. This is similar to a problem from Round 1, which I handled by distributing the intervals as [0..b_1], [b_1..b_2], ..., [b_k..n). However, I can't remember why this time I distributed them as [0..b_1), [b_1..b_2), ..., [b_k..n) instead, which meant more information had to be sent between nodes and made me struggle for about 10 or 20 minutes. I finally solved it at around the 30-minute mark.
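Since the problem reduces to the longest run of equal values in S', the distributed part is really just merging per-chunk run summaries at the master. Below is a minimal single-machine sketch of that merge, assuming each node would summarize its chunk of the difference array and send the summary to the master; the names SegInfo, summarize and merge are mine, and the DCJ message-passing plumbing is left out.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Summary of one contiguous chunk of the difference array S'.
// (These names are my own; this is not the contest code.)
struct SegInfo {
    long long len;        // number of elements in the chunk
    long long firstVal;   // S' value at the chunk's left end
    long long lastVal;    // S' value at the chunk's right end
    long long prefLen;    // run of equal values at the left end
    long long sufLen;     // run of equal values at the right end
    long long best;       // longest run of equal values inside the chunk
};

SegInfo summarize(const vector<long long>& d) {
    SegInfo s{(long long)d.size(), d.front(), d.back(), 1, 1, 1};
    long long run = 1;
    for (size_t i = 1; i < d.size(); ++i) {
        run = (d[i] == d[i - 1]) ? run + 1 : 1;
        s.best = max(s.best, run);
    }
    while (s.prefLen < s.len && d[s.prefLen] == d[0]) ++s.prefLen;
    while (s.sufLen < s.len && d[s.len - 1 - s.sufLen] == d[s.len - 1]) ++s.sufLen;
    return s;
}

// Merge two adjacent chunk summaries; a run may continue across the boundary.
SegInfo merge(const SegInfo& a, const SegInfo& b) {
    SegInfo r;
    r.len = a.len + b.len;
    r.firstVal = a.firstVal;
    r.lastVal = b.lastVal;
    bool join = (a.lastVal == b.firstVal);
    r.prefLen = (a.prefLen == a.len && join) ? a.len + b.prefLen : a.prefLen;
    r.sufLen  = (b.sufLen == b.len && join) ? b.len + a.sufLen : b.sufLen;
    r.best = max({a.best, b.best, join ? a.sufLen + b.prefLen : 0LL});
    return r;
}

int main() {
    vector<long long> S = {1, 3, 5, 7, 8, 9, 10, 11, 20};
    vector<long long> d(S.size() - 1);                 // S'_i = S_{i+1} - S_i
    for (size_t i = 0; i + 1 < S.size(); ++i) d[i] = S[i + 1] - S[i];

    // Pretend two nodes each summarized half of S'; the master merges them.
    size_t mid = d.size() / 2;
    vector<long long> leftHalf(d.begin(), d.begin() + mid);
    vector<long long> rightHalf(d.begin() + mid, d.end());
    SegInfo all = merge(summarize(leftHalf), summarize(rightHalf));

    // A run of L equal differences corresponds to L + 1 elements of S.
    cout << all.best + 1 << "\n";                      // prints 5 (7,8,9,10,11)
}
```

(Presumably the overlapping chunks from Round 1 avoid the extra exchange entirely, because every difference a node needs can then be computed from its own chunk.)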

After reading B, no solution came to mind, not even for the small. So I headed to C. C is the one where the small is easy but the large is not, so I decided to take the small first and finished it at about 40 minutes.

There were already lots of correct submissions for B by then.

After experimenting with some small examples for B, I found that the base is determined only by the least significant position k where x[k]+y[k] != z[k]. If x[i]+y[i] = z[i] for every i, there are clearly infinitely many solutions. Otherwise, the candidate bases are divisors of x[k]+y[k]-z[k]; but the only viable one is exactly x[k]+y[k]-z[k], since any other divisor would be less than or equal to x[k] or y[k], and a digit must be smaller than the base. The remaining problem is then to test that base.
To distribute it, we can split across nodes the search for the least significant position k with x[k]+y[k] != z[k]; meanwhile each node can also find the maximum value among its part of x, y, z, which must be strictly less than any valid base.
Then, to test the base, we have to actually add x and y under that base, and this addition needs to be distributed too. The key observation is that there are only two possible carries (0 or 1) when adding two numbers. Therefore, we can give each node a digit segment and have it compute the result for both an incoming carry of 0 and of 1. Sending that information to the master node, it can chain the segments together and check the actual scenario (see the sketch below).
Solved B at about 1 hour and 14 minutes.
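A minimal single-machine sketch of that carry handling, assuming digits are stored least-significant first and that each node would report an (ok, carry-out) pair for both possible incoming carries; the names SegResult and simulateSegment are mine, and the actual message exchange with the master is omitted.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Result of simulating one digit segment under a fixed incoming carry.
struct SegResult { bool ok; int carryOut; };

// Check x + y == z digit-wise on one segment (least-significant digit first)
// for a candidate base and a given incoming carry.
SegResult simulateSegment(const vector<long long>& x, const vector<long long>& y,
                          const vector<long long>& z, long long base, int carryIn) {
    long long carry = carryIn;
    for (size_t i = 0; i < x.size(); ++i) {
        if (x[i] >= base || y[i] >= base || z[i] >= base) return {false, 0};
        long long s = x[i] + y[i] + carry;
        if (s % base != z[i]) return {false, 0};
        carry = s / base;   // the carry out of a position is always 0 or 1
    }
    return {true, (int)carry};
}

int main() {
    // Toy data: x = 1999, y = 1, z = 2000, digits least-significant first,
    // split into two segments. The least significant differing digit gives
    // the candidate base 9 + 1 - 0 = 10.
    long long base = 10;
    vector<vector<long long>> xs = {{9, 9}, {9, 1}};
    vector<vector<long long>> ys = {{1, 0}, {0, 0}};
    vector<vector<long long>> zs = {{0, 0}, {0, 2}};

    // Each "node" reports its result for carry-in 0 and carry-in 1.
    vector<SegResult> whenCarry0, whenCarry1;
    for (size_t i = 0; i < xs.size(); ++i) {
        whenCarry0.push_back(simulateSegment(xs[i], ys[i], zs[i], base, 0));
        whenCarry1.push_back(simulateSegment(xs[i], ys[i], zs[i], base, 1));
    }

    // The master chains the segments from least to most significant.
    int carry = 0;
    bool ok = true;
    for (size_t i = 0; i < xs.size(); ++i) {
        SegResult r = (carry == 0) ? whenCarry0[i] : whenCarry1[i];
        if (!r.ok) { ok = false; break; }
        carry = r.carryOut;
    }
    ok = ok && carry == 0;   // assuming z has no extra leading digit
    cout << (ok ? "base works" : "base fails") << "\n";   // prints "base works"
}
```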

From the scoreboard, there were only a few submissions for C-large and D-large. Therefore, I decided to spend the remaining time on D-large, which was worth 30 points.

The problem is equivalent to counting the cells of a Young diagram with at most 10^12 rows and columns, where there are at most 10^7 (10^5 for the small) distinct row lengths. You can query a cell, and the response tells you whether it belongs to the diagram.
Therefore, for the small, starting from the first row, we can binary search the number of cells that row contains, and then binary search up to which row the length stays the same (sketched below).
Since the time complexity depends on the number of row-length changes, the goal is to distribute those changes across nodes; but we can't know their distribution in advance. Therefore, I decided to distribute the work recursively. First, I split the 10^12 columns into 100 chunks. Each node runs the same procedure as in the small, but for at most about 10^5 steps. If a node's interval is not finished, it returns the leftover interval to the master node. Each round, the master collects the partial answers and leftover intervals, then redistributes those intervals into 100 new chunks. The whole process finishes in an expected log_{100} 10^12 rounds.
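Here is a runnable sketch of the small-case walk described above, assuming a hypothetical oracle inDiagram(row, col) standing in for the judge's interactive query; the row lengths are hard-coded so the example can actually run, whereas in the real problem the bounds go up to 10^12.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Hypothetical query oracle (an assumption for this sketch): in the real
// problem this would be the interactive query telling us whether cell
// (row, col), 1-indexed, belongs to the diagram. Here it is backed by an
// explicit list of non-increasing row lengths so the sketch is runnable.
vector<long long> rowLen = {7, 7, 7, 4, 4, 2, 1, 1};
bool inDiagram(long long row, long long col) {
    if (row < 1 || row > (long long)rowLen.size() || col < 1) return false;
    return col <= rowLen[row - 1];
}

int main() {
    const long long MAX_ROWS = rowLen.size();   // up to 10^12 in the real problem
    const long long MAX_COLS = 7;               // likewise
    long long row = 1, total = 0;

    while (row <= MAX_ROWS && inDiagram(row, 1)) {
        // Binary search the length of the current row.
        long long lo = 1, hi = MAX_COLS;
        while (lo < hi) {
            long long mid = (lo + hi + 1) / 2;
            if (inDiagram(row, mid)) lo = mid; else hi = mid - 1;
        }
        long long len = lo;

        // Binary search the last row that still has this length
        // (row lengths are non-increasing, so this is monotone).
        long long rlo = row, rhi = MAX_ROWS;
        while (rlo < rhi) {
            long long mid = (rlo + rhi + 1) / 2;
            if (inDiagram(mid, len)) rlo = mid; else rhi = mid - 1;
        }

        total += (rlo - row + 1) * len;   // all rows in [row, rlo] have length len
        row = rlo + 1;                    // continue from the next shorter row
    }
    cout << total << "\n";                // prints 33 = 3*7 + 2*4 + 2 + 2*1
}
```

The distributed version would run exactly this loop on each node over its assigned column interval, with an iteration cap, and report any unfinished interval back to the master for redistribution.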

Finally, only C-large remained, with about 48 minutes left.

My first idea for C-large was to binary search for the broken positions. If the first broken position among all nodes is x, then for any i < x the hash (xor) of the first i elements is the same on every node. Once the first broken position is found, we also know which node contains it. After eliminating that node, we search for the next position with the 99 remaining nodes.
After finding the first 50 broken positions, we search for broken positions from the back in the same way.
However, this solution requires about 4000 messages in total among the nodes, which exceeds the limit.
Another idea was to do the above only among the first 10 nodes. Then each node pairs up with one of those first nodes, which must be correct for its first several elements. However, the coding complexity was very high, so I gave this one up.
The other idea that came up was to sqrt-decompose the elements and pair up the nodes. For each pair, we can find the one or two chunks whose hash values differ, and then, within those chunks, find the actual differing positions (sketched below). In the end, we are left with only 100 candidates and can do the same thing as in the small.
However, I misestimated the message size and thought it would exceed the message size limit....
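For the record, here is a single-machine sketch of that sqrt-decomposition idea, with two arrays standing in for the local views of a pair of nodes; the chunk size, the XOR hashing and the direct element comparison in the second pass are my own simplifications of what the nodes would actually exchange.

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    const long long N = 1000;
    // Node P's view and node Q's view of the same data; Q has two broken cells.
    vector<long long> viewP(N), viewQ(N);
    for (long long i = 0; i < N; ++i) viewP[i] = viewQ[i] = i * 2654435761LL;
    viewQ[137] ^= 0xdeadbeefLL;      // corrupted positions, unknown to the nodes
    viewQ[901] ^= 0x12345678LL;

    // Square-root decomposition: each node hashes its chunks with XOR, so the
    // pair exchanges only one hash per chunk instead of the whole array.
    const long long CHUNK = max(1LL, (long long)llround(sqrt((double)N)));
    const long long numChunks = (N + CHUNK - 1) / CHUNK;
    vector<long long> hashP(numChunks, 0), hashQ(numChunks, 0);
    for (long long i = 0; i < N; ++i) {
        hashP[i / CHUNK] ^= viewP[i];
        hashQ[i / CHUNK] ^= viewQ[i];
    }

    // Chunks whose hashes disagree contain the broken positions; a second,
    // much smaller exchange over just those chunks pins down the exact cells
    // (simulated here by comparing the elements directly).
    vector<long long> suspects;
    for (long long c = 0; c < numChunks; ++c) {
        if (hashP[c] == hashQ[c]) continue;
        for (long long i = c * CHUNK; i < min(N, (c + 1) * CHUNK); ++i)
            if (viewP[i] != viewQ[i]) suspects.push_back(i);
    }
    for (long long i : suspects) cout << i << " ";   // prints 137 901
    cout << "\n";
}
```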

In the end, I missed only C-large, finishing with 75 points and rank 16 in the final standings.
Advanced to the Distributed Google Code Jam Final!
Hope I will gain something there!
