Interview questions for program manager, shared by candidates
If you have n machines with a 10 GB string of characters on each, how do you find the most common character?
I thought I had this one. Just ask each machine for the distribution of its characters, then send the tables to a master machine, which adds them up and finds the character with the highest frequency. Then he asked questions like “If we make the network faster, does it make sense to send all the data over to one machine?” I responded “no” and explained that it would actually degrade performance. Finally, toward the end, he asked
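The per-machine table idea in the answer above can be sketched roughly as follows. This is only a minimal single-process illustration, not a real distributed job: the names `local_counts` and `merge_on_master` and the tiny sample strings are made up for the example, and an actual 10 GB string would presumably be read and counted in chunks rather than held in memory.

```python
from collections import Counter

def local_counts(chunk: str) -> Counter:
    # Runs on each machine: one pass over its local string.
    return Counter(chunk)

def merge_on_master(tables):
    # Runs on the master: add the small per-machine tables together.
    total = Counter()
    for table in tables:
        total.update(table)
    return total

# Tiny stand-ins for the 10 GB strings (hypothetical data).
machines = ["aabbbc", "abccc", "bbaac"]
tables = [local_counts(s) for s in machines]
total = merge_on_master(tables)
char, count = total.most_common(1)[0]
print(char, count)
```

Only the tables cross the network, so the traffic depends on the alphabet size rather than on the 10 GB of raw data per machine.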
Distributed reduction. Each machine can send its most common character, along with that character's count in its subset of the data, and a tree-based reduction then finds the most common character overall. The corner case in this approach is when the most common character is spread so thinly across the machines that it is not the most common character locally on each machine. To address that, we can have each machine prepare a table of characters and their counts and report it to a head node, which merges the tables from all machines and sorts the result to find the most common character.
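A small made-up example of the corner case described above, plus a sketch of a tree-style reduction over full tables. The function names and sample strings are hypothetical, everything runs in one process purely for illustration, and `tree_reduce` assumes at least one table is passed in.

```python
from collections import Counter

def local_top(chunk):
    # (character, count) of the most common character in one chunk.
    return Counter(chunk).most_common(1)[0]

def top_only_reduction(chunks):
    # Flawed variant: each machine reports only its local winner.
    merged = Counter()
    for ch, n in map(local_top, chunks):
        merged[ch] += n
    return merged.most_common(1)[0][0]

def tree_reduce(tables):
    # Pairwise merge of full tables, roughly halving their number each round.
    tables = list(tables)
    while len(tables) > 1:
        merged_round = []
        for i in range(0, len(tables), 2):
            pair = tables[i:i + 2]
            merged_round.append(sum(pair, Counter()))
        tables = merged_round
    return tables[0]

# 'b' is never the local winner, yet it is the global winner (6 vs 3).
chunks = ["aaabb", "cccbb", "dddbb"]
wrong = top_only_reduction(chunks)
right = tree_reduce(Counter(c) for c in chunks).most_common(1)[0]
print(wrong, right)
```

If the characters are single bytes, each full table has at most 256 entries, so passing whole tables up the tree stays cheap compared with the data on each machine.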
Can you give an example for the above solution? Wouldn't that require you to pass the whole frequency table to the master node?
What do you do when you have a number of mission-critical projects with limited resources and no one willing to compromise? 3-week sprints, 20 projects, 4 resources and 1 team lead.