Design an encoding function to condense repeating characters in a string.

Question

Anonym · Accepted Answer

is the returned string expected to be more like-
(char1)(count of char1)(char2)(count of char2).. and so on ?

Anonym · Answer

I'd offer this as initial solution:

"AAAAAABCCDDDXXXXFF" (18 chars)

=>

"A"\1\4"BCCDDDX"\1\2"FF" (14 chars)

- runs up to 3 are left as they are
- there is needed an escape character, to distinguish if the next is a lengh or a regular letter (which is \1 in this case - that means char value of 0x01)
- after the escape character there is the length of run - 2 (3 characters will be written as thy are, for 4 there will be 2 - there must not be stored count of 1 because we must count that there might be an escape character itself, which would be encoded by \1\1)
- therefore the escape character should be some very unlikely value, like 0x01 (if we'd chose e.g. ' *', it would be far more probable that somebody might feed us "A*B*C*D*" which would become "A**B**C**D**" after encoding, i.e. much longer)
- this can encode runs up to 256 successive characters. If the run is longer, it is split after 256 chars, another escape with length might be inserted (or left unchanged if the remainder is <= 2 characters)

- indeed, somebody might still feed us 'A'\1'B'\1'C'\1'D' - it is less likely, but still possible as intentional tampering. One improvement might be, that the first character of the encoded string might denote the actual escape character (the less frequent character in the original string)

There is indeed a bunch of additional possible improvements - ability to encode groups of characters, etc., you could principally re-implement deflate, lzma or whatever compression algorithm you prefer, but you'd probably not be able to code it on a whiteboard in 10 minutes or whatever time you have for this interview question :)

Google

Frage im Vorstellungsgespräch bei Google

Antworten zu Vorstellungsgespräch

Beobachtete Unternehmen

Jobsuchen