AboutBlogResearchProjectsTeaching
Back to Blog

Information and Coding Theory (ICT Part 1)

15 min read

Introduction

What is Information Theory:

Information theory answers two fundamental questions in communication theory: What is the ultimate data compression (answer: the entropy HH), and what is the ultimate transmission rate of communication (answer: the channel capacity CC).

Data transmission at the physical layer:

Data transmission flowchart

Data Compression:

Data compression is done by the source encoder. Well-defined data sets such as the english alphabet, images, and audio need to be converted into bit strings before being modulated into signal form by the transmitter. Data compression achieves this through lossless (no information is lost) or lossy (information is lost) compression.

Transmitter:

Converts the binary data from the physical layer to a signal.

Erroneous Channel:

The transmission channel is erroneous and can result in bit flipping in the received data.

Channel Encoder:

Modifies the binary data such that the channel decoder can recover error less data; this is done by appending a redundancy code to the data. Error detection can be done using cyclic redundancy codes (CRC) and error correction can be done using stop-and-window.

Adding redundancy codes will occupy the channel transmission rate that could have been utilised by the encoded source data. We want an optimal rate (which depends on the channel) for transmitting the data with error correcting codes.

The entropy of the source gives a lower bound on how much data compression we can apply to make the channel communication optimal.

Hence, compression has two metrics:

  1. Compression ratio
  2. Resulting transmission speed

Source Encoding

Data Compression

The information at the source can be in any form (even non-binary), but the source encoder should convert it to binary form as binary is well suited to be converted to signal form.

Source encoding flowchart

Ω:\Omega:

Ω\Omega is the alphabet of the source. We assume it to be a finite set (infinite sets are dealt with in a different way). For example Ω={a1,a2,,an}\Omega = \{a_1,a_2, \dots , a_n\}

Source encoding function:

C:Ω{0,1}={www is a finite length binary string}C: \Omega \rightarrow \{0,1\}^* = \{w \mid w \text{{w is a finite length binary string}}\}

CC maps a single alphabet (aia_i) to a binary string

C:Ω{0,1}C^* : \Omega^* \rightarrow \{0,1\}^* CC^* maps a finite length string with aiΩa_i \in \Omega.

The (1) requirement for CC is that CC should be injective. If CC is injective then CC is non-singular (same for CC^*).

The (2) requirement for CC is that C(Ω)C(\Omega) is a set of codes {0,1}\subseteq \{0,1\}^*.

Assuming a defined Ω\Omega we will interchangebly call CC as a code sometimes.

For example, C(ai1,ai2,,ain)=C(ai1C(ain))C^*(a_{i_1},a_{i_2}, \dots , a_{i_n}) = C(a_{i_1} \dots C(a_{i_n})) (concatenated)

Example 1:

Ω={1,2,3,4}\Omega = \{1,2,3,4\}

C:Ω{0,1}C: \Omega \rightarrow \{0,1\}^* defined as:

101 \rightarrow 0

20102 \rightarrow 010

3013 \rightarrow 01

4104 \rightarrow 10

Problem: this cannot be decoded correctly, as C(3).C(1)=010C(3).C(1) = 010, C(2)=010C(2) = 010, and C(1).C(4)=010C(1).C(4) = 010.

Hence, we get to the requirement (3) CC* should be injective.

(4) An encoder CC is called uniquely decodable if CC^* is non-singular.

Now we can fix the problem we faced earlier, but will the decoding be efficient?

Example 2:

Here, we consider the case that CC is uniquely decodable.

Ω={1,2,3,4}\Omega = \{1,2,3,4\}

C:Ω{0,1}C : \Omega \rightarrow \{0,1\}^*

1101 \rightarrow 10

2002 \rightarrow 00

3113 \rightarrow 11

41104 \rightarrow 110

If we get a 1111, we check if we have odd number of following 00s or even. Even number of 00s would result in the decoding 3232\dots and odd number of 00s would result in the decoding 4242\dots.

Problem: In the worst case, we will have to read the entire string to determine the parity of 00 bits.

-> One possible fix could be to demarcate the string 010010 as 0,100,10 and get a uniquely decodable CC without requiring CC^* to be non-singular; however, this will occupy a lot of extra bits post encoding which can grow in the size of the length of the string.

Following on this problem, we arrive at the requirment (5) for CC.

(5) CC should be prefix-free. That is, no codeword should be extendable to another codeword

CC is prefix-free if for for no yC(Ω),yC(Ω)y \in C(\Omega), \exists y' \in C(\Omega) such that yyC(Ω)yy' \in C(\Omega).

Prefix-free \subseteq Uniquely decodable \subseteq Non-Singular \subset Set of all codes;