diff --git a/_posts/2020-12-09-osi.markdown b/_posts/2020-12-09-osi.markdown
index 3d09377..099ef68 100644
--- a/_posts/2020-12-09-osi.markdown
+++ b/_posts/2020-12-09-osi.markdown
@@ -4,8 +4,8 @@ title: The OSI Model
categories: ["networks"]
---
-The OSI model lays down specifications on how to think about achieving inter-process communication across machines. This article goes over breaking down the
-what and why of this model and introduce the readers to some of the jargon surrounding this topic.
+The OSI model lays down specifications on how to think about achieving inter-process communication across machines. This article breaks down the
+what and why of this model and introduces readers to some of the jargon surrounding the topic.

@@ -13,7 +13,7 @@ what and why of this model and introduce the readers to some of the jargon surro
# The Mail Analogy
-It is useful to generalize by drawing comparisons with the [postal system](https://en.wikipedia.org/wiki/Mail) when it comes to understanding inter-process communication across machines.
+It is useful to generalize by drawing comparisons with the [postal system](https://en.wikipedia.org/wiki/Mail) when it comes to understanding inter-process communication across machines.
Human|Computer|Notes
-|-|-
@@ -25,16 +25,16 @@ The postal network efficiently transfers the post to the intended recipient|The
The other person receives the letter and opens it| The OS unpacks the network packet and provides the content to the intended process| **Transfer the message content to the recipient**
The receiver reads the letter|The intended process reads the message stream|**Consume the message content**
-> An **IP Address** is the unique identifier for a machine. There are 2 versions of it - IPv4 and IPv6. An IPv4 IP address looks like this - `10.0.0.1`.
+> An **IP Address** is the unique identifier for a machine. There are 2 versions of it - IPv4 and IPv6. An IPv4 IP address looks like this - `10.0.0.1`.
>
> A **Port Number** is a number between 0 and 65535 that is used to identify a process within a machine. The port number
-is a purely logical construct.
+is a purely logical construct.
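
To make the two identifiers concrete, here is a minimal sketch using Python's standard `ipaddress` module. The function name `describe_endpoint` is made up for this illustration; the only facts it relies on are that an IP address is either version 4 or 6 and that a port number fits in 16 bits (0 to 65535).

```python
import ipaddress


def describe_endpoint(address: str, port: int) -> str:
    """Describe a (machine, process) pair. Hypothetical helper for illustration."""
    # ip_address() raises ValueError if the text is not a valid IPv4/IPv6 address.
    ip = ipaddress.ip_address(address)
    # A port is a 16-bit number, so anything outside 0..65535 is invalid.
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"IPv{ip.version} machine {ip}, process on port {port}"


print(describe_endpoint("10.0.0.1", 80))
```

The IP address picks the machine and the port picks the process on that machine, which mirrors the house address and the recipient's name on a letter.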
-## The Postal Network
+## The Postal Network
The postal network is an indispensable system when it comes to delivering mail. It lays down rules on how mail must be submitted so that it is suitable for transfer, which shapes many of the steps before and after the postal network step in the above analogy. Clearly this is the most important step of them all.
-If we double click, we broadly get the following roles -
+If we zoom in, we broadly get the following roles -
1. **Post collector** - collects posts from the postbox and transfers to the post office, ensures stamps and addresses are present.
2. **Post office** - collects all posts from and to a locality. Maintains routing registries based on PIN or ZIP codes and determines the target post office to send the mail to, for each mail.
@@ -44,15 +44,15 @@ If we double click, we broadly get the following roles -
Each role has
* well-defined set of responsibilities.
-* well-defined, limited, interactions with the rest of the system.
+* well-defined, limited, interactions with the rest of the system.
A "post dispatcher" understands the protocol related to delivering a post to a house. The "post office" might operate just on the post bundle it gets, creating few more bundles grouped by destination office. The "means of transport" only understands transferring a bundle from one place to another.
-We could now start with laying down a convention for all postal systems to contain these roles. With some more elaborate treatment of the subject, we can come up with a **model** of a postal network.
+We could now start with laying down a convention for all postal systems to contain these roles. With some more elaborate treatment of the subject, we can come up with a **model** of a postal network.
It is now easy to identify people, or machines, to train and fit into these roles. Not only can they work in a given postal company in that role, they can work in any postal company for that matter. The users also have a fairly good idea of how a specific postal system operates, if it uses the above model as guidance.
-In the computer world, a similar abstraction has been laid down for the postal network in the form of the **OSI Model**.
+In the computer world, a similar abstraction has been laid down for the network, the counterpart of the postal network, in the form of the **OSI Model**.
# Open Systems Interconnection Model
@@ -62,23 +62,23 @@ Out of this, the first 4 layers are responsible for end to end transport of a me
Below is a simplified description of these 4 layers. An actor in each layer could be software, hardware or a hybrid of both.
-## Layer 4 - Transport Layer
+## Layer 4 - Transport Layer
The application hands the data to be sent to this layer. The Transport Layer batches the data in a way that is easy to transmit further.
This layer cannot understand the application data; it only knows where the first bit starts and the last bit ends in the message. The agents at this layer are typically OS-level software that stamps this data with some additional metadata, such as the source and destination IP addresses, port numbers, and the size of the message.
-When a process sends application data to this layer, it can expect that the data will be received by the target process.
+When a process sends application data to this layer, it can expect that the data will be received by the target process.
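
From a program's point of view, the doorway to this layer is usually the socket API. The sketch below is only an illustration under some assumptions: it uses TCP, keeps both processes on the same machine (the loopback address `127.0.0.1`), and the helper names `echo_once` and `send_over_loopback` are invented here. The application only supplies bytes, an IP address and a port; everything underneath is handled by the OS.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection and send back whatever bytes arrive."""
    conn, _peer = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def send_over_loopback(message: bytes) -> bytes:
    """Send a small message to a local echo server and return its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 asks the OS to pick any free port for us.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        host, port = server.getsockname()

        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()

        # The "sender process": it only knows the target IP address and port.
        with socket.create_connection((host, port)) as client:
            client.sendall(message)
            reply = client.recv(1024)

        worker.join()
    return reply


print(send_over_loopback(b"hello"))
```

Note that neither side inspects what the bytes mean. That job belongs to the layers above.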
## Layer 3 - Network Layer
Layer 4 hands off the application data packet (or network packet) to Layer 3. This layer is a combination of hardware and software working in tandem to route the network
-packet from source device to the target device over the internet. It can complete with reliable multiple hops over many other devices. This is an encapsulation of the postal system from
+packet from the source device to the target device over the internet. The route can involve multiple hops across many intermediate devices. This is an encapsulation of the postal system from
the letter analogy.
## Layer 2 - Data Link Layer
-Responsible for transmitting a 'frame' of data reliably from one node to another without errors. A 'frame' contains some more metadata that is required to perform the error detection and correction when the data is received at a node. This is to make sure that the correct data is being passed on. These are often intelligently deviced hardwares that understand if a piece of data has been correctly transferred or not. Network packet transfer at Layer 3 hands this off to Layer 2 that transmits them as 'frames' and hands the data back (removing any 'frame' metadata) to the Layer 3 in the second node.
+This layer is responsible for transmitting a 'frame' of data reliably from one node to another in a network without errors. A 'frame' carries some more metadata that is required to perform error detection and correction when the data is received at a node. This makes sure that the correct data is being passed on. The actors here are often intelligently devised hardware that can tell whether a piece of data has been transferred correctly or not. Layer 3 hands a network packet off to Layer 2, which transmits it as 'frames' and hands the data back (removing any 'frame' metadata) to Layer 3 on the second node.
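
The idea of frame metadata used for error detection can be sketched in a few lines. This is a simplified illustration, not a real frame format: the layout (payload followed by a 4-byte checksum) and the function names are assumptions made for this example, and it only detects errors rather than correcting them. It uses CRC-32 from Python's `zlib` module as the checksum.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Wrap a payload in a toy frame: payload + 4-byte CRC-32 trailer."""
    checksum = zlib.crc32(payload)
    return payload + checksum.to_bytes(4, "big")


def read_frame(frame: bytes) -> bytes:
    """Verify the trailer and hand back the payload without frame metadata."""
    payload, trailer = frame[:-4], frame[-4:]
    if zlib.crc32(payload) != int.from_bytes(trailer, "big"):
        raise ValueError("frame corrupted in transit")
    return payload


frame = make_frame(b"network packet")
print(read_frame(frame))
```

If even a single bit of the payload flips on the way, the recomputed checksum no longer matches the trailer and the receiving node knows not to pass the data on.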
## Layer 1 - Physical layer
@@ -88,16 +88,16 @@ In order for the Layer 2 device to send and/or receive a frame, it needs a mediu
We can call the remaining 3 layers the Application Layer.
-When you write a letter to some one, you assume that the other person is able to understand the language, as well as comprehend your writing. Likewise, this layer is an abstraction over the protocols used by the processes themselves so that they know how to read a message.
+When you write a letter to someone, you assume that the other person is able to understand the language, as well as comprehend your writing. Likewise, this layer is an abstraction over the protocols used by the processes themselves so that they know how to read a message.
The Application layer constructs whatever message it needs to send, and hands it to the Transport Layer in that machine. It also receives messages from the Transport Layer. This is generally the code that developers write as part of their application.
-> These 3 layers have many overlaps between them that it would confuse the reader when they are new to this subject. As this series of articles as an introduction to the practical usages of the model, we can combine these 3 layers into 1 and call it the Application layer.
+> These 3 layers overlap so much that the distinction would confuse a reader who is new to this subject. As this series of articles is an introduction to the practical uses of the model, we can combine these 3 layers into 1 and call it the Application layer.
Pedantically that is the Layer 7 of the model, but practically it makes a lot of sense to just call everything above Layer 4 the application layer. There is a lot of literature on the differences between these layers for the interested.
# In Summary
-This is how a transfer of information from source application to destination application would look like, if we are to talk in terms of the different abstraction layers of the OSI. Note that the data travels across multiple devices in the network (such as routers). While the network packet has not reached the destination machine, that packet is not forwarded to the networking layer in that machine.
+This is what a transfer of information from the source application to the destination application looks like in terms of the different abstraction layers of the OSI model. Note that the data travels across multiple devices in the network (such as routers). Until the network packet reaches the destination machine, it is not forwarded to the transport layer of any intermediate machine.

@@ -105,6 +105,6 @@ It is common practise to abbreviate the layers as **L4**, **L7** and so on.
## What's next
-Developers program at the application layer and have constructs in the programming language that would talk to the Transport Layer of the machine. The implementation of the OSI model in practise is the **[Internet Protocol Suite (TCP/IP)](https://en.wikipedia.org/wiki/Internet_protocol_suite)**. It is important to have a practical understanding of TCP/IP which would prove useful in a variety of situations.
+Developers program at the application layer and have constructs in the programming language that talk to the Transport Layer of the machine. The implementation of the OSI model in practice is the **[Internet Protocol Suite (TCP/IP)](https://en.wikipedia.org/wiki/Internet_protocol_suite)**. It is important to have a practical understanding of TCP/IP, which proves useful in a variety of situations.
More on this in the next article!
diff --git a/_posts/2020-12-17-flatlands.markdown b/_posts/2020-12-17-flatlands.markdown
index 7b4dab3..2e293a0 100644
--- a/_posts/2020-12-17-flatlands.markdown
+++ b/_posts/2020-12-17-flatlands.markdown
@@ -12,15 +12,15 @@ My thoughts on the 1884 novella **Flatlands - a romance in many dimensions by Ed
The book follows the adventures of a Square that lives in a 2 dimensional world.
-The first question that came to my mind when I read this premise was, given that all beings in such a 2 dimensional universe can only see a straight line, how then can they distinguish different shapes? This problem is tacked very early on in the book, with one of the methods being using sight to understand different gradients in the straight line that they see. For example, a triangle when looked at directly is a straight line with the brightest point in the center, and both the sides with a reducing intensity gradient.
+The first question that came to my mind when I read this premise was, given that all beings in such a 2 dimensional universe can only see a straight line, how then can they distinguish different shapes? This problem is tackled very early on in the book, with one of the methods being using sight to understand different gradients in the straight line that they see. For example, a triangle when looked at directly is a straight line with the brightest point in the center, and both the sides with a reducing intensity gradient.

-As soon as I read this, it struck me how we do the same thing in perceiving depth when we see a 2 dimensional picture.
+As soon as I read this, it struck me how we do the same thing in perceiving depth when we look at a 2 dimensional picture of a 3 dimensional scene.
Most of the mathematical concepts in this book are taught even to kids at a very early stage, but the perspective given to that knowledge by this book, especially without pictures - the interaction of the 3rd dimension with the 2 dimensional world, was a very good mental exercise for me.
-My favourite part of the book comes towards the end where the author draws similarilities and brings to our attention the patterns while going from one dimension to another. This book teaches us how to think about the 4th dimension, how a higher dimension's being interactions with a lower dimension manifest itself to the eyes of the lower dimension. There is a common trope in sci-fi films where an object vanishes out of existence when dealing with the 4th dimension. A little bit of reflection on the contents of the last few chapters of this book akes it clear why that trope is necessary!
+My favourite part of the book comes towards the end where the author draws similarities and brings to our attention the patterns while going from one dimension to another. This book teaches us how to think about the 4th dimension, and how a higher dimensional being's interactions with a lower dimension manifest themselves to the eyes of the lower dimension. There is a common trope in sci-fi films where an object vanishes out of existence when dealing with the 4th dimension. A little bit of reflection on the contents of the last few chapters of this book makes it clear why that trope is necessary!
There is a [video](https://www.youtube.com/watch?v=CePeCicTqCM) of [Neil deGrasse Tyson](https://en.wikipedia.org/wiki/Neil_deGrasse_Tyson) (the video is not related to this book) explaining the concepts in passages of the book. I do not want to give too much away from the book and would suggest going through the book first before the video so that you do not spoil the satisfaction of learning this through the words of a professor from 1884!!
diff --git a/_posts/2021-07-25-dns.markdown b/_posts/2021-07-25-dns.markdown
index aef0a5d..46acd64 100644
--- a/_posts/2021-07-25-dns.markdown
+++ b/_posts/2021-07-25-dns.markdown
@@ -1,88 +1,85 @@
---
layout: post
-title: Connections
+title: DNS Basics
categories: ["networks"]
---
-It is impractical to remember IP addresses of any target service and address them. While it is expensive to get hold of a static IP address, it would be a nightmare to propagate that change to all the consumers of your application when the IP address changes.
-
-DNS solves this problem by providing a proxy name to the IP address. You can now distribute this to the clients who can address your device using this address.
+Today, computers run useful processes that other people use from their own devices. Examples of such processes are the Google search engine, the Amazon online store, your favourite flight operator's booking service and so on. How does your computer know where these processes are running?
---------------
+## Where is my coffee ?
+You are in _Coffee Land_ on a business visit and feel like drinking coffee after a long day at work. You want to go to a restaurant. After spending some time consulting your colleagues, you decide to walk into _Quality Coffee_.
-## Network
+You are told _Quality Coffee_ is at _No 4, Coffee Bean Avenue, Roasted County, Coffee Land_. You go there, order coffee, drink it and leave satisfied (possibly after paying in case you don't want to be arrested).
-Interconnected group of computers that are able to send and receive information from one other defines a network. There are different [network topologies](https://en.wikipedia.org/wiki/Network_topology) that define how the computers should be connected to one other.
+With some reflection, you can see that the address is more useful to you when you need to navigate to the correct shop and order coffee. However, the product that you sought was _Quality Coffee_. When you later talk about this to your other colleagues back home, you would say you visited _Quality Coffee_ and not _No 4,..._. The shop can move to a different place, but the service that you experienced is tied to the brand and it would remain the same.
-The Internet is a network.
+Here we have two pieces of information - **Name** and **Address**. You seek the services of **Name**. You need to know its current **Address** to actually get the services you need.
-## IP Address
+The **Name** is generally the brand associated with the service. In the computer world, the **Address**, as you might have already guessed, is the **IP Address**.
-An IP address is an identifier for a machine in a network. It was initially 32 bits in size, but for more than 2 decades now, a 64 bit version is also in use. The limitation of the 32 bit address is that [there are more than **232** devices](https://en.wikipedia.org/wiki/IPv4_address_exhaustion) connected to the internet. There are some mitigation strategies for this limitation that change how these 32 bit addresses are handled and exposed.
+> Recall that an **IP Address** is the unique identifier for a machine. There are 2 versions of it - IPv4 and IPv6. An IPv4 IP address looks like this - `10.0.0.1`.
-Here's how the 2 versions of IP addresses are expressed for this site.
+The software engineers behind the YouTube service will have installed their software, which makes video content searchable and playable, on a computer and advertised its IP address to everyone. Users can open their browser, type in the IP address of YouTube and be presented with a web page - an interface for searching and playing videos.
-- 32-bit also called IpV4 : `139.180.190.208`
-- 64 bit also called IpV6 : `2001:19f0:4400:78f6:5400:3ff:fe17:349c`
+## I hate memorizing numbers!
-## Port number
+Everyone does!
-This is a logical construct identifying a process in a computer.
+As of writing this article, one of the IP addresses of the YouTube service is `74.125.68.93`.
-An application, when it needs to send/receive packets from another computer, it needs to create a binding to a port. When an application sends a packet intended for a specific application in another computer, the operation system of the target computer looks at the port number in the incoming packet, and forwards the packet to the application process that is listening for packets on this port.
+It is impractical to remember the IP address of every target service. In practice, IP addresses also change far more often than real world addresses do. It would be a nightmare for both the YouTube engineers and the end users to keep track of this.
-It is an integer from 1 to 65536. The operating system might prevent binding to port numbers less than 1000 as these are well-known ports. These run well-known processes and therefore optimises routing at the OS layer. Examples are - port 80 for web server applications an 22 for SSH.
+Enter the **Domain Name System** or **DNS** for short. DNS solves the problem of remembering arbitrary numbers as the addresses of online services by giving each service a more memorable **Domain Name** such as `www.youtube.com`. The engineers at YouTube can simply say - if you want to search and watch videos, head over to **YouTube** (the name or brand). And hey, you can find YouTube at `www.youtube.com` (the address).
-## TCP and UDP
+For most people `www.youtube.com` is more memorable than `74.125.68.93`. This also solves the problem of changing IP addresses. When the IP address changes to, let's say, `74.125.68.100`, we can simply update the domain name system so that the domain name `www.youtube.com` points to the new address. This change is silent and has no impact on what the end user types into their browser.
-These are 2 widely used connection protocols. Under TCP, the application get a guarantee that the target computer indeed received the packet (an acknowledgement). For this reason TCP is called connection oriented protocol. UDP doesn't provide an acknowledgement and is therefore called a connection less protocol.
+## How does address translation work ?
-The crux of these protocols is that, they add header and footer metadata on the message to construct a packet. When such a packet is delivered to the network, the network takes care of forwarding the packet to the correct target using these headers and footers.
+When you open your browser and type in `www.youtube.com`, the browser first negotiates with the domain name system to find out the IP address. This process is called **DNS address resolution**. Once the address is resolved, then the browser establishes a connection and is able to send and receive meaningful data (in this case, watch videos).
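
Most programming languages expose this resolution step directly. Here is a minimal sketch using Python's standard `socket.getaddrinfo`, which asks the operating system's resolver to translate a name into addresses. The wrapper name `resolve` is made up for this example, and the addresses you get back for a real domain depend on where and when you run it.

```python
import socket


def resolve(name: str) -> list[str]:
    """Return the distinct IP addresses the OS resolver reports for a name."""
    results = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        # sockaddr[0] is the IP address for both IPv4 and IPv6 results.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


# A literal IP address needs no lookup and comes back unchanged.
print(resolve("127.0.0.1"))
```

Calling `resolve("www.youtube.com")` on a machine with internet access performs a real DNS lookup and returns whichever addresses the resolver hands out at that moment.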
-## Connection
+It would seem that maintaining a single directory mapping each domain name to its IP address is a straightforward solution. But this poses a few challenges.
-A connection is defined as the 5-tuple - (sourceIp, sourcePort, destinationIp, destinationPort, protocol).
+- **Scalability** : There are a lot of websites and querying a single directory can be computationally expensive. If a lot of users are querying to resolve the address of `www.youtube.com`, this can negatively impact other users that are trying to resolve less frequently queried domain names such as `www.mysite.com`. The owner of `www.youtube.com` should somehow be held accountable for the traffic coming in to the domain name system querying their domain. Sites can also have multiple sub-domains such as `www.blog.mysite.com`, so there isn't a strict one to one mapping between a domain name and the owner of a web site.
+- **Security** : Coordinating updates from different domain name owners securely is challenging. For example, I should not be able to update the address of `www.yoursite.com` while updating `www.mysite.com`.
+- **Conditional resolution** : Often, a service is hosted in multiple locations (think multiple branches of _Quality Coffee_ set up in different cities so that the brand is more accessible) for a better user experience. The YouTube IP address shown above is probably that of a computer physically close to my house, for better latency. So the central directory would have to keep track of the different locations and determine where each lookup request is coming from.
+- **Cost** : Solving any of the above problems would require more money to set up this domain name system directory. More disk space, computing power (to serve domain name queries), power supply, cooling system etc.
-- **sourceIp** - IP address of the source in the network.
-- **sourcePort** - Port number used by the source process.
-- Similarly for destination Ip and Ports.
-- **protocol** - TCP or UDP.
+## Delegated resolution
-We will focus on just TCP protocol here.
+Instead of storing all the domain names in a single directory (called a DNS server), there are separate directories for each sub-domain level. As an example, imagine a separate DNS server for `mysite.com` that serves all DNS requests for names that end in `mysite.com`.
+- `www.mysite.com`
+- `blog.mysite.com`
+- `about.mysite.com`
+- `potato.chips.mysite.com`
-When we say that a TCP connection is established, both the source and the destination computers know about this 5-tuple. Any new packet with this information will be quickly forwarded to the correct process at the receiver. And any response on that packet, will use the same 5-tuple and be forwarded to the network.
+This distributes the DNS traffic and it is now up to the owners of the domain name to securely manage and scale their DNS servers.
-## TCP Connection handshake
+> But, there is still a problem. How do I figure out what is the DNS server for `mysite.com` ?
-A message in the below paragraphs is a connection + some message bytes.
+There is a set of reserved suffixes with which a domain name can end, for example `com`, `in`, `us`, `org`, `net`. These are called **Top Level Domains (TLD)**.
-To establish a TCP connection, the source computer sends a message called **SYN** - signalling start of connection establishment. The destination computer responds with a **SYN-ACK** - indicating that it is ready to accept a connection. The source receiving the **SYN-ACK** validates the target ip address. The source computer now sends an **ACK** - signalling that it will send messages using that 5-tuple from this point onwards. The destination receiving this **ACK** means that the destination's previous message reached the source, validating the ip address of the source for the destination.
+Each of these has its own DNS server operated by an organisation or a country. The owner of `mysite` would set up a DNS server that resolves the IP addresses of all their sub-domains ending in `mysite.com`. Then they would get hold of the operator of the `com` domain and ask them to add a record for `mysite` pointing to the DNS server that they just set up.
-1. Source : Send SYN, (IPsource, Portsource, IPdest, Portdest, TCP)
-1. Destination : Receive SYN, (IPsource, Portsource, IPdest, Portdest, TCP)
-1. Destination : Send SYN-ACK (IPdest, Portdest, IPsource, Portsource, TCP)
-1. Source : Receive SYN_ACK (IPdest, Portdest, IPsource, Portsource, TCP)
-1. Source : Send ACK (IPsource, Portsource, IPdest, Portdest, TCP)
-1. Source : Mark connection as established.
-1. Destination : Receive ACK (IPsource, Portsource, IPdest, Portdest, TCP)
-1. Destination : Mark connection as established.
+> Wait, how do I figure out what is the DNS server for `com` ?
-This handshake is also called the [3-way handshake](https://en.wikipedia.org/wiki/Handshaking). This establishes the source and target ip address validity.
+All the top level domains are listed in another set of DNS servers called the **Root DNS Servers**. The root zone they serve is managed by the non-profit [Internet Assigned Numbers Authority](https://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority), and the servers themselves are run by a small group of independent operators. The list of root DNS server addresses typically ships with the DNS software that does the lookups, so nothing needs to be resolved to find them.
-## Security
+Now a typical name resolution works like this
-With TCP, the destination can setup ip firewall rules to only accept packets from specific sources, as well as only send packets to specific sources.
+- You enter www.mysite.com on the browser
+- Your browser consults the Root DNS server to find where the `com` DNS server is.
+- Your browser consults the `com` DNS server to find out where `mysite` is.
+- Your browser consults the `mysite` DNS server to find out where `www.mysite.com` is.
+- Your browser gets the IP address and makes a connection to the website.
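
The walk above can be sketched as a toy simulation. Everything in it is hypothetical: the "servers" are plain dictionaries, the server names are invented, and the IP addresses come from a range reserved for documentation. Real DNS involves network queries, caching and usually a separate resolver doing this work on the browser's behalf, none of which is modelled here.

```python
# Hypothetical directories: the root knows the TLD servers,
# each TLD server knows the servers for the domains under it,
# and each domain's server knows the final IP addresses.
ROOT_SERVER = {"com": "com-server"}
SERVERS = {
    "com-server": {"mysite.com": "mysite-server"},
    "mysite-server": {
        "www.mysite.com": "203.0.113.10",
        "blog.mysite.com": "203.0.113.11",
    },
}


def resolve(name: str) -> tuple[str, list[str]]:
    """Resolve a three-label name right to left, recording each step."""
    labels = name.split(".")
    steps = []

    # Step 1: ask the root where the TLD's server is.
    tld_server = ROOT_SERVER[labels[-1]]
    steps.append(f"root says '{labels[-1]}' is served by {tld_server}")

    # Step 2: ask the TLD server where the domain's server is.
    domain = ".".join(labels[-2:])
    domain_server = SERVERS[tld_server][domain]
    steps.append(f"{tld_server} says '{domain}' is served by {domain_server}")

    # Step 3: ask the domain's server for the actual address.
    address = SERVERS[domain_server][name]
    steps.append(f"{domain_server} says '{name}' is at {address}")
    return address, steps


address, steps = resolve("www.mysite.com")
for step in steps:
    print(step)
```

Each directory only needs to know about the level directly beneath it, which is what lets the traffic and the responsibility be spread across many owners.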
-But this is not enough.
+Notice how the domain name is parsed from right to left - from the more generic to the more specific. If you think about it, this is similar to how we parse location addresses. When you needed to know where _No 4, Coffee Bean Avenue, Roasted County, Coffee Land_ is, you would parse it from right to left as
-If the network is compromised, malicious agents can inspect the data segment in the packet. Therefore, in practise, apart from the 3-way handshake, there is also the [TLS handshake](https://en.wikipedia.org/wiki/Handshaking) that establishes a connection to also send encrypted messages that can only be read by the source and destination and not by anyone intercepting the message.
+- Go to _Coffee Land_.
+- Find _Roasted County_ inside _Coffee Land_.
+- Find _Coffee Bean Avenue_ inside _Roasted County_.
+- Find _No 4_ inside _Coffee Bean Avenue_.
-## Cost of a connection
-
-Because of multiple handshakes involved, developers should keep in mind that establishing a new connection is an expensive process. A TLS connection takes around 1 to 2 seconds to setup. When the source and targets are fixed, reusing a connection is of utmost importance.
-
-## What's next
-
-While IP Addresses identify the destination computer, it is not practical to remember IP addresses of different machines. IP Addresses have proxies to them called **Domain Names**. In the next post, I will try to talk about how a Domain Name maps to a specific IP address.
+Fin!
\ No newline at end of file