bbb/_posts/2021-07-25-dns.markdown
2021-11-17 19:29:53 +00:00

---
layout: post
title: DNS Basics
categories: ["networks"]
---
Today, computers run useful processes that other people use from their own devices. Examples of such processes are the Google search engine, the Amazon online store, your favourite flight operator's booking service and so on. How does your computer know where these processes are running?

---------------
## Where is my coffee ?
You are in _Coffee Land_ on a business visit and feel like drinking coffee after a long day at work. You want to go to a restaurant. After spending some time consulting your colleagues, you decide to walk into _Quality Coffee_.
You are told _Quality Coffee_ is at _No 4, Coffee Bean Avenue, Roasted County, Coffee Land_. You go there, order coffee, drink it and leave satisfied (after paying, if you don't want to be arrested).
With some reflection, you can see that the address is more useful to you when you need to navigate to the correct shop and order coffee. However, the product that you sought was _Quality Coffee_. When you later talk about this to your other colleagues back home, you would say you visited _Quality Coffee_ and not _No 4,..._. The shop can move to a different place, but the service that you experienced is tied to the brand and it would remain the same.
Here we have two pieces of information - **Name** and **Address**. You seek the services of **Name**. You need to know its current **Address** to actually get the services you need.
The **Name** is generally the brand associated with the service. In the computer world, the address, as you might already guess, is the **IP Address**.
> Recall that an **IP Address** is a numeric identifier for a machine on a network. There are 2 versions of it - IPv4 and IPv6. An IPv4 address looks like this - `10.0.0.1`.
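
As a small illustration, Python's standard `ipaddress` module can tell the two versions apart. This is only a sketch: the `describe` helper is a made-up name, and the IPv6 address is an assumed example from the range reserved for documentation, not something used elsewhere in this post.

```python
import ipaddress


def describe(address: str) -> str:
    """Return a short label such as 'IPv4 (private)' for an address string."""
    ip = ipaddress.ip_address(address)  # raises ValueError on malformed input
    scope = "private" if ip.is_private else "public"
    return f"IPv{ip.version} ({scope})"


print(describe("10.0.0.1"))     # the IPv4 example from above
print(describe("2001:db8::1"))  # assumed IPv6 example (documentation range)
```
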
Software engineers behind the YouTube service will have installed the software that makes video content searchable and playable on a computer, and advertised its IP address to everyone. Users can type the IP address of YouTube into their browser and be presented with a web page, which is an interface for searching and playing videos.
## I hate memorizing numbers!
Everyone does!
As of writing this article, one of the IP addresses of the YouTube service is `74.125.68.93`.
It is impractical to remember the IP address of every service you want to reach. These IP addresses also change more often than real-world street addresses do. Keeping track of them would be a nightmare for both the YouTube engineers and the end users.
Enter the **Domain Name System**, or **DNS** for short. DNS removes the need to remember arbitrary numbers by giving an online service a more memorable **Domain Name** such as `www.youtube.com`. The engineers at YouTube can simply say - if you want to search and watch videos, head over to **YouTube** (the name or brand). And hey, you can find YouTube at `www.youtube.com` (the address).
For most people `www.youtube.com` is more memorable than `74.125.68.93`. This also solves the problem of changing IP addresses. When the IP address changes to, let's say, `74.125.68.100`, we can simply update the domain name system so that the domain name `www.youtube.com` points to the new address. This change is silent and there is no impact on what the end user types into their browser.
## How does address translation work ?
When you open your browser and type in `www.youtube.com`, the browser first asks the domain name system for the IP address. This process is called **DNS address resolution**. Once the address is resolved, the browser establishes a connection and is able to send and receive meaningful data (in this case, the videos you watch).
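
To make the resolution step concrete, here is a minimal sketch using Python's standard `socket` module. It asks the operating system's resolver, which is roughly what a browser relies on as well. The `resolve` function name is made up for this example.

```python
import socket


def resolve(name: str) -> list[str]:
    """Ask the operating system's resolver for the IP addresses of `name`."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); the address
    # string is the first element of sockaddr for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve("www.youtube.com")` on a machine with internet access returns whatever addresses DNS hands out at that moment, so the result can differ by location and over time.
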
It would seem that maintaining a single directory that maps each domain name to its IP address would be straightforward. But this poses a few challenges.
- **Scalability** : There are a lot of websites, and querying a single directory can be computationally expensive. If a lot of users are querying to resolve the address of `www.youtube.com`, this can negatively impact other users that are trying to resolve less frequently queried domain names such as `www.mysite.com`. The owner of `www.youtube.com` should somehow be held accountable for the traffic that queries for their domain bring to the domain name system. Sites can also have multiple sub-domains such as `www.blog.mysite.com`, so there isn't a strict one-to-one mapping between a domain name and the owner of a web site.
- **Security** : Coordinating updates from different domain name owners securely is challenging. For example, I should not be able to update the address of `www.yoursite.com` while I am updating `www.mysite.com`.
- **Conditional resolution** : Oftentimes, a service is hosted in multiple locations for a better user experience (think multiple branches of _Quality Coffee_ set up in different cities so that the brand is more accessible). The YouTube IP address shown above probably belongs to a computer physically close to my house, for better latency. So the central directory would now have to keep track of the different locations and determine where each lookup request is coming from.
- **Cost** : Solving any of the above problems would require more money to set up this domain name system directory: more disk space, computing power (to serve domain name queries), power supply, cooling systems, etc.
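
The conditional resolution challenge above can be sketched as a lookup that depends on who is asking. This is a toy model only: the region names, the `RECORDS` table and the `resolve_for` function are all hypothetical, and the addresses come from ranges reserved for documentation.

```python
# Hypothetical records: one name served from several regions.
RECORDS = {
    "www.mysite.com": {
        "asia": "192.0.2.10",
        "europe": "198.51.100.10",
    },
}
DEFAULT_REGION = "europe"  # assumed fallback when the region is unknown


def resolve_for(name: str, client_region: str) -> str:
    """Return the address closest to the client, or the default one."""
    by_region = RECORDS[name]
    return by_region.get(client_region, by_region[DEFAULT_REGION])


print(resolve_for("www.mysite.com", "asia"))
```

A real directory would have to infer the client's location itself, which is part of what makes a single central directory expensive.
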
## Delegated resolution
Instead of storing all the domain names in a single directory (called a DNS server), there are separate directories for each sub-domain level. As an example, imagine a separate DNS server for `mysite.com` that serves all DNS requests for names that end in `mysite.com`.
- `www.mysite.com`
- `blog.mysite.com`
- `about.mysite.com`
- `potato.chips.mysite.com`
This distributes the DNS traffic and it is now up to the owners of the domain name to securely manage and scale their DNS servers.
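
One way to picture which names a given DNS server is responsible for is a suffix check on the dot-separated labels. The sketch below assumes a made-up helper called `is_served_by`; real DNS servers do more than this, but the matching idea is the same.

```python
def is_served_by(name: str, zone: str) -> bool:
    """True if `name` is `zone` itself or any sub-domain of it."""
    name_labels = name.lower().rstrip(".").split(".")
    zone_labels = zone.lower().rstrip(".").split(".")
    # Compare whole labels from the right, so "notmysite.com" does not match.
    return name_labels[-len(zone_labels):] == zone_labels


for candidate in ["www.mysite.com", "potato.chips.mysite.com", "notmysite.com"]:
    print(candidate, is_served_by(candidate, "mysite.com"))
```
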
> But, there is still a problem. How do I figure out what is the DNS server for `mysite.com` ?
A domain name can only end in one of a set of reserved suffixes, for example `com`, `in`, `us`, `org` and `net`. These are called **Top Level Domains (TLD)**.
Each of these has its own DNS servers, operated by an organisation or a country. The owner of `mysite.com` would set up a DNS server that resolves the IP addresses of all their sub-domains ending in `mysite.com`. Then they would get hold of the operator for the `com` domain and ask them to add a record for `mysite` pointing to the DNS server that they just set up.
> Wait, how do I figure out what is the DNS server for `com` ?
The DNS servers for all the top level domains are listed in yet another set of DNS servers called the **Root DNS Servers**. The root zone they serve is managed by the [Internet Assigned Numbers Authority](https://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority), while the servers themselves are run by a handful of independent organizations. The addresses of the root DNS servers are well known and ship with DNS resolver software, so no further lookup is needed to find them.
Now a typical name resolution works like this (in practice, a helper called a recursive resolver usually performs these lookups on the browser's behalf and caches the answers):
- You enter `www.mysite.com` in the browser.
- Your browser consults the Root DNS server to find where the `com` DNS server is.
- Your browser consults the `com` DNS server to find out where the `mysite` DNS server is.
- Your browser consults the `mysite` DNS server to find out where `www.mysite.com` is.
- Your browser gets the IP address and makes a connection to the website.
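
The steps above can be sketched as a toy simulation. Nothing here talks to a real DNS server: the `ROOT` and `SERVERS` tables, the server names and the `resolve_iteratively` function are all invented for illustration, and the final address comes from a range reserved for documentation.

```python
# Toy "servers": each maps a name suffix to the next server to ask,
# or, at the last step, a full name to an IP address.
ROOT = {"com": "com-server"}
SERVERS = {
    "com-server": {"mysite.com": "mysite-server"},
    "mysite-server": {"www.mysite.com": "192.0.2.10"},
}


def resolve_iteratively(name: str) -> tuple[str, list[str]]:
    """Walk root -> TLD -> domain and return (ip, suffixes looked up)."""
    labels = name.split(".")
    table = ROOT
    asked = []
    answer = ""
    # Build suffixes from right to left: "com", "mysite.com", "www.mysite.com".
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        asked.append(suffix)
        answer = table[suffix]           # KeyError if the name is unknown
        table = SERVERS.get(answer, {})  # next server, or {} once we hold an IP
    return answer, asked


ip, asked = resolve_iteratively("www.mysite.com")
print(asked, "->", ip)
```

Each pass through the loop corresponds to one "consults" step in the list, with the suffix growing by one label each time.
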
Notice how the domain name is parsed from right to left - from the more generic to the more specific. If you think about it, this is similar to how we parse location addresses. When you needed to know where _No 4, Coffee Bean Avenue, Roasted County, Coffee Land_ is, you would parse it from right to left as
- Go to _Coffee Land_.
- Find _Roasted County_ inside _Coffee Land_.
- Find _Coffee Bean Avenue_ inside _Roasted County_.
- Find _No 4_ inside _Coffee Bean Avenue_.
Fin!