How it all started – The ARPANET
Imagine, if you can, a time without the World Wide Web (and without the internet, if you know the difference). Consider a college (let's call it A) in a university that offers the engineering disciplines ECE and MAE (the old GBPEC) but not CSE. If a student at college A wants to learn about microprocessors, he can easily access notes from a computer on college A's network. Another college, B, has the CSE branch. Now this student from A wants to learn a programming language using course material on B's computers. So, how does he access B's network?
If we connect the networks of colleges A and B, we get what is called an internetwork, or internet. Imagine the internet as a simple wire that connects two nodes. In this case, the nodes are two individual networks.
Now consider creating an internet of all the colleges in a university, or perhaps of various universities, by joining their networks with a wire. This idea was first implemented by four institutions: UCSB, SRI, the University of Utah and UCLA. This first internet was called the ARPANET. It grew into the global network we now call the Internet, and the World Wide Web is the collection of linked pages that runs on top of it. So the Internet is an instance of, and by far the largest instance of, an internet. The World Wide Web uses the Internet, but it is not the same thing as the Internet.
How it all works – The TCP/IP Model
We will continue to think of the internet as a wire, one big wire that the World Wide Web runs over. Many machines are connected to this wire: computers, mobiles and servers. Servers are big machines that store the websites (data + pages + audio + images + video) you visit, and they are connected directly to the internet. Your desktops, laptops and mobiles are connected to this wire (the internet) indirectly, via an ISP (your Internet Service Provider, MTNL?).
Now, all that happens on the wire is an exchange of data packets: from you to zarrata.com's server (upload) and from zarrata.com's server to you (download). When you're viewing this page, you've received some data packets from the server, for example the text and images on this page, and they are temporarily stored on your computer.
How these packets are created, sent and received is defined by protocols as per the TCP/IP model. There are various layers of abstraction involved in the process (Physical & Data Link, Network, Transport & Application), but for now we only need to worry about the Application Layer, that is, how the user is interfaced to the network. It is more about why you see what you see on the internet than about how the internet breaks data into packets and puts them back together.
And by the way, you (and zarrata.com) are not the only ones on the internet. So how does the internet know which two machines have to be connected (i.e. exchange data packets)? Two things: IP addresses and routing. The data packets have to be addressed to a particular machine (a server or your desktop).
IP Addresses, DNS & Subnets
Just like each place in the world has an address, each machine on the internet has an address called the IP address. For instance, your computer has an IP address, the server that hosts this website has an IP address, and so on. And when we use "www.zarrata.com", we are actually referring to the IP address "188.8.131.52" which is assigned to zarrata.com. You can find your own IP at http://whatismyipaddress.com/ & a website's (server's) IP at http://www.yougetsignal.com/tools/web-sites-on-web-server/
Then, of course, we need a database that holds the IPs corresponding to all the websites on the internet. It can then be used to translate the human-friendly addresses or domain names (zarrata.com) to the real (IP) addresses (184.108.40.206). This database is called the DNS (Domain Name System).
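To make the "database" idea concrete, here is a toy sketch in C of a name-to-address lookup. It is only an illustration: the real DNS is a distributed system of many servers, not one table, and the names `dns_record`, `toy_table` and `toy_resolve` as well as the two entries (reserved documentation names and addresses) are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* One row of a toy name-to-address table. */
struct dns_record {
    const char *name;
    const char *address;
};

/* Placeholder entries only: example.com/example.org and 192.0.2.x are
   reserved for documentation, they are not real records. */
static const struct dns_record toy_table[] = {
    { "example.com", "192.0.2.10" },
    { "example.org", "192.0.2.20" },
};

/* Return the address stored for name, or NULL if the table has no entry. */
const char *toy_resolve(const char *name)
{
    size_t count = sizeof toy_table / sizeof toy_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(toy_table[i].name, name) == 0)
            return toy_table[i].address;
    }
    return NULL;
}
```

Calling `toy_resolve("example.com")` hands back the stored address string, and an unknown name gives NULL, which is roughly what happens when you mistype a domain.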
Now, an IP address is actually used in binary for communication by the computer. For instance, 216.27.61.137 translates to 11011000.00011011.00111101.10001001. Each of the four numbers is represented by 8 positions in binary form, and they are hence called octets. The complete IP address is then a 32-bit number (made from 4 octets). Each of these 32 bits can be either 0 or 1, thus allowing 2^(32) unique IP addresses, which is about 4.3 billion possibilities.
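The conversion from dotted numbers to bits can be sketched in a few lines of C. This is a minimal sketch, not how any particular operating system does it; the function names `parse_ipv4` and `ipv4_to_binary` are invented for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Parse "a.b.c.d" into a 32-bit number. Returns 1 on success, 0 on bad input. */
int parse_ipv4(const char *text, uint32_t *out)
{
    unsigned int a, b, c, d;
    char extra;
    /* Exactly four numbers must match; trailing junk makes sscanf return 5. */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return 0;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return 0;   /* an octet cannot exceed 255 */
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
    return 1;
}

/* Write the 32 bits as four dot-separated octets; buf needs at least 36 bytes. */
void ipv4_to_binary(uint32_t addr, char *buf)
{
    int pos = 0;
    for (int bit = 31; bit >= 0; bit--) {
        buf[pos++] = ((addr >> bit) & 1u) ? '1' : '0';
        if (bit % 8 == 0 && bit != 0)
            buf[pos++] = '.';   /* dot after every complete octet except the last */
    }
    buf[pos] = '\0';
}
```

Feeding "216.27.61.137" through `parse_ipv4` and then `ipv4_to_binary` gives the string 11011000.00011011.00111101.10001001, while something like "256.1.1.1" is rejected because 256 does not fit in an octet.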
[PS - The IP address considered here is the most prevalent version of the IP address, called IPv4. The other version is IPv6.]
Now, in the simplest picture, the first half of the IP address (the first 2 octets, 216.27) is the network part and the second half (the last 2 octets, 61.137) is the machine or host part. That is, the first half is the neighborhood or street where you live and the second half is the exact house in the neighborhood. Hence, machines having the same first half of the IP belong to the same network. For instance, 216.27.61.137 and any other address beginning with 216.27 belong to the same network. (Exactly where the split falls depends on the class of the address and the subnet mask, discussed below.)
Subnets & Subnetting
Networks having a large number of hosts are divided into subnetworks or subnets. A subnet number is assigned to each subnet. The host part is now split into a subnet number and a host number, and the network and subnet portion of an address is recovered by combining the address with a subnet mask using the AND operation.
Consider a large network 192.168.10.0 whose usable IP addresses range from 192.168.10.1 – 192.168.10.254
(255 is the maximum value of any octet, since 256 needs nine bits: 256 = 100000000. The range stops at 254 because 192.168.10.255 is the BROADCAST ADDRESS and cannot be assigned to a machine.)
Now, we wish to divide the network into two subnetworks:
The first network consisting of the machines 192.168.10.1 – 192.168.10.126 (192.168.10.127 becomes its broadcast address)
The second network consisting of the machines 192.168.10.129 – 192.168.10.254 (192.168.10.128 is the address of the subnet itself)
It is done by using a subnet mask, which is defined by:
1) The class of the IP (see table below) – here, the default mask is 255.255.255.0
2) The address that divides the network – here, 192.168.10.128
Since 128 is 10000000 in binary, one extra bit of the last octet is added to the mask. Hence, the mask obtained from 1) & 2) is 255.255.255.128
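Here is a small C sketch of how the AND operation with the mask 255.255.255.128 sorts addresses into the two subnets. The helper names `ipv4`, `subnet_of` and `same_subnet` are made up for this illustration.

```c
#include <stdint.h>

/* Pack four octets into one 32-bit address. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* ANDing with the mask keeps the network and subnet bits and zeroes the host bits. */
uint32_t subnet_of(uint32_t address, uint32_t mask)
{
    return address & mask;
}

/* Two hosts share a subnet when their masked addresses are equal. */
int same_subnet(uint32_t a, uint32_t b, uint32_t mask)
{
    return subnet_of(a, mask) == subnet_of(b, mask);
}
```

With the mask `ipv4(255, 255, 255, 128)`, the address 192.168.10.5 masks down to 192.168.10.0 and 192.168.10.200 masks down to 192.168.10.128, so the two machines land in different subnets, while 192.168.10.5 and 192.168.10.100 land in the same one.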
|Class||Address Range||Subnet Mask|
|Class A||1.xxx.xxx.xxx – 126.xxx.xxx.xxx||255.0.0.0|
|Class B||128.xxx.xxx.xxx – 191.xxx.xxx.xxx||255.255.0.0|
|Class C||192.xxx.xxx.xxx – 223.xxx.xxx.xxx||255.255.255.0|
|Class D||224.xxx.xxx.xxx – 239.xxx.xxx.xxx||Reserved for multicast groups|
|Class E||240.xxx.xxx.xxx – 254.xxx.xxx.xxx||Reserved for future use, research & development|
[Note - 127.x.x.x addresses are reserved for loopback or localhost]
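The class ranges translate directly into a lookup on the first octet. The following is a rough C sketch; `ipv4_class` and `default_mask` are hypothetical names, the letter 'L' for loopback and '?' for values outside the listed ranges are conventions invented here, and the class C mask used is the standard 255.255.255.0.

```c
#include <stddef.h>

/* Return the class letter for the first octet of an address.
   'L' marks the 127.x.x.x loopback range; '?' marks 0 and 255,
   which fall outside the listed ranges. */
char ipv4_class(unsigned first_octet)
{
    if (first_octet == 127) return 'L';
    if (first_octet >= 1 && first_octet <= 126) return 'A';
    if (first_octet >= 128 && first_octet <= 191) return 'B';
    if (first_octet >= 192 && first_octet <= 223) return 'C';
    if (first_octet >= 224 && first_octet <= 239) return 'D';
    if (first_octet >= 240 && first_octet <= 254) return 'E';
    return '?';
}

/* Default mask for classes A to C as a dotted string; NULL for anything else. */
const char *default_mask(char cls)
{
    switch (cls) {
    case 'A': return "255.0.0.0";
    case 'B': return "255.255.0.0";
    case 'C': return "255.255.255.0";
    default:  return NULL;   /* D and E have no default mask */
    }
}
```

For example, `ipv4_class(192)` gives 'C', so the starting mask for 192.168.10.0 is `default_mask('C')`, that is 255.255.255.0.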
How email works – The SMTP, POP & IMAP Protocols
There are two protocols involved when you email someone (did you know or care?): one for sending the mail, called SMTP (Simple Mail Transfer Protocol), and another for receiving the mail, which could be either POP or IMAP. POP is the simpler of the two & the most common one. IMAP is hierarchy based (it keeps your mail in folders on the server) and less common.
Suppose you send a mail to your girlfriend. This is what really happens -
Steps involved in sending an email (when someone sends me a mail at email@example.com)
Step 1 – The sender's SMTP server breaks the address into two parts, devrishabh & gmail.com (it is interested only in the latter).
Step 2 – The sender's SMTP server looks up, through DNS, the IP of the SMTP server for gmail.com (the receiver, me).
Step 3 – With this IP, it connects to the SMTP server of gmail.com using port 25 and gives it the message (don't worry about the port number for now).
Step 4 – Gmail stores the message in my mailbox, and my mail client then uses POP3 (or IMAP) to fetch it into my inbox. Totally.
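Step 1 is easy to sketch in C. The function below is only an illustration of splitting an address at the @ sign; `split_address` is an invented name, and a real mail server does far more validation. Steps 2 and 3 (the DNS lookup and the connection on port 25) are not shown.

```c
#include <stddef.h>
#include <string.h>

/* Split "user@domain" at the last '@'. Copies each part into the caller's
   buffers and returns 1, or returns 0 if the address is malformed or a
   part does not fit in its buffer. */
int split_address(const char *address,
                  char *user, size_t user_size,
                  char *domain, size_t domain_size)
{
    const char *at = strrchr(address, '@');
    if (at == NULL || at == address || at[1] == '\0')
        return 0;   /* no '@', empty user part, or empty domain part */

    size_t user_len = (size_t)(at - address);
    size_t domain_len = strlen(at + 1);
    if (user_len >= user_size || domain_len >= domain_size)
        return 0;

    memcpy(user, address, user_len);
    user[user_len] = '\0';
    memcpy(domain, at + 1, domain_len + 1);   /* includes the terminating NUL */
    return 1;
}
```

Given "someone@example.com" (a placeholder address), the function fills `user` with "someone" and `domain` with "example.com". The domain is the part the sending server cares about, since that is what it looks up next.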
How I upload files to zarrata.com – The FTP
Now, how do people who have websites (or a server) upload files to their servers? If you try searching Google for free songs using the intitle:"index of" prefix, you'll come across a file hierarchy. This is how files are arranged on the server. When you open a website, the DNS first maps the website's name to the IP address of its server, and the browser then loads the index.html or index.php file from that server, which is essentially what you call the "HOME PAGE" of the website. This is the page that loads first (just like the main function is executed first in a C program, YEAH!)
All other files are linked, directly or indirectly, to index.html using HTML code (which is the code your browser understands; HTTP is only the protocol that delivers it). Your browser is like a compiler+executor of this code. It takes this code as the input and produces the website that you see as the output. You can View Source on this page to see what I'm talking about.
Now, how are all these HTML (for now) files uploaded to the server? Using FTP, the File Transfer Protocol. We (the webmasters, aha) have an FTP client in which we log in with our FTP details and upload files. Easy. You can try replacing http:// with ftp:// in a web address. For instance, try ftp://zarrata.com/durofy instead of http://zarrata.com/durofy & see what you get.
Don't worry if you don't have the login details. Here's a sneak peek into my FTP client -
You can see the desktop (the local site) and the remote site (the server for zarrata.com). Files can then be transferred between these two using this client.
Logging on to the Remote Computer – TELNET
If you want to do more than just transfer files, you can even log on to a computer that is located in a remote location (away from you) using your desktop. This is done using a protocol called TELNET (short for teletype network). It has now largely been replaced by SSH or Secure Shell, which offers the same kind of remote login as TELNET plus encryption for security.
How Google Works – Internet Search
First, if you think Google searches the internet, well, it just doesn't. It searches only a copy of the web (whatever it has of it) stored in a database on its own servers. There are three steps involved: CRAWLING + INDEXING + RANKING.
CRAWLING – Google writes programs called spiders (or bots, robots or crawlers) that crawl the web and look for links. A spider lands on a page (having arrived from another page) and adds that page to a queue for indexing. It then follows the links on that page, then the links on the new pages, and so on (until it reaches a page where there are no new links or the new ones are already in Google's index). This allows Google to index thousands of pages in one go.
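The crawling loop is essentially a breadth-first walk over links. Below is a toy C sketch of that idea on a made-up web of five pages. It says nothing about how Google's real crawlers are built; the `links` table, `PAGE_COUNT` and `crawl` are all invented for illustration.

```c
#include <stddef.h>

#define PAGE_COUNT 5

/* A made-up web of five pages: links[i][j] is 1 when page i links to page j. */
static const int links[PAGE_COUNT][PAGE_COUNT] = {
    /* 0 */ {0, 1, 1, 0, 0},
    /* 1 */ {0, 0, 0, 1, 0},
    /* 2 */ {0, 1, 0, 1, 0},
    /* 3 */ {1, 0, 0, 0, 0},
    /* 4 */ {0, 0, 0, 0, 0},   /* nothing links here, so it is never found */
};

/* Breadth-first crawl from start. Fills order[] with pages in the order they
   are queued for indexing and returns how many pages were found. */
size_t crawl(int start, int order[PAGE_COUNT])
{
    int seen[PAGE_COUNT] = {0};
    size_t head = 0, tail = 0;

    order[tail++] = start;          /* order[] doubles as the queue */
    seen[start] = 1;

    while (head < tail) {
        int page = order[head++];
        for (int next = 0; next < PAGE_COUNT; next++) {
            if (links[page][next] && !seen[next]) {
                seen[next] = 1;     /* skip pages that are already queued */
                order[tail++] = next;
            }
        }
    }
    return tail;
}
```

Starting from page 0, the crawl queues pages 0, 1, 2 and 3 and then stops, because every remaining link points to a page it has already seen. Page 4 never makes it into the queue, which is why a page nobody links to is hard for a search engine to discover.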
INDEXING & RANKING – All the pages in the queue are indexed and ranked. Ranking a page involves a number of questions (200+ according to Matt Cutts, the Google engineer who wrote the first version of SafeSearch, the family filter that keeps adult content out of your results).
Some of the questions are (i.e. if someone searches for a keyword, which websites should show up, and in what order?) -
- How many times does the page contain the keywords?
- Are these keywords in the page title?
- Is the page from a quality website? (determined by Google's PageRank, which you don't need to worry about for now)
- Does the page include synonyms for these words?
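To get a feel for how such questions turn into a number, here is a deliberately naive C sketch that scores a page using only the first two questions. The weights (1 point per keyword hit, 5 bonus points for a title match) are arbitrary, the names `count_occurrences` and `score_page` are invented, and this is in no way Google's actual ranking formula.

```c
#include <string.h>

/* Count non-overlapping occurrences of word in text (case-sensitive). */
int count_occurrences(const char *text, const char *word)
{
    size_t len = strlen(word);
    int count = 0;
    if (len == 0)
        return 0;
    for (const char *p = strstr(text, word); p != NULL; p = strstr(p + len, word))
        count++;
    return count;
}

/* Toy score: one point per keyword hit in the body, plus a bonus of 5 if the
   keyword appears in the title. The weights are arbitrary. */
int score_page(const char *title, const char *body, const char *keyword)
{
    int score = count_occurrences(body, keyword);
    if (count_occurrences(title, keyword) > 0)
        score += 5;
    return score;
}
```

A page titled "subnet basics" whose body mentions "subnet" twice would score 2 + 5 = 7 for the keyword "subnet", while a page that never mentions it scores 0. A real search engine weighs many more signals than this.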
How do we make websites – HTML
Forums & Boards before the World Wide Web – USENET
Before the World Wide Web (& before the internet as we know it today, now that you know the difference), we had newsgroups instead of forums. Just like the ARPANET, various colleges connected together to create a network for news exchange. This was called the USENET.
Google Groups still has the most comprehensive archive of Usenet postings (back to 1981).
How USENET Works -
Step 1 – There is a newsreader that connects to a news server.
Step 2 – It downloads all new messages posted in groups you’re subscribed to.
Step 3 – When you reply, the reply is stored on the server.
Step 4 – The news server connects to other servers and passes the new message on to them.
Step 5 – Changes are replicated until all the servers are updated.
Forms & Programs on the internet – CGI
Are you a programmer? What if you want to write code, have it executed as a web page and publish it as part of a website? Forms are a good example (a form is a program: a collection of inputs and outputs). This is done using CGI or Common Gateway Interface.
CGI is not a programming language itself; it is an interface that lets the web server run a program that can be written in many languages. A file written in C, Perl, Java, etc. can be run as a web page by having the program print an HTTP header (the content type) before the rest of its output, and then uploading it to the server with the extension .cgi (a C program has to be compiled first).
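As a rough sketch of what such a program has to print, the C function below writes a minimal CGI response: the content-type header, the blank line that ends the headers, and then the body. The name `write_cgi_response` is made up for this example; a real CGI program would call it from main with stdout as the output.

```c
#include <stdio.h>

/* Write a minimal CGI response to out: the Content-Type header, the blank
   line that ends the headers, then the body. A real CGI program would pass
   stdout here. Returns 1 on success, 0 if writing failed. */
int write_cgi_response(FILE *out, const char *body)
{
    if (fputs("Content-Type: text/plain; charset=us-ascii\n\n", out) == EOF)
        return 0;
    if (fputs(body, out) == EOF)
        return 0;
    return 1;
}
```

Calling `write_cgi_response(stdout, "Hello from C\n")` inside main, compiling the program and placing the binary on the server with a .cgi name is the general shape of it. How the server is told to execute .cgi files depends on its configuration.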
Sample Code for CGI in C:
#include <stdio.h>
int main(void)
{
    printf("Content-Type: text/plain; charset=us-ascii\n\n"); /* header, then a blank line */