What really happens when you navigate to a URL

2023-10-31

As a software developer, you certainly have a high-level picture of how web apps work and what kinds of technologies are involved: the browser, HTTP, HTML, web server, request handlers, and so on.

In this article, we will take a deeper look at the sequence of events that take place when you visit a URL.

1. You enter a URL into the browser

It all starts here:

image

2. The browser looks up the IP address for the domain name

 image

The first step in the navigation is to figure out the IP address for the visited domain. The DNS lookup proceeds as follows:

  • Browser cache – The browser caches DNS records for some time. Interestingly, the OS does not tell the browser the time-to-live for each DNS record, and so the browser caches them for a fixed duration (varies between browsers, 2 – 30 minutes).
  • OS cache – If the browser cache does not contain the desired record, the browser makes a system call (gethostbyname in Windows). The OS has its own cache.
  • Router cache – The request continues on to your router, which typically has its own DNS cache.
  • ISP DNS cache – The next place checked is the cache ISP’s DNS server. With a cache, naturally.
  • Recursive search – Your ISP’s DNS server begins a recursive search, from the root nameserver, through the .com top-level nameserver, to Facebook’s nameserver. Normally, the DNS server will have names of the .com nameservers in cache, and so a hit to the root nameserver will not be necessary.

Here is a diagram of what a recursive DNS search looks like:

500px-An_example_of_theoretical_DNS_recursion_svg 

One worrying thing about DNS is that the entire domain like wikipedia.org or facebook.com seems to map to a single IP address. Fortunately, there are ways of mitigating the bottleneck:

  • Round-robin DNS is a solution where the DNS lookup returns multiple IP addresses, rather than just one. For example, facebook.com actually maps to four IP addresses.
  • Load-balancer is the piece of hardware that listens on a particular IP address and forwards the requests to other servers. Major sites will typically use expensive high-performance load balancers.
  • Geographic DNS improves scalability by mapping a domain name to different IP addresses, depending on the client’s geographic location. This is great for hosting static content so that different servers don’t have to update shared state.
  • Anycast is a routing technique where a single IP address maps to multiple physical servers. Unfortunately, anycast does not fit well with TCP and is rarely used in that scenario.

Most of the DNS servers themselves use anycast to achieve high availability and low latency of the DNS lookups.

3. The browser sends a HTTP request to the web server

image

You can be pretty sure that Facebook’s homepage will not be served from the browser cache because dynamic pages expire either very quickly or immediately (expiry date set to past).

So, the browser will send this request to the Facebook server:

GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]

The GET request names the URL to fetch: “http://facebook.com/”. The browser identifies itself (User-Agent header), and states what types of responses it will accept (Accept and Accept-Encodingheaders). The Connection header asks the server to keep the TCP connection open for further requests.

The request also contains the cookies that the browser has for this domain. As you probably already know, cookies are key-value pairs that track the state of a web site in between different page requests. And so the cookies store the name of the logged-in user, a secret number that was assigned to the user by the server, some of user’s settings, etc. The cookies will be stored in a text file on the client, and sent to the server with every request.

There is a variety of tools that let you view the raw HTTP requests and corresponding responses. My favorite tool for viewing the raw HTTP traffic is fiddler, but there are many other tools (e.g., FireBug) These tools are a great help when optimizing a site.

In addition to GET requests, another type of requests that you may be familiar with is a POST request, typically used to submit forms. A GET request sends its parameters via the URL (e.g.: http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just under the headers.

The trailing slash in the URL “http://facebook.com/” is important. In this case, the browser can safely add the slash. For URLs of the form http://example.com/folderOrFile, the browser cannot automatically add a slash, because it is not clear whether folderOrFile is a folder or a file. In such cases, the browser will visit the URL without the slash, and the server will respond with a redirect, resulting in an unnecessary roundtrip.

4. The facebook server responds with a permanent redirect

image

This is the response that the Facebook server sent back to the browser request:

HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
      pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT;
      path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 12 Feb 2010 05:09:51 GMT
Content-Length: 0

The server responded with a 301 Moved Permanently response to tell the browser to go to “http://www.facebook.com/” instead of “http://facebook.com/”.

There are interesting reasons why the server insists on the redirect instead of immediately responding with the web page that the user wants to see.

One reason has to do with search engine rankings. See, if there are two URLs for the same page, say http://www.igoro.com/ and http://igoro.com/, search engine may consider them to be two different sites, each with fewer incoming links and thus a lower ranking. Search engines understand permanent redirects (301), and will combine the incoming links from both sources into a single ranking.

Also, multiple URLs for the same content are not cache-friendly. When a piece of content has multiple names, it will potentially appear multiple times in caches.

5. The browser follows the redirect

image

The browser now knows that “http://www.facebook.com/” is the correct URL to go to, and so it sends out another GET request:

GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=XW[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com

The meaning of the headers is the same as for the first request.

6. The server ‘handles’ the request

image

The server will receive the GET request, process it, and send back a response.

This may seem like a straightforward task, but in fact there is a lot of interesting stuff that happens here – even on a simple site like my blog, let alone on a massively scalable site like facebook.

  • Web server software
    The web server software (e.g., IIS or Apache) receives the HTTP request and decides which request handler should be executed to handle this request. A request handler is a program (in ASP.NET, PHP, Ruby, …) that reads the request and generates the HTML for the response.

    In the simplest case, the request handlers can be stored in a file hierarchy whose structure mirrors the URL structure, and so for example http://example.com/folder1/page1.aspx URL will map to file /httpdocs/folder1/page1.aspx. The web server software can also be configured so that URLs are manually mapped to request handlers, and so the public URL of page1.aspx could be http://example.com/folder1/page1.

  • Request handler
    The request handler reads the request, its parameters, and cookies. It will read and possibly update some data stored on the server. Then, the request handler will generate a HTML response.

One interesting difficulty that every dynamic website faces is how to store data. Smaller sites will often have a single SQL database to store their data, but sites that store a large amount of data and/or have many visitors have to find a way to split the database across multiple machines. Solutions include sharding (splitting up a table across multiple databases based on the primary key), replication, and usage of simplified databases with weakened consistency semantics.

One technique to keep data updates cheap is to defer some of the work to a batch job. For example, Facebook has to update the newsfeed in a timely fashion, but the data backing the “People you may know” feature may only need to be updated nightly (my guess, I don’t actually know how they implement this feature). Batch job updates result in staleness of some less important data, but can make data updates much faster and simpler.

7. The server sends back a HTML response

image

Here is the response that the server generated and sent back:

HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
    pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 12 Feb 2010 09:05:55 GMT

2b3
��������T�n�@����[...]

The entire response is 36 kB, the bulk of them in the byte blob at the end that I trimmed.

The Content-Encoding header tells the browser that the response body is compressed using the gzip algorithm. After decompressing the blob, you’ll see the HTML you’d expect:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"   
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" 
      lang="en" id="facebook" class=" no_js">
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-language" content="en" />
...

In addition to compression, headers specify whether and how to cache the page, any cookies to set (none in this response), privacy information, etc.

Notice the header that sets Content-Type to text/html. The header instructs the browser to render the response content as HTML, instead of say downloading it as a file. The browser will use the header to decide how to interpret the response, but will consider other factors as well, such as the extension of the URL.

8. The browser begins rendering the HTML

Even before the browser has received the entire HTML document, it begins rendering the website:

 image

9. The browser sends requests for objects embedded in HTML

image

As the browser renders the HTML, it will notice tags that require fetching of other URLs. The browser will send a GET request to retrieve each of these files.

Here are a few URLs that my visit to facebook.com retrieved:

  • Images
    http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
    http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
  • CSS style sheets
    http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
    http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
  • JavaScript files
    http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
    http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js

Each of these URLs will go through process a similar to what the HTML page went through. So, the browser will look up the domain name in DNS, send a request to the URL, follow redirects, etc.

However, static files – unlike dynamic pages – allow the browser to cache them. Some of the files may be served up from cache, without contacting the server at all. The browser knows how long to cache a particular file because the response that returned the file contained an Expires header. Additionally, each response may also contain an ETag header that works like a version number – if the browser sees an ETag for a version of the file it already has, it can stop the transfer immediately.

Can you guess what “fbcdn.net” in the URLs stands for? A safe bet is that it means “Facebook content delivery network”. Facebook uses a content delivery network (CDN) to distribute static content – images, style sheets, and JavaScript files. So, the files will be copied to many machines across the globe.

Static content often represents the bulk of the bandwidth of a site, and can be easily replicated across a CDN. Often, sites will use a third-party CDN provider, instead of operating a CND themselves. For example, Facebook’s static files are hosted by Akamai, the largest CDN provider.

As a demonstration, when you try to ping static.ak.fbcdn.net, you will get a response from an akamai.net server. Also, interestingly, if you ping the URL a couple of times, may get responses from different servers, which demonstrates the load-balancing that happens behind the scenes.

10. The browser sends further asynchronous (AJAX) requests

image

In the spirit of Web 2.0, the client continues to communicate with the server even after the page is rendered.

For example, Facebook chat will continue to update the list of your logged in friends as they come and go. To update the list of your logged-in friends, the JavaScript executing in your browser has to send an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request that goes to a special URL. In the Facebook example, the client sends a POST request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of your friends who are online.

This pattern is sometimes referred to as “AJAX”, which stands for “Asynchronous JavaScript And XML”, even though there is no particular reason why the server has to format the response as XML. For example, Facebook returns snippets of JavaScript code in response to asynchronous requests.

Among other things, the fiddler tool lets you view the asynchronous requests sent by your browser. In fact, not only you can observe the requests passively, but you can also modify and resend them. The fact that it is this easy to “spoof” AJAX requests causes a lot of grief to developers of online games with scoreboards. (Obviously, please don’t cheat that way.)

Facebook chat provides an example of an interesting problem with AJAX: pushing data from server to client. Since HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to see if any new messages arrived.

Long polling is an interesting technique to decrease the load on the server in these types of scenarios. If the server does not have any new messages when polled, it simply does not send a response back. And, if a message for this client is received within the timeout period, the server will find the outstanding request and return the message with the response.

Conclusion

Hopefully this gives you a better idea of how the different web pieces work together.






http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/


本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

What really happens when you navigate to a URL 的相关文章

  • Floyd算法的原理和实现代码

    原理 假设有向图G V E 采用邻接矩阵存储 设置一个二维数组A用于存放当前顶点之间的最短路径长度 分量A i j 表示当前顶点i gt j的最短路径长度 然后 每次添加一个顶点 同时对A的数组进行筛选优化 期间会产生k个A数组 Ak i
  • 第一个vue程序

    div message h2 school name school moblie h2 div
  • 程序、进程、线程联系以及进程和线程的区别和联系

    程序和进程的区别与联系 程序是一组有序的指令集合是一个静态的概念 一个程序由一组指令组成 以二进制方式存在存储器中 进程是程序及其数据在计算机上的一次运行活动 是一个动态的概念 进程的运行实体是程序 离开的程序的进程没有意义 进程是由程序
  • 交互原型设计工具

    1 axure RP 适合 快速创建应用软件或Web线框图 流程图 原型和规格说明文档 优点 支持交互设计 并可生成规格说明文档和输出HTML原型 Axure RP 集 UX 原型 规范和图表于一身 2 Sketch 适合 为视觉设计师打造
  • 图数据库——大数据时代的高铁

    作者 董小珊 姚臻 责编 仲培艺 zhongpy csdn net 本文为 程序员 原创文章 未经允许不得转载 更多精彩文章请订阅 程序员 如果把传统关系型数据库比做火车的话 那么到现在大数据时代 图数据库可比做高铁 它已成为NoSQL中关
  • IDEA鼠标右击new没有class和interface的解决办法

    IDEA点击new没有class和interface 问题如下图 解决办法 1 File gt Project Structure 如下图所示 2 选择Modules gt 右边Sources中选择所需目录 然后点击 Sources gt
  • 云平台的技术

    约束记录表 简朴 勤劳 谦虚 诚恳 禁止浪费 珍惜时间 虚心学习 纯心做人 1 0 1 1 节制 静默 条理 决断 不恋吃睡 开口有益 规整事务 坚持 迅捷 0 1 1 1 正直 中庸 整洁 宁静 贞洁 敬业负责 不倚势凌人 外表整洁 不纠
  • 【解决】windows安装pycrypto出错问题。error C2061: 语法错误: 标识符“intmax_t”

    1 执行命令报错 pip install pycrypto Installing collected packages pycrypto Running setup py install for pycrypto error ERROR C
  • easyUI Tree树动态刷新子节点

    tree tree url xxx 默认是post请求 checkbox false animate true lines true loadFilter function rows 返回要显示的过滤数据 返回数据时以标准树格式返回的 也就
  • MongodbTemplate 批量更新或者修改

    批量更新或者修改 public void saveOnlineStatusList List
  • 线性反馈移位寄存器 LFSR

    参考连接 添加链接描述 运算基础 模2运算 线性反馈移位寄存器用于产生可重复的伪随机序列PRBS 该电路由n级除法器和异或门组成 在k阶段 寄存器存在初值 Rn 1 R1 R0 称为seed 在k 1阶段 寄存器的值变为 k 1阶段 Rn
  • word2010或以上版本编号变成黑块的正确处理方

    打开编号显示为黑块的文档 把光标放置在黑块的后面 然后在键盘上按左方向键 则黑块变灰色 为选中状态 2 然后按下ctrl shift s 出现应用样式窗口点击 重新应用 黑块显示成正常的编号 3 然后点击 多级列表 按钮 选择 定义新的多级
  • 一次数据库的选型,FireBird胜出

    做了n多年的J2EE应用以后 如何做客户端的BI确实让我一下子摸不到门路 近期的一个客户要求我们给他做基于客户端的BI分析 客户是对外提供重要数据的单位 有很多的客户每年购买他的数据 可以说人家的数据库 每行每列都是钱 在这种情况下 他们非
  • css实现文字环绕图片布局

    前言 css实现文字环绕图片的效果 实现效果 实现代码 通过图片属性 align div style width 400px img src d303 paixin com thumbs 3548553 231637502 staff 10
  • 数据结构——AVL树

    目录 1 什么是AVL树 2 AVL树插入的模拟实现 节点定义 插入 旋转 右单旋 左单旋 双旋 右左旋 双旋 左右旋 完整的插入代码 3 AVL树的性能分析 1 什么是AVL树 AVL树是一种自平衡二叉查找树 也被称为高度平衡树 它具有以
  • 小福利,数据可视化之常见图形的绘制

    大家好 我是天空之城 今天带来小福利 数据可视化之常见图形的绘制 读取 本 专 科 群体的数据 college student data pd read csv 工作 college student data csv encoding ut

随机推荐

  • opencv提取图像中的颜色直方图(RGB、HSV)

    本篇博客主要介绍利用opencv工具提取一幅图像中的颜色直方图特征 所谓颜色直方图 指的是一幅图像中的颜色分布 与图像中的特定的物体无关 只是用来表示人的眼睛观察到的图像中的颜色分布情况 例如说 一幅图中红色占了多少比例 绿色占了多少比例等
  • 模型旋转 触摸屏 手指滑动360度旋转 安卓版本 EasyTouch

    using UnityEngine using System Collections using System Collections Generic using DG Tweening using UnityEngine EventSys
  • 4.2.3 积分法(二)——分部积分法

    emmmm想想词 算了想不出来 既然不定积分和导数是反操作 那就从导数开始说吧 先看一个导数公式 就不解释变形过程了 上图其实就是分部积分法的计算过程 总之是分成两个步骤 先分部再积分 至于 C等到完全积分积出来之后再加 目前我们总结过的不
  • 深入 Python 3

    深入 Python 3 http dipyzh bitbucket org table of contents html xml 目录 深入 Python 3 中有何新内容 又名 负号层 安装 Python 深入 哪个版本的 Python
  • ajax前后端交互示例

    文章目录 一 前后端交互方法 1 Ajax前端示例 1 1 特点 1 2 ajax同域请求示例 1 3 ajax跨域请求示例 2 后端示例 2 1 controller层处理 一 前后端交互方法 1 Ajax前端示例 1 1 特点 Ajax
  • leetcode92 反转链表II

    题目 给你单链表的头指针 head 和两个整数 left 和 right 其中 left lt right 请你反转从位置 left 到位置 right 的链表节点 返回 反转后的链表 示例 输入 head 1 2 3 4 5 left 2
  • css动画(四)

    推荐动画四 html代码上传 div class night div class shooting star div div class shooting star div div class shooting star div div c
  • Swagger整体整理一下蛤

    最近在学习springboot时候发现好多开源的项目里面都提到了swagger 原来是一个前后端分离开发过程中为了防止两只团队为了需求更改打架 毕竟前端需要加一个参数 后端就要改好多好多 多的不说 直接上货 1 导入依赖 首先是导入依赖 既
  • Unity的Application.Quit()方法使用失效的其他解决方案。

    1 android手机上 使用方法 Application Quit 之后 游戏的进程还在 解决方法编写java代码 打成jar包或aar放到Assets Plugins Android libs下 public void KillProc
  • 全球互联网未来发展九大趋势

    当今世界网络信息技术日新月异 互联网正在全面融入经济社会生产和生活各个领域 引领了社会生产新变革 创造了人类生活新空间 带来了国家治理新挑战 并深刻地改变着全球产业 经济 利益 安全等格局 互联网正在成为21世纪影响和加速人类历史发展进程的
  • STL模板(一)向量、栈和队列

    一 vector 向量 1 定义 向量类型可以容纳许多类型的数据 被称为容器 可以当数组使用 可以随时增加或减少元素 内存连续 顺序表 2 头文件 include
  • unity, Animation crossfade需要两动画在时间上确实有交叠

    unity现在播动画都用Animator了 但公司的老项用的还是Animation 今天遇到一个bug 是两个动画的衔接处不连贯 最后发现是由于A动画已经播完之后B动画才开始播 而且还用了crossfade 0 2 正确的用法是在A动画还差
  • layui框架学习(30:树形模块)

    Layui中的树形组件模块tree用于以树形形式显示上下级结构的数据 类似于winform中的tree控件 tree模块的基本用法及显示效果如下所示 div div br
  • 2020年高教社杯全国大学生数学建模竞赛C题 第二问详细解答+代码

    2020年高教社杯全国大学生数学建模竞赛C题 第二问详细解答 代码 本文摘自小编自己的参赛论文与经历 小编获得了2020年高教社杯国奖 有问题的同学们可私聊博主哦 问题2 缺少信誉评级后的 信贷策略 研究 1 1 问题分析 问题二与问题一的
  • go语言检测字符串是否是回文

    输入的字符串分为三种情况 本身就是回文字符串 如 aba bddb vvvvv和vvvv等 本身不是回文字符串 但是可以通过删除一个字符成为回文字符串 如 abca deeee eddze和aydmda等 本身不是回文字符串 不能通过仅删除
  • vmware 不能自适应客户机的解决办法

    本文适用于 安装了vmware tools 后 立即适应客户机 仍为灰色 不能自适应的情况 环境 物理机 WIN10 vmware64 ubuntu16 04 sudo apt get install open vm tools open
  • 解决IDEA一个突发异常

    解决Module learn production java lang ClassCastException class org jetbrains jps builders java dependencyView TypeRepr Pri
  • 挣值常用的计算公式

    原文 http www cnitpm com pm 7707 html http blog sina com cn s blog 4be9c1450100vccg html 挣值管理法中的PV EV AC SV CV SPI CPI这些英文
  • Redis存储List类型数据

    Redis存储支持的类型没有object 虽然有支持list 但是它只支持list
  • What really happens when you navigate to a URL

    As a software developer you certainly have a high level picture of how web apps work and what kinds of technologies are