Content delivery network

A content delivery network or content distribution network (CDN) is a geographically distributed network of proxy servers and their data centers. The goal is to provide high availability and performance ("speed") by distributing the service spatially relative to end users. CDNs came into existence in the late 1990s as a means of alleviating the performance bottlenecks of the Internet[1][2] as the Internet was becoming a mission-critical medium for people and enterprises. Since then, CDNs have grown to serve a large portion of Internet content, including web objects (text, graphics and scripts), downloadable objects (media files, software, documents), applications (e-commerce, portals), live streaming media, on-demand streaming media, and social media sites.[3]

Figure: (left) single-server distribution; (right) CDN scheme of distribution.

CDNs are a layer in the Internet ecosystem. Content owners such as media companies and e-commerce vendors pay CDN operators to deliver their content to their end users. In turn, a CDN pays Internet service providers (ISPs), carriers, and network operators for hosting its servers in their data centers.

CDN is an umbrella term spanning different types of content delivery services: video streaming, software downloads, web and mobile content acceleration, licensed/managed CDN, transparent caching, services to measure CDN performance, load balancing, multi-CDN switching, and analytics and cloud intelligence. CDN vendors may cross over into other industries such as security, including DDoS protection and web application firewalls (WAF), and WAN optimization.

Notable content delivery service providers include Akamai Technologies, Edgio, Cloudflare, Amazon CloudFront, Fastly, and Google Cloud CDN.

Technology

CDN nodes are usually deployed in multiple locations, often over multiple Internet backbones. Benefits include reducing bandwidth costs, improving page load times, and increasing the global availability of content. The number of nodes and servers making up a CDN varies with the architecture: some reach thousands of nodes with tens of thousands of servers at many remote points of presence (PoPs), while others build a global network with a small number of geographical PoPs.[4]

Requests for content are typically directed algorithmically to nodes that are optimal in some way. When optimizing for performance, the locations best suited to serving content to the user may be chosen. Suitability may be measured by choosing locations that are the fewest hops away, the fewest network seconds (that is, the lowest latency) away from the requesting client, or the highest in availability in terms of server performance (both current and historical), so as to optimize delivery across local networks. When optimizing for cost, the least expensive locations may be chosen instead. In an optimal scenario, these two goals tend to align, as edge servers that are close to the end user at the edge of the network may have an advantage in performance or cost.
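
As a rough illustration of the selection described above, the following Python sketch picks a node either for performance or for cost. The `EdgeNode` fields, the node names, the availability threshold and all numbers are hypothetical and are not taken from any real CDN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeNode:
    """Hypothetical view of a candidate node as seen by the request router."""
    name: str
    hops: int             # network hops to the requesting client
    rtt_ms: float         # measured round-trip time to the client
    availability: float   # fraction of recent health checks passed, 0..1
    cost_per_gb: float    # what serving from this node costs the CDN


def pick_node(nodes, optimize_for="performance", min_availability=0.99):
    """Return the best node for the chosen goal among sufficiently available nodes."""
    candidates = [n for n in nodes if n.availability >= min_availability]
    if not candidates:
        # Fall back to every node rather than failing the request outright.
        candidates = list(nodes)
    if optimize_for == "performance":
        # Lowest round-trip time wins; hop count breaks ties.
        return min(candidates, key=lambda n: (n.rtt_ms, n.hops))
    if optimize_for == "cost":
        # Cheapest node wins; round-trip time breaks ties.
        return min(candidates, key=lambda n: (n.cost_per_gb, n.rtt_ms))
    raise ValueError(f"unknown goal: {optimize_for!r}")


nodes = [
    EdgeNode("edge-near", hops=4, rtt_ms=12.0, availability=0.999, cost_per_gb=0.02),
    EdgeNode("edge-far", hops=11, rtt_ms=85.0, availability=0.999, cost_per_gb=0.05),
    EdgeNode("edge-flaky", hops=3, rtt_ms=9.0, availability=0.90, cost_per_gb=0.01),
]

print(pick_node(nodes, "performance").name)
print(pick_node(nodes, "cost").name)
```

With these made-up figures both goals select the same nearby node, which mirrors the case where performance and cost align; the unreliable node is filtered out even though it is nominally the closest and cheapest.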

Most CDN providers offer their services over a defined set of PoPs that varies with the coverage desired, such as United States, International or Global, or Asia-Pacific. These sets of PoPs can be called "edges", "edge nodes", "edge servers", or "edge networks", as they are the closest edge of CDN assets to the end user.[5]

Security and privacy

CDN providers profit either from direct fees paid by content providers using their network, or from the user analytics and tracking data collected as their scripts are loaded onto customers' websites inside the browser origin. As a result, these services have been identified as potential privacy intrusions for the purpose of behavioral targeting,[6] and solutions are being created to restore single-origin serving and caching of resources.[7]

In particular, a website using a CDN may violate the EU's General Data Protection Regulation (GDPR). For example, in 2021 a German court forbade the use of a CDN on a university website, because this caused the transmission of the user's IP address to the CDN, which violated the GDPR.[8]

CDNs serving JavaScript have also been targeted as a way to inject malicious content into pages that use them. The Subresource Integrity mechanism was created in response, to ensure that a page loads only a script whose content matches a hash referenced by the website author.[9]
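
The hash check described above can be sketched in Python as follows. An integrity value has the form of an algorithm name, a hyphen, and the Base64-encoded digest of the file. The script body and the `cdn.example` URL are placeholders, and the `matches` helper only mimics, in simplified form, the comparison a browser performs.

```python
import base64
import hashlib
import hmac

SUPPORTED = ("sha256", "sha384", "sha512")


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported SRI algorithm: {algorithm!r}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches(content: bytes, integrity: str) -> bool:
    """Check fetched content against the value the page author published."""
    algorithm = integrity.partition("-")[0]
    if algorithm not in SUPPORTED:
        return False
    return hmac.compare_digest(sri_hash(content, algorithm), integrity)


script = b"console.log('hello from the CDN');\n"  # hypothetical script body
integrity = sri_hash(script)

# The author embeds the value in the page that references the CDN-hosted file.
print(f'<script src="https://cdn.example/app.js" integrity="{integrity}" '
      'crossorigin="anonymous"></script>')

print(matches(script, integrity))               # True: content is unmodified
print(matches(script + b"evil();", integrity))  # False: content was altered
```

Because the hash lives in the author's own page rather than on the CDN, a compromised CDN cannot change the script without the mismatch being detected.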

Content networking techniques

The Internet was designed according to the end-to-end principle.[10] This principle keeps the core network relatively simple and moves the intelligence as much as possible to the network end-points: the hosts and clients. As a result, the core network is specialized, simplified, and optimized to only forward data packets.

Content Delivery Networks augment the end-to-end transport network by distributing on it a variety of intelligent applications employing techniques designed to optimize content delivery. The resulting tightly integrated overlay uses web caching, server-load balancing, request routing, and content services.[11]

Web caches store popular content on servers that have the greatest demand for the content requested. These shared network appliances reduce bandwidth requirements, reduce server load, and improve the client response times for content stored in the cache. Web caches are populated based on requests from users (pull caching) or based on preloaded content disseminated from content servers (push caching).[12]
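
A minimal sketch of the two ways of populating a cache is shown below. The `PullCache` class and its method names are illustrative, and the least-recently-used eviction policy is an assumption added to keep the cache bounded; the text above does not prescribe any eviction policy.

```python
from collections import OrderedDict


class PullCache:
    """Toy web cache supporting pull (on miss) and push (preload) population."""

    def __init__(self, fetch_from_origin, capacity=2):
        self._fetch = fetch_from_origin   # callable: url -> body
        self._capacity = capacity
        self._store = OrderedDict()       # oldest entry first
        self.hits = 0
        self.misses = 0

    def _insert(self, url, body):
        self._store[url] = body
        self._store.move_to_end(url)
        if len(self._store) > self._capacity:
            # Assumed policy: evict the least recently used entry.
            self._store.popitem(last=False)

    def get(self, url):
        """Pull caching: serve from cache, or fetch from the origin on a miss."""
        if url in self._store:
            self._store.move_to_end(url)
            self.hits += 1
            return self._store[url]
        self.misses += 1
        body = self._fetch(url)
        self._insert(url, body)
        return body

    def push(self, url, body):
        """Push caching: the content server preloads content before any request."""
        self._insert(url, body)


origin_requests = []


def origin(url):
    """Stand-in for the origin server; records how often it is contacted."""
    origin_requests.append(url)
    return f"body of {url}"


cache = PullCache(origin, capacity=2)
cache.get("/logo.png")                  # miss: pulled from the origin
cache.get("/logo.png")                  # hit: the origin is not contacted
cache.push("/video.mp4", "preloaded")   # pushed ahead of demand
cache.get("/video.mp4")                 # hit on pushed content
print(cache.hits, cache.misses, origin_requests)
```

Every hit is a request that never reaches the origin, which is where the reductions in bandwidth and server load come from.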

Server-load balancing uses one or more techniques, either service-based (global load balancing) or hardware-based (layer 4–7 switches, also known as web switches, content switches, or multilayer switches), to share traffic among a number of servers or web caches. In the hardware-based case, the switch is assigned a single virtual IP address. Traffic arriving at the switch is then directed to one of the real web servers attached to the switch. This has the advantage of balancing load, increasing total capacity, improving scalability, and providing increased reliability by redistributing the load of a failed web server and providing server health checks.

A content cluster or service node can be formed using a layer 4–7 switch to balance load across a number of servers or a number of web caches within the network.
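
The behavior of such a switch can be approximated by the sketch below: one virtual IP address in front of several real servers, with failed servers skipped. Round-robin selection is an assumption chosen for brevity, and the `VirtualServer` class and all addresses are hypothetical.

```python
import itertools


class VirtualServer:
    """One virtual IP address fronting a pool of real servers."""

    def __init__(self, virtual_ip, real_servers):
        self.virtual_ip = virtual_ip
        self._servers = list(real_servers)
        self._healthy = set(self._servers)
        self._cycle = itertools.cycle(self._servers)

    def record_health_check(self, server, ok):
        """Update the pool with the result of a periodic health check."""
        if ok:
            self._healthy.add(server)
        else:
            self._healthy.discard(server)

    def route(self):
        """Return the next healthy real server in round-robin order."""
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy real servers behind " + self.virtual_ip)


vip = VirtualServer("203.0.113.10", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([vip.route() for _ in range(3)])       # each real server in turn

vip.record_health_check("10.0.0.2", ok=False)
print([vip.route() for _ in range(4)])       # the failed server is skipped
```

Clients only ever see the virtual address, so servers can fail or be added without any change on the client side.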

Request routing directs client requests to the content source best able to serve the request. This may involve directing a client request to the service node that is closest to the client, or to the one with the most capacity. A variety of algorithms are used to route the request. These include Global Server Load Balancing, DNS-based request routing, dynamic metafile generation, HTML rewriting,[13] and anycasting.[14] Proximity (choosing the closest service node) is estimated using a variety of techniques including reactive probing, proactive probing, and connection monitoring.[11]
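
One way to combine the two criteria above (closest node, or node with the most capacity) is sketched here. Proximity is estimated from round-trip-time samples such as those a probe might return; taking the median, the node names, and the load and capacity figures are all assumptions made for illustration.

```python
from statistics import median


def estimate_proximity(rtt_samples_ms):
    """Summarize probe results for one node; the median resists outliers."""
    return median(rtt_samples_ms)


def route_request(probes, load, capacity):
    """Pick the closest node with spare capacity, else the node with most headroom.

    probes:   node -> list of RTT samples in ms (from probing or monitoring)
    load:     node -> requests currently being served
    capacity: node -> maximum concurrent requests
    """
    spare = {node: capacity[node] - load[node] for node in probes}
    with_room = [node for node in probes if spare[node] > 0]
    if with_room:
        return min(with_room, key=lambda node: estimate_proximity(probes[node]))
    return max(probes, key=lambda node: spare[node])


probes = {
    "node-a": [38, 41, 40],
    "node-b": [9, 11, 250],    # one slow outlier among fast samples
    "node-c": [120, 118, 125],
}
capacity = {"node-a": 100, "node-b": 100, "node-c": 100}

print(route_request(probes, {"node-a": 10, "node-b": 80, "node-c": 5}, capacity))
print(route_request(probes, {"node-a": 10, "node-b": 100, "node-c": 5}, capacity))
```

In the first call the nearest node still has room and is chosen; in the second it is full, so the request goes to the next-closest node instead.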

CDNs use a variety of methods of content delivery including, but not limited to, manual asset copying, active web caches, and global hardware load balancers.

Content service protocols

Several protocol suites are designed to provide access to a wide variety of content services distributed throughout a content network. The Internet Content Adaptation Protocol (ICAP) was developed in the late 1990s[15][16] to provide an open standard for connecting application servers. A more recently defined and robust solution is provided by the Open Pluggable Edge Services (OPES) protocol.[17] This architecture defines OPES service applications that can reside on the OPES processor itself or be executed remotely on a Callout Server.

Edge Side Includes (ESI) is a small markup language for edge-level dynamic web content assembly. It is fairly common for websites to have generated content, either because the content itself changes (catalogs, forums) or because it is personalized. This creates a problem for caching systems, and a group of companies created ESI to overcome it.
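
To make the idea of edge-level assembly concrete, the sketch below caches a page shell once and fills in a per-user fragment at request time. It handles only the self-closing `<esi:include src="..."/>` form and ignores the rest of the language; the template, fragment path and `fetch_fragment` function are hypothetical.

```python
import re

# Matches only the self-closing form <esi:include src="..."/>.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]*)"\s*/>')


def assemble(template: str, fetch_fragment) -> str:
    """Replace each esi:include tag with the fragment its src points to."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), template)


# Cacheable page shell; only the cart fragment differs between users.
cached_template = (
    "<html><body>"
    "<h1>Catalog</h1>"
    '<esi:include src="/fragments/cart"/>'
    "</body></html>"
)


def fetch_fragment(src: str) -> str:
    """Stand-in for a request to the origin for an uncacheable fragment."""
    fragments = {"/fragments/cart": "<div>3 items in your cart</div>"}
    return fragments.get(src, "")


print(assemble(cached_template, fetch_fragment))
```

The edge server thus keeps serving the bulk of the page from cache and contacts the origin only for the small part that actually varies.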

Peer-to-peer CDNs

In peer-to-peer (P2P) content-delivery networks, clients provide resources as well as use them. This means that, unlike client–server systems, such networks can actually perform better as more users begin to access the content (especially with protocols such as BitTorrent that require users to share). This property is one of the major advantages of using P2P networks because it makes the setup and running costs very small for the original content distributor.[18][19]
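
The scaling difference can be illustrated with a commonly used idealized model that gives lower bounds on the time needed to deliver one file to every client. The sketch assumes that all peers upload at the same rate and that data can be split arbitrarily finely; the file size and link rates are made-up numbers.

```python
def client_server_time(file_bits, n_clients, server_up, min_client_down):
    """Lower bound when the server alone uploads a copy to every client."""
    return max(n_clients * file_bits / server_up,
               file_bits / min_client_down)


def p2p_time(file_bits, n_clients, server_up, min_client_down, peer_up):
    """Lower bound when every client also uploads at peer_up bits per second."""
    total_upload = server_up + n_clients * peer_up
    return max(file_bits / server_up,
               file_bits / min_client_down,
               n_clients * file_bits / total_upload)


FILE_BITS = 8e9      # a 1 GB file
SERVER_UP = 1e9      # 1 Gbit/s server uplink
CLIENT_DOWN = 100e6  # 100 Mbit/s slowest client downlink
PEER_UP = 20e6       # 20 Mbit/s uplink per peer

for n in (10, 100, 1000):
    cs = client_server_time(FILE_BITS, n, SERVER_UP, CLIENT_DOWN)
    p2p = p2p_time(FILE_BITS, n, SERVER_UP, CLIENT_DOWN, PEER_UP)
    print(f"{n:>5} clients: client-server {cs:8.1f} s, P2P {p2p:8.1f} s")
```

In this model the client–server bound grows linearly with the number of clients, while the P2P bound grows far more slowly because each new client adds upload capacity along with demand.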

Private CDNs

If content owners are not satisfied with the options or costs of a commercial CDN service, they can create their own CDN. This is called a private CDN. A private CDN consists of PoPs (points of presence) that serve content only for their owner. These PoPs can be caching servers,[20] reverse proxies or application delivery controllers.[21] A private CDN can be as simple as two caching servers,[20] or large enough to serve petabytes of content.[22]

Large content distribution networks may even build and set up their own private network to distribute copies of content across cache locations.[23][24] Such private networks are usually used in conjunction with public networks as a backup option in case the capacity of the private network is not enough or there is a failure which leads to capacity reduction. Since the same content has to be distributed across many locations, a variety of multicasting techniques may be used to reduce bandwidth consumption. Over private networks, it has also been proposed to select multicast trees according to network load conditions to more efficiently utilize available network capacity.[25][26]

Emergence of telco CDNs

The rapid growth of streaming video traffic[27] requires large capital expenditures by broadband providers[28] to meet this demand and to retain subscribers by delivering a sufficiently good quality of experience.

To address this, telecommunications service providers have begun to launch their own content delivery networks as a means to lessen the demands on the network backbone and reduce infrastructure investments.

Telco CDN advantages

Because they own the networks over which video content is transmitted, telco CDNs have advantages over traditional CDNs. They own the last mile and can deliver content closer to the end-user because it can be cached deep in their networks. This deep caching minimizes the distance that video data travels over the general Internet and delivers it more quickly and reliably.

Telco CDNs also have a built-in cost advantage, since traditional CDNs must lease bandwidth from them and build the operator's margin into their own cost model. In addition, by operating their own content delivery infrastructure, telco operators have better control over the utilization of their resources. Content management operations performed by CDNs are usually applied without (or with very limited) information about the network (e.g., topology, utilization) of the telco operators with which they interact or have business relationships. This poses a number of challenges for telco operators, which have a limited sphere of action in the face of the impact of these operations on the utilization of their resources.

In contrast, the deployment of telco CDNs allows operators to implement their own content management operations,[29][30] which gives them better control over the utilization of their resources and, as such, lets them provide better quality of service and experience to their end users.

Federated CDNs and Open Caching

In June 2011, StreamingMedia.com reported that a group of telecommunications service providers (TSPs) had founded an Operator Carrier Exchange (OCX)[31] to interconnect their networks and compete more directly against large traditional CDNs like Akamai and Limelight Networks, which have extensive PoPs worldwide. In this way, telcos are building a federated CDN offering, which is more attractive to a content provider that wants to deliver its content to the aggregated audience of the federation.

It is likely that in the near future other telco CDN federations will be created. They will grow as new telcos join the federation, bringing their network presence and Internet subscriber bases to the existing ones.[citation needed]

The Open Caching specification by the Streaming Video Alliance defines a set of APIs that allow a content provider to deliver its content through several CDNs in a consistent way, seeing each CDN provider the same way through these APIs.

Improving CDN performance using Extension Mechanisms for DNS

Figure: The latency (RTT) experienced by clients with non-local resolvers ("high") dropped drastically when a CDN rolled out the EDNS0 extension in April 2014, while the latency of clients with local resolvers ("low") was unaffected by the change.[32]

Traditionally, CDNs have used the IP address of the client's recursive DNS resolver to geo-locate the client. While this is a sound approach in many situations, it leads to poor client performance if the client uses a non-local recursive DNS resolver that is far away. For instance, a CDN may route requests from a client in India to its edge server in Singapore if that client uses a public DNS resolver in Singapore, causing poor performance for that client. Indeed, a 2015 study[32] showed that in many countries where public DNS resolvers are in popular use, the median distance between clients and their recursive DNS resolvers can be as high as a thousand miles.

In August 2011, a global consortium of leading Internet service providers led by Google announced their official implementation of the edns-client-subnet IETF Internet Draft,[33] which is intended to accurately localize DNS resolution responses. The initiative involves a limited number of leading DNS service providers, such as Google Public DNS,[34] as well as CDN service providers. With the edns-client-subnet EDNS0 option, CDNs can use the IP address of the requesting client's subnet when resolving DNS requests. This approach, called end-user mapping,[32] has been adopted by CDNs and has been shown to drastically reduce round-trip latencies and improve performance for clients who use public DNS or other non-local resolvers. However, the use of EDNS0 also has drawbacks: it decreases the effectiveness of caching resolutions at the recursive resolvers,[32] increases the total DNS resolution traffic,[32] and raises the privacy concern of exposing the client's subnet.
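
The difference between resolver-based mapping and end-user mapping can be sketched as follows. The resolver forwards only a truncated client prefix rather than the full address; the /24 and /56 prefix lengths used here are common choices and are an assumption of this sketch. The geolocation table, the edge names, and the use of documentation-only address ranges to stand in for India and Singapore are hypothetical.

```python
import ipaddress


def client_subnet(client_ip, v4_prefix=24, v6_prefix=56):
    """Truncate a client address to the prefix a resolver would forward."""
    address = ipaddress.ip_address(client_ip)
    prefix = v4_prefix if address.version == 4 else v6_prefix
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)


def edge_for(address, geo_table, default="default-edge"):
    """Look up the edge assigned to the network that contains the address."""
    for network, edge in geo_table.items():
        if address in network:
            return edge
    return default


# Hypothetical mapping table: pretend the first range is in India and the
# second (where the public resolver lives) is in Singapore.
GEO_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): "mumbai-edge",
    ipaddress.ip_network("203.0.113.0/24"): "singapore-edge",
}

client_ip = "198.51.100.77"
resolver_ip = "203.0.113.53"

# Traditional mapping: the CDN sees only the resolver's address.
print(edge_for(ipaddress.ip_address(resolver_ip), GEO_TABLE))

# End-user mapping: the query carries the client's subnet.
subnet = client_subnet(client_ip)
print(subnet, edge_for(subnet.network_address, GEO_TABLE))
```

Truncating the address is what limits, without eliminating, the privacy exposure mentioned above, and keying answers by subnet is also why resolver caches become less effective.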

Virtual CDN (vCDN)

Virtualization technologies are being used to deploy virtual CDNs (vCDNs) with the goal of reducing content provider costs while increasing elasticity and decreasing service delay. With vCDNs, it is possible to avoid traditional CDN limitations in performance, reliability and availability, since virtual caches are deployed dynamically (as virtual machines or containers) on physical servers distributed across the provider's geographical coverage. As virtual cache placement is based on both the content type and the server or end-user geographic location, vCDNs have a significant impact on service delivery and network congestion.[35][36][37][38]

Image Optimization and Delivery (Image CDNs)

In 2017, Addy Osmani of Google started referring to software solutions that could integrate naturally with the Responsive Web Design paradigm (with particular reference to the <picture> element) as Image CDNs.[39] The expression referred to the ability of a web architecture to serve multiple versions of the same image through HTTP, depending on the properties of the browser requesting it, as determined by either the browser or the server-side logic. The purpose of Image CDNs was, in Google's vision, to serve high-quality images (or, better, images perceived as high-quality by the human eye) while preserving download speed, thus contributing to a great User experience (UX).[citation needed]

Arguably, the Image CDN term was originally a misnomer, as neither Cloudinary nor Imgix (the examples quoted by Google in the 2017 guide by Addy Osmani) was, at the time, a CDN in the classical sense of the term.[39] Shortly afterwards, though, several companies offered solutions that allowed developers to serve different versions of their graphical assets according to several strategies. Many of these solutions were built on top of traditional CDNs, such as Akamai, CloudFront, Fastly, Edgecast and Cloudflare. At the same time, other solutions that already provided an image multi-serving service joined the Image CDN definition by either offering CDN functionality natively (ImageEngine)[40] or integrating with one of the existing CDNs (Cloudinary/Akamai, Imgix/Fastly).

While providing a universally agreed-on definition of what an Image CDN is may not be possible, generally speaking, an Image CDN supports the following three components:[41]

  • A Content Delivery Network (CDN) for the fast serving of images.
  • Image manipulation and optimization, either on-the-fly through URL directives, in batch mode (through manual upload of images) or fully automatic (or a combination of these).
  • Device Detection (also known as Device Intelligence), i.e. the ability to determine the properties of the requesting browser and/or device through analysis of the User-Agent string, HTTP Accept headers, Client-Hints or JavaScript.[41]
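
The simplest of the detection methods listed above, inspecting the HTTP Accept header, can be sketched as follows. The function names, the preference order (AVIF, then WebP) and the JPEG fallback are assumptions of this example rather than the behavior of any particular product.

```python
def accepted_types(accept_header):
    """Return the media types a client lists with a non-zero quality value."""
    types = set()
    for part in accept_header.split(","):
        media_type, *params = [piece.strip() for piece in part.split(";")]
        quality = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0   # treat a malformed q value as a refusal
        if media_type and quality > 0:
            types.add(media_type.lower())
    return types


def pick_image_format(accept_header, preferred=("avif", "webp"), fallback="jpeg"):
    """Choose the first preferred format the client explicitly accepts."""
    types = accepted_types(accept_header)
    for image_format in preferred:
        if f"image/{image_format}" in types:
            return image_format
    # Wildcards such as */* do not prove support for newer formats,
    # so fall back to a format every browser can decode.
    return fallback


print(pick_image_format("image/avif,image/webp,image/apng,*/*;q=0.8"))
print(pick_image_format("image/webp,*/*"))
print(pick_image_format("*/*"))
```

An Image CDN would then serve, from the same URL, whichever encoded variant this choice points to, so the page markup does not need to change per browser.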

The following table summarizes the current situation with the main software CDNs in this space:[42]

Main Image CDNs on the market

| Name                | CDN            | Image Optimization    | Device Detection                   |
|---------------------|----------------|-----------------------|------------------------------------|
| Akamai ImageManager | Yes            | Batch mode            | Based on HTTP Accept header        |
| Cloudflare Polish   | Yes            | Fully automatic       | Based on HTTP Accept header        |
| Cloudinary          | Through Akamai | Batch, URL directives | Accept header, Client-Hints        |
| Fastly IO           | Yes            | URL directives        | Based on HTTP Accept header        |
| ImageEngine         | Yes            | Fully automatic       | WURFL, Client-Hints, Accept header |
| Imgix               | Through Fastly | Fully automatic       | Accept header / Client-Hints       |
| PageCDN             | Yes            | URL directives        | Based on HTTP Accept header        |
| Tinify CDN          | Multiple       | Fully automatic       | Based on HTTP Accept header        |

References

  1. ^ "Globally Distributed Content Delivery, by J. Dilley, B. Maggs, J. Parikh, H. Prokop, R. Sitaraman and B. Weihl, IEEE Internet Computing, Volume 6, Issue 5, November 2002" (PDF). Archived (PDF) from the original on 2017-08-09. Retrieved 2019-10-25.
  2. ^ Nygren, E.; Sitaraman, R. K.; Sun, J. (2010). "The Akamai Network: A Platform for High-Performance Internet Applications" (PDF). ACM SIGOPS Operating Systems Review. 44 (3): 2–19. doi:10.1145/1842733.1842736. S2CID 207181702. Archived (PDF) from the original on September 13, 2012. Retrieved November 19, 2012.
  3. ^ Nemeth, Evi (2018). "Chapter 19, Web hosting, Content delivery networks". UNIX and Linux system administration handbook (Fifth ed.). Boston: Pearson Education. p. 690. ISBN 9780134277554. OCLC 1005898086.
  4. ^ "How Content Delivery Networks Work". CDNetworks. Archived from the original on 5 September 2015. Retrieved 22 September 2015.
  5. ^ "How Content Delivery Networks (CDNs) Work". NCZOnline. 29 November 2011. Archived from the original on 1 December 2011. Retrieved 22 September 2015.
  6. ^ Security, Help Net (2014-08-27). "470 million sites exist for 24 hours, 22% are malicious". Help Net Security. Archived from the original on 2019-07-01. Retrieved 2019-07-01.
  7. ^ "Decentraleyes: Block CDN Tracking". Collin M. Barrett. 2016-02-03. Archived from the original on 2019-07-01. Retrieved 2019-07-01.
  8. ^ "VG Wiesbaden verbietet die Nutzung von Content Delivery Networks". www.taylorwessing.com (in German). 2021-12-14. Retrieved 2023-03-02.
  9. ^ "Subresource Integrity". MDN Web Docs. Archived from the original on 2019-06-26. Retrieved 2019-07-01.
  10. ^ J. H. Saltzer; D. P. Reed; D. D. Clark (1 November 1984). "End-to-end arguments in system design" (PDF). ACM Transactions on Computer Systems. 2 (4): 277–288. doi:10.1145/357401.357402. ISSN 0734-2071. S2CID 215746877. Wikidata Q56503280. Retrieved 2006-11-11.
  11. ^ a b Hofmann, Markus; Beaumont, Leland R. (2005). Content Networking: Architecture, Protocols, and Practice. Morgan Kaufmann Publisher. ISBN 1-55860-834-6.
  12. ^ Bestavros, Azer (March 1996). "Speculative Data Dissemination and Service to Reduce Server Load, Network Traffic and Service Time for Distributed Information Systems" (PDF). Proceedings of ICDE'96: The 1996 International Conference on Data Engineering. 1996: 180–189. Archived (PDF) from the original on 2010-07-03. Retrieved 2017-05-28.
  13. ^ RFC 3568 Barbir, A., Cain, B., Nair, R., Spatscheck, O.: "Known Content Network (CN) Request-Routing Mechanisms," July 2003
  14. ^ RFC 1546 Partridge, C., Mendez, T., Milliken, W.: "Host Anycasting Services," November 1993.
  15. ^ RFC 3507 Elson, J., Cerpa, A.: "Internet Content Adaptation Protocol (ICAP)," April 2003.
  16. ^ ICAP Forum
  17. ^ RFC 3835 Barbir, A., Penno, R., Chen, R., Hofmann, M., and Orman, H.: "An Architecture for Open Pluggable Edge Services (OPES)," August 2004.
  18. ^ Li, Jin (2008). "On peer-to-peer (P2P) content delivery" (PDF). Peer-to-Peer Networking and Applications. 1 (1): 45–63. doi:10.1007/s12083-007-0003-1. S2CID 16438304. Archived (PDF) from the original on 2013-10-04. Retrieved 2013-08-11.
  19. ^ Stutzbach, Daniel; et al. (2005). "The scalability of swarming peer-to-peer content delivery" (PDF). In Boutaba, Raouf; et al. (eds.). NETWORKING 2005 -- Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications Systems. Springer. pp. 15–26. ISBN 978-3-540-25809-4.
  20. ^ a b "How to build your own CDN using BIND, GeoIP, Nginx, Varnish - UNIXy". 2010-07-18. Archived from the original on 2010-07-21. Retrieved 2014-10-15.
  21. ^ "How to Create Your Content Delivery Network With aiScaler". Archived from the original on 2014-10-06. Retrieved 2014-10-15.
  22. ^ "Netflix Shifts Traffic To Its Own CDN; Akamai, Limelight Shrs Hit". Forbes. 5 June 2012. Archived from the original on 19 October 2017. Retrieved 26 August 2017.
  23. ^ Mikel Jimenez; et al. (May 1, 2017). "Building Express Backbone: Facebook's new long-haul network". Archived from the original on October 24, 2017. Retrieved October 27, 2017.
  24. ^ "Inter-Datacenter WAN with centralized TE using SDN and OpenFlow" (PDF). 2012. Archived (PDF) from the original on October 28, 2017. Retrieved October 27, 2017.
  25. ^ M. Noormohammadpour; et al. (July 10, 2017). "DCCast: Efficient Point to Multipoint Transfers Across Datacenters". USENIX. Retrieved July 26, 2017.
  26. ^ M. Noormohammadpour; et al. (2018). "QuickCast: Fast and Efficient Inter-Datacenter Transfers using Forwarding Tree Cohorts". Retrieved January 23, 2018.
  27. ^ "Online Video Sees Tremendous Growth, Spurs some Major Updates". SiliconANGLE. 2011-03-03. Archived from the original on 2011-08-30. Retrieved 2011-07-22.
  28. ^ "Overall Telecom CAPEX to Rise in 2011 Due to Video, 3G, LTE Investments". cellular-news. Archived from the original on 2011-03-25. Retrieved 2011-07-22.
  29. ^ D. Tuncer, M. Charalambides, R. Landa, G. Pavlou, "More Control Over Network Resources: an ISP Caching Perspective", proceedings of IEEE/IFIP Conference on Network and Service Management (CNSM), Zurich, Switzerland, October 2013.
  30. ^ M. Claeys, D. Tuncer, J. Famaey, M. Charalambides, S. Latre, F. De Turck, G. Pavlou, "Proactive Multi-tenant Cache Management for Virtualized ISP Networks", proceedings of IEEE/IFIP Conference on Network and Service Management (CNSM), Rio de Janeiro, Brazil, November 2014.
  31. ^ "Telcos and Carriers Forming New Federated CDN Group Called OCX (Operator Carrier Exchange)". Dan Rayburn – StreamingMediaBlog.com. 2017-12-13. Archived from the original on 2011-07-20. Retrieved 2011-07-22.
  32. ^ a b c d e "End-User Mapping: Next Generation Request Routing for Content Delivery, by F. Chen, R. Sitaraman, and M. Torres, ACM SIGCOMM conference, Aug 2015" (PDF). Archived (PDF) from the original on 2017-08-12. Retrieved 2019-10-31.
  33. ^ "Client Subnet in DNS Requests".
  34. ^ "Where are your servers currently located?". Archived from the original on 2013-01-15.
  35. ^ Filelis-Papadopoulos, Christos K.; Giannoutakis, Konstantinos M.; Gravvanis, George A.; Endo, Patricia Takako; Tzovaras, Dimitrios; Svorobej, Sergej; Lynn, Theo (2019-04-01). "Simulating large vCDN networks: A parallel approach". Simulation Modelling Practice and Theory. 92: 100–114. doi:10.1016/j.simpat.2019.01.001. ISSN 1569-190X. S2CID 67752426.
  36. ^ Filelis-Papadopoulos, Christos K.; Endo, Patricia Takako; Bendechache, Malika; Svorobej, Sergej; Giannoutakis, Konstantinos M.; Gravvanis, George A.; Tzovaras, Dimitrios; Byrne, James; Lynn, Theo (2020-01-01). "Towards simulation and optimization of cache placement on large virtual content distribution networks". Journal of Computational Science. 39: 101052. doi:10.1016/j.jocs.2019.101052. ISSN 1877-7503.
  37. ^ Ibn-Khedher, Hatem; Abd-Elrahman, Emad; Kamal, Ahmed E.; Afifi, Hossam (2017-06-19). "OPAC: An optimal placement algorithm for virtual CDN". Computer Networks. 120: 12–27. doi:10.1016/j.comnet.2017.04.009. ISSN 1389-1286.
  38. ^ Khedher, Hatem; Abd-Elrahman, Emad; Afifi, Hossam; Marot, Michel (October 2017). "Optimal and Cost Efficient Algorithm for Virtual CDN Orchestration". 2017 IEEE 42nd Conference on Local Computer Networks (LCN). Singapore: IEEE. pp. 61–69. doi:10.1109/LCN.2017.115. ISBN 978-1-5090-6523-3. S2CID 44243386.
  39. ^ a b Osmani, Addy. "Essential Image Optimization". Retrieved May 13, 2020.
  40. ^ Jon Arne Sæterås (26 April 2017). "Let The Content Delivery Network Optimize Your Images". Retrieved May 13, 2020.
  41. ^ a b Katie Hempenius. "Use image CDNs to optimize images". Retrieved May 13, 2020.
  42. ^ Maximiliano Firtman (18 September 2019). "Faster Paint Metrics with Responsive Image Optimization CDNs". Retrieved May 13, 2020.
  43. ^ "Top 4 CDN services for hosting open source libraries | opensource.com". opensource.com. Archived from the original on 18 April 2019. Retrieved 18 April 2019.
  44. ^ "Usage Statistics and Market Share of JavaScript Content Delivery Networks for Websites". W3Techs. Archived from the original on 12 April 2019. Retrieved 17 April 2019.
  45. ^ a b c d "How CDN and International Servers Networking Facilitate Globalization". The Huffington Post. Delarno Delvix. 2016-09-06. Archived from the original on 19 September 2016. Retrieved 9 September 2016.
  46. ^ "Cloud Content Delivery Network (CDN) Market Investigation Report". 2019-10-05. Archived from the original on 2019-10-07. Retrieved 2019-10-07.
  47. ^ "CDN: Was Sie über Content Delivery Networks wissen müssen". www.computerwoche.de. Archived from the original on 2019-03-21. Retrieved 2019-03-21.
  48. ^ Williams, Mike (22 August 2017). "Warpcache review". TechRadar. Archived from the original on 2019-03-21. Retrieved 2019-03-21.
  49. ^ How Netflix works: the (hugely simplified) complex stuff that happens every time you hit Play
