Identifying Airtel middleboxes that censor HTTPS traffic

Back in November 2019, we reported that Reliance Jio is able to block HTTPS internet traffic by means of a deep packet inspection (DPI) technique. In response, some readers messaged us saying that they ran our test and were able to reproduce similar behaviour on Airtel mobile networks. According to TRAI's Performance Indicators report for Jul-Sep 2019, Reliance Jio and Airtel serve roughly 52% and 23% of internet subscribers in India respectively. Taken together, this means that SNI inspection based censorship may now be affecting roughly 3 out of every 4 internet connections in India.

Although the previous test was able to detect the presence of SNI inspection based censorship, it was not very insightful. In this post, we delve into a more informative test which not only confirms the presence of SNI inspection based censorship, but also helps us identify the exact mechanism. Furthermore, it also allows us to identify middleboxes which are actively inspecting SNI in TLS handshakes and censoring requests. Using this method, we were able to discover 25 different middleboxes registered to Airtel, which are actively censoring HTTPS traffic.

Quick links to different sections of this post:
1. Transport Layer Security
1.1 Server Name Indication
2. SNI Inspection based censorship
3. Iterative Network Tracing
4. Data Preparation
5. Methodology
6. Examining Airtel's behaviour
7. References

All the code for replicating this experiment, as well as the logs from our test runs, can be found in this repository. Big shout-out to IPinfo for giving us access to their IP address dataset, and to Gurshabad Grover for his suggestions while ideating the methodology and for editing this post.

Transport Layer Security

Transport Layer Security (TLS) is a cryptographic protocol for providing communication confidentiality and authenticity, commonly used for encrypting web traffic (as done in HTTPS). Normally TLS is used over TCP, as it requires a reliable in-order data stream. A quick refresher on TLS by Cloudflare.

A TCP handshake followed by a TLS Handshake. The ClientHello is a message sent by the client, which initiates the TLS handshake. This message can contain extensions such as SNI. Image credits - Cloudflare

Server Name Indication

Server Name Indication (SNI), first defined in RFC 3546 and subsequently revised in RFC 4366 and RFC 6066, is a TLS extension designed to facilitate the hosting of multiple HTTPS websites on the same IP address. While sending a ClientHello message (which initiates the establishment of a secure connection), the client is expected to fill in the SNI attribute with the hostname of the website it wishes to connect to. SNI, unfortunately, travels on the network in cleartext, i.e. network operators can not only see the websites you’re visiting, but also filter traffic based on this information.
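To make the cleartext nature of SNI concrete, here is a minimal sketch that uses only the Python standard library. It generates a ClientHello in memory (no packets are sent) and then reads the hostname straight out of the raw bytes, much as an on-path observer could. The helper names `build_client_hello` and `extract_sni` are illustrative and are not part of the experiment's code; the parser assumes the whole ClientHello fits in a single TLS record.

```python
import ssl
import struct


def build_client_hello(hostname):
    """Return the raw bytes of a TLS ClientHello carrying `hostname` in the SNI."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls_obj = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls_obj.do_handshake()
    except ssl.SSLWantReadError:
        pass  # Expected: the ClientHello is written, then TLS waits for a reply
    return outgoing.read()


def extract_sni(record):
    """Parse a ClientHello record and return the SNI hostname, or None."""
    try:
        if len(record) < 5 or record[0] != 0x16:  # 0x16 = handshake record
            return None
        pos = 5                                   # skip the 5-byte record header
        if record[pos] != 0x01:                   # 0x01 = ClientHello
            return None
        pos += 4                                  # handshake type + 3-byte length
        pos += 2 + 32                             # client version + random
        pos += 1 + record[pos]                    # session id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher suites
        pos += 1 + record[pos]                    # compression methods
        end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
        pos += 2
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:                     # 0 = server_name extension
                # list length (2), name type (1), name length (2), name
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                if record[pos + 2] == 0:          # 0 = host_name
                    return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


hello = build_client_hello("example.com")
print(extract_sni(hello))
```

Nothing in this parsing step needs a key or a completed handshake, which is why a middlebox can make a blocking decision as soon as it sees the ClientHello.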


SNI Inspection based censorship

Since the SNI is present in cleartext, anyone in the network can inspect and filter traffic based on its value. As seen in other countries, ISPs can leverage this to deny access to certain websites. We can observe the same by attempting a TLS connection using openssl and monitoring packets to the host.

openssl s_client -state -connect 103.224.212.222:443 -servername fullhd720.com  

An attempted TLS connection to 103.224.212.222, with SNI fullhd720.com. We observe a RST packet immediately after the ClientHello message containing the SNI is sent.

For instance, using Airtel, we can see that the client receives a TCP RST packet when it tries to connect to a blocked website "fullhd720.com". The RST packet seems to be originating from the actual host, and is received right after the ClientHello message containing the SNI is sent. PCAP.

To confirm that the connection termination was indeed due to the SNI, we can reattempt the connection with a different SNI which we don't expect to be blocked (in this case we use facebook.com).

openssl s_client -state -connect 103.224.212.222:443 -servername facebook.com  

An attempted TLS connection to 103.224.212.222 with a different SNI, facebook.com. In this case, we observe a successful TLS handshake.

This time we notice a successful connection, indicating that the RST in the previous attempt was indeed due to the specified SNI. PCAP.

Although this test does demonstrate the presence of SNI inspection based censorship, the packet dumps are not sufficient to prove that the RST packet was actually forged by a middlebox belonging to the ISP.


Iterative Network Tracing

For a given host, let min_ttl be the minimum Time to Live (TTL) a packet needs in order to travel from the client to the host. Any packet whose TTL is set lower than min_ttl will expire in transit and never reach the host. Ideally, the router at which the packet's TTL expires responds with an ICMP Time Exceeded (ICMP message type 11) message. However, this is not guaranteed, and some routers are configured not to send such messages (in order to hide the topology of the network).

Iterative Network Tracing; we send ClientHello messages with increasing TTL. In this particular case, the minimum TTL required is 9. A middlebox which censors requests would send back a censored response even when the TTL is less than 9. Image credits - Yadav et al.

So if the RST received is forged by a middlebox, we should receive it even when we send the ClientHello message with a TTL less than min_ttl. This approach, known as Iterative Network Tracing (INT), has previously been used to ascertain the presence of middleboxes which censor DNS and HTTP traffic in India [Yadav et al.] and China [Xu et al.]. Similar to these studies, we use INT to detect censorship of TLS traffic (explained further in the methodology section).
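The reasoning behind INT can be illustrated with a small simulation that needs no network access. This is only a sketch under simplifying assumptions: the path is modelled as a fixed number of hops, every router answers an expired packet with ICMP Time Exceeded, and a middlebox forges a RST for any blocked SNI that passes through it. All names here (`SimulatedPath`, `send_probe`, `blocked.example`) are hypothetical and are not taken from the experiment's code.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SimulatedPath:
    hops_to_host: int                    # TTL needed to reach the host (min_ttl)
    middlebox_hop: Optional[int] = None  # hop of a censoring middlebox, if any
    blocked_snis: Set[str] = field(default_factory=set)


def send_probe(path, sni, ttl):
    """Simulated reply to a ClientHello carrying `sni`, sent with IP TTL `ttl`."""
    passes_middlebox = path.middlebox_hop is not None and ttl >= path.middlebox_hop
    if passes_middlebox and sni in path.blocked_snis:
        return "RST"                 # forged by the middlebox
    if ttl < path.hops_to_host:
        return "ICMP_TIME_EXCEEDED"  # packet expired at an intermediate router
    return "TLS"                     # packet reached the host, which answers


def find_min_ttl(path, benign_sni="facebook.com", max_ttl=35):
    """Lowest TTL at which a benign ClientHello gets a TLS reply from the host."""
    for ttl in range(1, max_ttl + 1):
        if send_probe(path, benign_sni, ttl) == "TLS":
            return ttl
    return None


def first_rst_before_host(path, sni, min_ttl):
    """Lowest TTL below min_ttl at which a RST comes back, or None."""
    for ttl in range(1, min_ttl):
        if send_probe(path, sni, ttl) == "RST":
            return ttl
    return None


censored = SimulatedPath(hops_to_host=9, middlebox_hop=4,
                         blocked_snis={"blocked.example"})
clean = SimulatedPath(hops_to_host=9)

for name, path in (("censored", censored), ("clean", clean)):
    min_ttl = find_min_ttl(path)
    print(name, min_ttl, first_rst_before_host(path, "blocked.example", min_ttl))
# censored 9 4
# clean 9 None
```

A RST that arrives for a probe whose TTL is too low to reach the host cannot have come from the host, which is the observation the real test relies on.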


Data Preparation

We run our tests using a list of potentially blocked websites (PBWs), curated from leaked court and government orders. The list and more information pertaining to it can be found here.

Using Google's DNS over HTTPS (DoH) service, each hostname was resolved to its correct IP address. Using DoH here is important as it ensures that no DNS based censorship interferes with the test. This resulted in roughly 5000 (hostname, ip) pairs. Next, we selected a random subset and checked for TCP connectivity to port 443 on each of those IPs (since not all would support HTTPS traffic), filtering our list down to 1370 pairs.
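As a rough illustration of the resolution step, the sketch below queries Google's public JSON interface for DoH at `https://dns.google/resolve` and keeps only the A records. Whether the original experiment used this JSON interface or the binary DoH format is not stated, so treat the endpoint choice and the helper names `parse_doh_answer` and `resolve_over_doh` as assumptions.

```python
import json
import urllib.parse
import urllib.request

DOH_ENDPOINT = "https://dns.google/resolve"  # Google Public DNS JSON interface


def parse_doh_answer(payload):
    """Extract IPv4 addresses (type 1 = A record) from a decoded JSON response."""
    if payload.get("Status") != 0:  # 0 = NOERROR
        return []
    return [answer["data"] for answer in payload.get("Answer", [])
            if answer.get("type") == 1]


def resolve_over_doh(hostname, timeout=5):
    """Resolve `hostname` over HTTPS, bypassing the ISP's DNS resolver."""
    query = urllib.parse.urlencode({"name": hostname, "type": "A"})
    with urllib.request.urlopen(f"{DOH_ENDPOINT}?{query}", timeout=timeout) as resp:
        return parse_doh_answer(json.load(resp))


# Parsing a response shaped like the ones the JSON interface returns
# (addresses below are documentation-only placeholders):
sample = {
    "Status": 0,
    "Answer": [
        {"name": "www.example.com.", "type": 5, "TTL": 300, "data": "example.com."},
        {"name": "example.com.", "type": 1, "TTL": 300, "data": "192.0.2.10"},
    ],
}
print(parse_doh_answer(sample))
```

Calling `resolve_over_doh("example.com")` performs the actual lookup; the CNAME entry (type 5) in the sample is skipped because only A records yield an address to connect to.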

For each of these test points, we establish a TCP connection with the resolved_ip, and send a TLS ClientHello with the SNI set as the correct_hostname. We sniff and save these ClientHello packets (just the SSL layer) for use later. Similarly, we save the ClientHello packet with the SNI set as facebook.com. These sniffed packets can be found here.


Methodology

The input to the test is a 2-tuple, (correct_hostname, resolved_ip). We would like to understand the behaviour of a middlebox when it observes a ClientHello message containing an SNI for a website it wishes to block.

First, we calculate the min_ttl for a given test point. We begin by establishing a TCP connection with resolved_ip.

import socket
import random
from scapy.all import *

resolved_ip = "103.224.212.222"
dport = 443 # TLS connection

def create_connection(resolved_ip):
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    s.bind(("usb0", 0)) # Was using a tethered mobile connection for the experiment

    IP_PACKET = IP(dst = resolved_ip)

    sport = random.randint(1024, 65535) # Fresh random source port for every connection
    seq = random.randint(12345, 67890) # Randomise initial seq number
    SYN = TCP(sport = sport, dport = dport, flags = "S", seq = seq)
    SYNACK = sr1(IP_PACKET / SYN)
    # Complete the three-way handshake by acknowledging the server's SYN-ACK
    ACK = TCP(sport = sport, dport = dport, flags = "A", seq = seq + 1, ack = SYNACK.seq + 1)
    send(IP_PACKET / ACK)
    return IP_PACKET, ACK

Note: When the Linux kernel receives a TCP packet for a socket it does not know about, it sends a RST back to the originator. Since we'll be crafting our own packets outside the kernel's TCP stack, we need to suppress these outbound RSTs from the kernel using iptables before running experiments.

sudo iptables -A OUTPUT -p tcp --tcp-flags RST RST -j DROP  

Once the TCP connection has been established, we send ClientHello messages (containing facebook.com in SNI) after updating the TTL (probe_ttl) in the underlying IP header. We specify facebook.com in the SNI so that the middlebox doesn't attempt to terminate the connection. min_ttl would be the minimum TTL at which we receive a TLS ServerHello or TLS Alert from the host.

# Load ClientHello with facebook.com in SNI (sniffed earlier, read Data Preparation)
# `tls` is assumed to be the scapy-ssl_tls layer module, which provides the TLS* classes used below
import scapy_ssl_tls.ssl_tls as tls

max_ttl = 35

def find_min_ttl(resolved_ip):

    with open("tls_client_hellos/facebook.com", 'rb') as fp:
        tls_client_hello_facebook_com_sni = fp.read()

    for probe_ttl in range(1, max_ttl):
        IP_PACKET, ACK = create_connection(resolved_ip)
        IP_PACKET.ttl = probe_ttl
        del IP_PACKET.chksum # Will force scapy to recalculate checksum after TTL update

        resp, _ = sr(IP_PACKET / ACK / tls_client_hello_facebook_com_sni, timeout = 2, retry = 0, multi = True)

        for _, ans_packet in resp:
            # A TLS Alert or ServerHello can only come from the host itself
            tls_alert = ans_packet.haslayer(tls.TLSAlert)
            tls_server_hello = ans_packet.haslayer(tls.TLSServerHello)

            if tls_alert or tls_server_hello:
                return probe_ttl # min_ttl found!

    return None # No TLS response from the host within max_ttl hops

Next, we send ClientHello messages containing the correct_hostname in the SNI with TTL increasing from 1 to min_ttl - 1. If there is no middlebox interfering with the connection, all such requests should receive either an ICMP Time Exceeded in response or no response at all. If at any point we receive an RST packet which seems to be originating from resolved_ip, we can say with certainty that the packet was forged by a middlebox.

min_ttl = find_min_ttl(resolved_ip)

with open("tls_client_hellos/fullhd720.com", 'rb') as fp:
    tls_client_hello_correct_sni = fp.read()

for probe_ttl in range(1, min_ttl):
    IP_PACKET, ACK = create_connection(resolved_ip)
    IP_PACKET.ttl = probe_ttl
    del IP_PACKET.chksum # Will force scapy to recalculate checksum after TTL update
    resp, _ = sr(IP_PACKET / ACK / tls_client_hello_correct_sni, timeout = 2, retry = 0, multi = True)

    for _, ans_packet in resp:
        if ans_packet.haslayer(ICMP):
            print("Found ICMP message of type %d" % (ans_packet[ICMP].type))
            continue

        # RST is bit 0x04 of the TCP flags field
        if ans_packet.haslayer(TCP) and int(ans_packet[TCP].flags) & 0x04:
            print("Found RST at hop %d" % (probe_ttl))

Using our methodology, we mine the following information for each point in our test list:

  • min_ttl: Minimum TTL at which TLS ServerHello / TLS Alert received (if any) in response to a ClientHello with `facebook.com` as SNI
  • min_correct_sni_RST: Minimum TTL at which RST received (if any) in response to a ClientHello with correct_sni
  • min_correct_sni_TLS: Minimum TTL at which TLS ServerHello / TLS Alert received (if any) in response to a ClientHello with correct_sni
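The three values above can be derived from per-probe observations along the following lines. This is a sketch only: the record layout (a list of dicts with `probe_ttl`, `sni` and `response` keys) and the `summarise` helper are hypothetical and do not reflect the actual log format used in the repository.

```python
def summarise(records):
    """Reduce per-probe observations for one test point to the three minimum TTLs."""

    def min_ttl_where(sni, response):
        ttls = [r["probe_ttl"] for r in records
                if r["sni"] == sni and r["response"] == response]
        return min(ttls) if ttls else None

    summary = {
        "min_ttl": min_ttl_where("facebook.com", "TLS"),
        "min_correct_sni_RST": min_ttl_where("correct_sni", "RST"),
        "min_correct_sni_TLS": min_ttl_where("correct_sni", "TLS"),
    }
    # A RST seen at a TTL too low to reach the host cannot come from the host
    rst, reach = summary["min_correct_sni_RST"], summary["min_ttl"]
    summary["rst_before_host"] = (rst is not None and reach is not None
                                  and rst < reach)
    return summary


# Illustrative observations for a single (hostname, ip) test point
records = [
    {"probe_ttl": 8, "sni": "facebook.com", "response": "ICMP"},
    {"probe_ttl": 9, "sni": "facebook.com", "response": "TLS"},
    {"probe_ttl": 3, "sni": "correct_sni", "response": "ICMP"},
    {"probe_ttl": 4, "sni": "correct_sni", "response": "RST"},
    {"probe_ttl": 5, "sni": "correct_sni", "response": "RST"},
]
print(summarise(records))
```

For these sample records, min_ttl is 9 and min_correct_sni_RST is 4, so the RST arrived from somewhere short of the host.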

The script for running this test can be found here. Logs for each test run can be found here. Code for mining the information above from the logs is present in this python notebook.


Examining Airtel's behaviour

From our list of 1370 potentially blocked websites, there were 1058 instances where we received RST packets in response to ClientHellos with correct_sni. In all of these cases, the RST seemed to be originating from resolved_ip, and min_correct_sni_RST was less than min_ttl. This implies the presence of a middlebox deliberately terminating connections.

Furthermore, in 170 cases, we also received ICMP Time Exceeded messages at the same probe_ttl at which we received RST packets. Because an ICMP Time Exceeded message is sent from the address of the device at which the packet expired, these messages reveal the address of the device sitting at that hop. On further analysis, we found these RST packets to be originating from 25 unique middleboxes. Checking with ipinfo.io revealed that 16 of these were registered to airtel.com, and 9 were registered to bhartitelesonic.com. More information regarding these middleboxes can be found in this python notebook.
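One way to pair the two kinds of response is sketched below. It assumes each observation is a `(probe_ttl, kind, source_ip)` tuple and that the relevant hop is the lowest TTL at which a RST was seen; both the tuple layout and the `middlebox_candidates` helper are hypothetical, and the addresses are documentation-only placeholders rather than real middlebox IPs.

```python
def middlebox_candidates(responses):
    """Return source IPs of ICMP Time Exceeded messages seen at the first RST hop."""
    rst_ttls = [ttl for ttl, kind, _ in responses if kind == "RST"]
    if not rst_ttls:
        return set()
    first_rst_ttl = min(rst_ttls)
    return {src for ttl, kind, src in responses
            if ttl == first_rst_ttl and kind == "ICMP_TIME_EXCEEDED"}


responses = [
    (1, "ICMP_TIME_EXCEEDED", "192.0.2.1"),
    (2, "ICMP_TIME_EXCEEDED", "192.0.2.2"),
    (3, "ICMP_TIME_EXCEEDED", "198.51.100.7"),  # device at the hop where RSTs begin
    (3, "RST", "203.0.113.10"),                  # forged, claims to be the host
    (4, "RST", "203.0.113.10"),
]
print(middlebox_candidates(responses))
```

The RST itself carries the host's address as its source, so only the ICMP message at the same hop gives a usable address to look up in an IP registration dataset.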

In 290 cases, we received TLS ServerHellos / TLS Alerts in response, indicating no network interference. This is expected, since we started with a list of potentially blocked websites.

Apart from the above, there were a few test failures due to connectivity issues, which we did not probe further.


References

  1. Where The Light Gets In: Analyzing Web Censorship Mechanisms in India. IMC 2018. Tarun Kumar Yadav, Akshat Sinha, Devashish Gosain, Piyush Kumar Sharma, and Sambuddho Chakravarty. PDF
  2. Internet Censorship in China: Where Does the Filtering Occur? PAM 2011. Xueyang Xu, Zhuoqing Morley Mao, and J. Alex Halderman. PDF
  3. RFC 6066; Transport Layer Security (TLS) Extensions: Extension Definitions. 2011. Donald E. Eastlake 3rd