HTTPS Transparent Proxy

Transparently proxying HTTPS is somewhat problematic in that the client will attempt to open a direct connection to the secure server and may not identify the intended server within its opening SSL certificate negotiation.
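
Whether a particular client names the intended server can be checked by looking for the Server Name Indication (SNI) extension in its first handshake message, the ClientHello. The following is a minimal sketch, not the proxy's actual code: extract_sni and make_client_hello are hypothetical helper names, "example.org" is a placeholder, and the parser assumes the whole ClientHello arrives in a single TLS record. The ClientHello is generated in memory, so nothing is sent over the network.

```python
import ssl
import struct
from typing import Optional


def extract_sni(data: bytes) -> Optional[str]:
    """Return the host name in a ClientHello's SNI extension, or None if absent."""
    try:
        # Byte 0: record type (0x16 = handshake); byte 5: handshake type (1 = ClientHello).
        if data[0] != 0x16 or data[5] != 0x01:
            return None
        pos = 5 + 4                      # skip record header and handshake header
        pos += 2 + 32                    # client_version + random
        pos += 1 + data[pos]             # session_id
        (suites_len,) = struct.unpack_from("!H", data, pos)
        pos += 2 + suites_len            # cipher_suites
        pos += 1 + data[pos]             # compression_methods
        (ext_total,) = struct.unpack_from("!H", data, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0:            # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                name_type = data[pos + 2]
                (name_len,) = struct.unpack_from("!H", data, pos + 3)
                if name_type != 0:       # 0 = host_name
                    return None
                return data[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        # Truncated or malformed input, or a hello without any extensions.
        return None


def make_client_hello(server_hostname: Optional[str]) -> bytes:
    """Build a ClientHello in memory (test helper; no connection is made)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False           # allows server_hostname=None (no SNI sent)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass                             # expected: the hello is now in `outgoing`
    return outgoing.read()
```

A client that omits the extension yields None here, which is the case where the proxy has to fall back on the destination IP address.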

In this case, we need to somehow determine the server's IP address, which may have been DNAT'd, etc.

Note that proxying HTTPS cannot be totally "transparent" unless the proxy operator is able to perform a "compelled certificate signing attack" using a compliant Root Certificate Authority.

Background

An HTTPS client (e.g. a web browser) starts loading a web page over HTTPS by examining the Uniform Resource Locator (URL) and extracting the Domain Name of the HTTPS server. From here:

  1. client asks external DNS to resolve Domain Name into an IP (v4 or v6) address
  2. DNS responds with an IP address
  3. client opens a TCP connection (3-way handshake), typically to port 443, of the server at that IP address
  4. server participates in opening TCP connection
  5. client requests server's SSL certificate
  6. server provides certificate
  7. client checks that certificate is signed by a recognised (to the client) certificate signing authority
  8. if OK, client and server exchange session keys
  9. client sends SSL-encrypted HTTP request to server
  10. server responds with SSL-encrypted HTTP response

Note that steps 1 and 2 can be skipped if the client has cached the response to a previous DNS lookup for the same Domain Name.

Note that steps 3 to 8 can be skipped if the client has kept the SSL connection open from a previous request. HTTP/1.1 allows multiple requests over the same connection to the same server.
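
The sequence above can be sketched from the client's side with the Python standard library. This is an illustration only, under stated assumptions: host_and_port and fetch are hypothetical names, the first address returned by DNS is used, and the library's TLS handshake covers steps 5 to 8 in a single call.

```python
import socket
import ssl
from urllib.parse import urlsplit


def host_and_port(url):
    """Extract the Domain Name and port from an HTTPS URL (443 if none given)."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 443


def fetch(url):
    """Perform one HTTPS GET and return the raw response bytes."""
    host, port = host_and_port(url)

    # Steps 1-2: resolve the Domain Name to an IP (v4 or v6) address.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]

    # Steps 3-4: open the TCP connection (3-way handshake).
    raw = socket.socket(family, socktype, proto)
    raw.settimeout(10)
    raw.connect(addr)

    # Steps 5-8: certificate exchange, verification and session keys.
    ctx = ssl.create_default_context()
    with ctx.wrap_socket(raw, server_hostname=host) as tls:
        # Step 9: send the encrypted HTTP request.
        path = urlsplit(url).path or "/"
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        tls.sendall(request.encode("ascii"))

        # Step 10: read the encrypted HTTP response until the server closes.
        chunks = []
        while True:
            chunk = tls.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling fetch("https://example.org/") needs network access; note that everything after raw.connect() is opaque to an observer on the wire.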

Implications

Once the HTTPS client opens a TCP connection to the HTTPS server, there is no further "plain-text" information indicating which server the request is intended for.

Therefore, using Destination Network Address Translation (DNAT), which changes the destination address in a non-reversible way, is not possible. This is a deliberate and necessary feature of the secure protocol.

So, an HTTPS Proxy needs to be listening for TCP connections on a separate IP (v4 or v6) address for each actual HTTPS server it will end up proxying.

This can be done in one of two ways:

  1. use the actual public IP addresses of the respective HTTPS server(s) and redirect TCP port 443 to some virtual interface, or
  2. allocate new IP addresses, chosen from, for example, a private (RFC1918) range, and use DNS to hand these to the HTTPS client in place of the real ones

Both options have positives and negatives.

So, we need to be listening on TCP port 443 on a (potentially large) number of IP addresses, which are unknown a priori. The Linux bind/listen socket mechanism does not allow us to bind to a range of IP addresses, so we need to either create many IP aliases for the interface(s) we have, or implement our own TCP stack in user-space.

What may be required here is a Userspace Network Stack, implementing TCP in user-space.

Retaining public IP addresses

Implementing the HTTPS Proxy using the real (public) IP address for the HTTPS request requires routing all TCP port 443 packets to the proxy. Alas, in Linux, the IP routing code is not sophisticated enough to allow policy routes based on TCP port number. Similarly, the venerable iptables code, which can differentiate TCP port 443 traffic, is not able to route it to a particular interface, but it can hand it to user-space through a netfilter-queue.

So there are two options:

  • send all IP traffic to user-space and deal with it there
  • use netfilter-queues to pass all TCP port 443 packets into user-space

The latter is probably simpler to realise, although there are a number of issues still to resolve.

Mapping HTTPS servers to new IP addresses

Alternatively, if we can intercept the DNS requests from the HTTPS client, then we can map real (public) IP addresses to dynamically allocated private (RFC1918) ones. For testing, we can simply add static IP addresses to the /etc/hosts file instead.
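
The dynamic allocation could look something like the sketch below. It is only an illustration: AddressMapper is a hypothetical class, the 10.64.0.0/16 pool is an arbitrary RFC1918 choice, and the mapping is keyed on the host name seen in the intercepted DNS request so that the proxy can later recover which real server a private address stands for.

```python
import ipaddress


class AddressMapper:
    """Hand out one private address per HTTPS server name, and map it back."""

    def __init__(self, pool="10.64.0.0/16"):
        # Iterator over the usable host addresses in the pool.
        self._free = iter(ipaddress.ip_network(pool).hosts())
        self._by_name = {}
        self._by_addr = {}

    def allocate(self, hostname):
        """Return the private address for hostname, allocating one on first use."""
        name = hostname.lower().rstrip(".")
        if name not in self._by_name:
            addr = next(self._free, None)
            if addr is None:
                raise RuntimeError("private address pool exhausted")
            self._by_name[name] = addr
            self._by_addr[addr] = name
        return self._by_name[name]

    def lookup(self, addr):
        """Return the server name behind a private address, or None if unknown."""
        return self._by_addr.get(ipaddress.ip_address(addr))
```

The DNS interceptor would answer the client with allocate(name), and the proxy would call lookup() on the destination address of each incoming connection.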

We now know that all HTTPS requests from the browser will arrive on a known IP range, so we can set up a single route that sends them all to a /dev/tun IP tunnel virtual device, which leaves all the other real/public IP routing alone.
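
Opening such a tunnel device might be sketched as follows. The assumptions are significant: on current Linux systems the device node is /dev/net/tun, the ioctl number and flag values are those from linux/if_tun.h on x86 and ARM (other architectures may differ), the interface name "tun0" is a placeholder, and the call needs root (CAP_NET_ADMIN). build_ifreq and open_tun are hypothetical helper names.

```python
import fcntl
import os
import struct

TUNSETIFF = 0x400454CA   # ioctl request number (x86/ARM value; an assumption)
IFF_TUN = 0x0001         # layer-3 device: read() returns raw IP packets
IFF_NO_PI = 0x1000       # no extra packet-information header before each packet


def build_ifreq(name: str) -> bytes:
    """Pack the interface name and flags the way TUNSETIFF expects them."""
    return struct.pack("16sH", name.encode("ascii"), IFF_TUN | IFF_NO_PI)


def open_tun(name: str = "tun0") -> int:
    """Open the tunnel device and return a file descriptor (needs privileges)."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name))
    except OSError:
        os.close(fd)
        raise
    return fd
```

Each os.read(fd, 65535) on the returned descriptor then yields one IP packet that the kernel routed to the interface. The interface address and the route for the private range still have to be configured separately.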

Implementation

The proxy is implemented as a multi-threaded network application.

Threads

IP switcher

The first thread receives all incoming packets, either from the netfilter-queue or from the /dev/tun device. It checks whether the incoming packet uses a supported protocol, determines the packet's Transport layer and queues it on the relevant transport layer queue.
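
A minimal sketch of that dispatch step, assuming one queue.Queue per transport protocol. switch_packet and the queue layout are hypothetical, and IPv6 extension headers are not followed.

```python
import queue

TCP, UDP = 6, 17   # IANA protocol numbers

transport_queues = {TCP: queue.Queue(), UDP: queue.Queue()}


def switch_packet(packet: bytes, queues=transport_queues) -> bool:
    """Queue a raw IP packet on its transport queue; return False if unsupported."""
    if not packet:
        return False
    version = packet[0] >> 4
    if version == 4 and len(packet) >= 20:
        header_len = (packet[0] & 0x0F) * 4   # IHL is counted in 32-bit words
        protocol = packet[9]
    elif version == 6 and len(packet) >= 40:
        header_len = 40                       # fixed header only
        protocol = packet[6]                  # "next header" field
    else:
        return False
    if header_len < 20 or len(packet) < header_len:
        return False                          # malformed header length
    target = queues.get(protocol)
    if target is None:
        return False                          # e.g. ICMP: not handled here
    # Pass the whole packet plus the payload offset, since the transport
    # thread still needs the source and destination addresses.
    target.put((packet, header_len))
    return True
```

The transport threads would then block on their own queue with get().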

UDP in

UDP in listens to the UDP queue and processes incoming UDP packets, determining the destination port etc.
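
Determining the destination port amounts to reading the fixed 8-byte UDP header. A small sketch follows; UdpHeader and parse_udp are hypothetical names, and checksum verification is left out.

```python
import struct
from typing import NamedTuple, Tuple


class UdpHeader(NamedTuple):
    src_port: int
    dst_port: int
    length: int      # header plus payload, in bytes
    checksum: int


def parse_udp(segment: bytes) -> Tuple[UdpHeader, bytes]:
    """Split a UDP datagram (the IP payload) into its header and payload."""
    if len(segment) < 8:
        raise ValueError("UDP datagram shorter than its 8-byte header")
    header = UdpHeader(*struct.unpack_from("!HHHH", segment))
    if header.length < 8 or header.length > len(segment):
        raise ValueError("UDP length field does not match the datagram")
    return header, segment[8:header.length]
```

For an intercepted DNS query, for example, header.dst_port would be 53.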

TCP in