Tuesday 11 January 2022

Arduino HTTP client connection through corporate proxy server

Connecting an Arduino to the web via an HTTP proxy

HTTP proxies

There are a variety of reasons to run an HTTP proxy in a corporate environment: security, content filtering, authentication and authorization, logging, and traffic optimization.
It also makes a lot of sense not to point the default gateway of a corporate network towards the internet, as this raises the bar for malicious software (and users) trying to bypass network security. But that means that software that needs to access the internet has to do so through the proxy server.
This example has been tested through a Squid proxy with a more or less default configuration.

Arduino HTTP client

The Arduino Ethernet library provides sufficient functionality to get basic tasks done, in this case fetching the BBC weather feed for a specific location. But with a little twist: the sketch only speaks to the proxy server.
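On the wire, the main difference from a direct request is the request line: when talking to a proxy, the client sends the full URL (including "http://" and the host name) instead of just the path. Here is a minimal sketch of that in plain C++. The helper name buildProxyGet is made up for illustration, and std::string is used so the snippet can be compiled and checked on a desktop; on a small AVR board you would more likely print the pieces straight to the client.

```cpp
#include <string>

// Build a GET request meant to be sent to an HTTP proxy.
// The request line carries the full URL (absolute form) rather than
// only the path, so the proxy knows which server to contact.
// buildProxyGet is a hypothetical helper, not part of the Ethernet library.
std::string buildProxyGet(const std::string& host, const std::string& path) {
  std::string req;
  req += "GET http://" + host + path + " HTTP/1.1\r\n";
  req += "Host: " + host + "\r\n";      // still the target host, not the proxy
  req += "Connection: close\r\n";       // ask for the connection to be closed after the reply
  req += "\r\n";                        // empty line ends the header block
  return req;
}
```

Calling buildProxyGet("weather-broker-cdn.api.bbci.co.uk", "/en/observation/rss/2907669") yields the same request lines that the sketch below sends with client.println().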
Example Setup

Modified "Web Client" example

This is just a slightly modified version of the "Examples->Ethernet->WebClient" sketch that comes with the Arduino IDE once the Ethernet library is installed. It works as shown with an Arduino MKR Zero, but should work with an UNO just as well (most likely with Ethernet.init(10) instead of Ethernet.init(5), see the comments in setup()).


 /*  
  Web proxy client  
  This sketch reads an RSS feed from a website  
  using an Arduino Wiznet Ethernet shield.  
  Circuit:  
  * Ethernet shield attached to pins 10, 11, 12, 13  
  created 18 Dec 2009  
  by David A. Mellis  
  modified 9 Apr 2012  
  by Tom Igoe, based on work by Adrian McEwen  
  modified 11 Jan 2022  
  by Andy Reischle for http proxy support  
  */  
 #include <SPI.h>  
 #include <Ethernet.h>  
 // Enter a MAC address for your controller below.  
 // Newer Ethernet shields have a MAC address printed on a sticker on the shield  
 // Don't forget to change that when you have more than one of these on one network segment  
 byte mac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };  
 // if you don't want to use DNS (and reduce your sketch size)  
 // use the numeric IP instead of the name for the server:  
 //IPAddress server(74,125,232,128); // numeric IP for Google (no DNS)  
 char server[] = "weather-broker-cdn.api.bbci.co.uk";  // name address for target server (using DNS)  
 char proxy[] = "proxy.internal.mycorporation.something"; // name of the internal proxy server (using DNS)  
 // Set the static IP address to use if the DHCP fails to assign  
 // That will be rather pointless in a corporate environment  
 IPAddress ip(192, 168, 0, 177);  
 IPAddress myDns(192, 168, 0, 1);  
 // Initialize the Ethernet client library.  
 // The proxy name and port are passed to client.connect() in setup()  
 // (8080 here; adjust to the port your proxy listens on):  
 EthernetClient client;  
 // Variables to measure the speed  
 unsigned long beginMicros, endMicros;  
 unsigned long byteCount = 0;  
 bool printWebData = true; // set to false for better speed measurement  
 void setup() {  
  // You can use Ethernet.init(pin) to configure the CS pin  
  //Ethernet.init(10); // Most Arduino shields  
  Ethernet.init(5);  // MKR ETH shield  
  //Ethernet.init(0);  // Teensy 2.0  
  //Ethernet.init(20); // Teensy++ 2.0  
  //Ethernet.init(15); // ESP8266 with Adafruit Featherwing Ethernet  
  //Ethernet.init(33); // ESP32 with Adafruit Featherwing Ethernet  
  // Open serial communications and wait for port to open:  
  Serial.begin(9600);  
  while (!Serial) {  
   ; // wait for serial port to connect. Needed for native USB port only  
  }  
  // start the Ethernet connection:  
  Serial.println("Initialize Ethernet with DHCP:");  
  if (Ethernet.begin(mac) == 0) {  
   Serial.println("Failed to configure Ethernet using DHCP");  
   // Check for Ethernet hardware present  
   if (Ethernet.hardwareStatus() == EthernetNoHardware) {  
    Serial.println("Ethernet shield was not found. Sorry, can't run without hardware. :(");  
    while (true) {  
     delay(1); // do nothing, no point running without Ethernet hardware  
    }  
   }  
   if (Ethernet.linkStatus() == LinkOFF) {  
    Serial.println("Ethernet cable is not connected.");  
   }  
   // try to configure using IP address instead of DHCP:  
   Ethernet.begin(mac, ip, myDns);  
  } else {  
   Serial.print(" DHCP assigned IP ");  
   Serial.println(Ethernet.localIP());  
  }  
  // give the Ethernet shield a second to initialize:  
  delay(1000);  
  Serial.print("connecting to ");  
  Serial.print(proxy);  
  Serial.println("...");  
  // if you get a connection, report back via serial:  
  // target the request to the proxy  
  if (client.connect(proxy, 8080)) {  
   Serial.print("connected to ");  
   Serial.println(client.remoteIP());  
   // Make an HTTP request to the proxy.  
   // The request line carries the full URL, not just the path:  
   client.print("GET http://");  
   client.print(server);  
   client.println("/en/observation/rss/2907669 HTTP/1.1");  
   client.print("Host: ");  
   client.println(server);  
   client.println("Connection: close");  
   client.println();  
  } else {  
   // if you didn't get a connection to the proxy:  
   Serial.println("connection failed");  
  }  
  beginMicros = micros();  
 }  
 void loop() {  
  // if there are incoming bytes available  
  // from the server, read them and print them:  
  int len = client.available();  
  if (len > 0) {  
   byte buffer[80];  
   if (len > 80) len = 80;  
   client.read(buffer, len);  
   if (printWebData) {  
    Serial.write(buffer, len); // show in the serial monitor (slows some boards)  
   }  
   byteCount = byteCount + len;  
  }  
  // if the server's disconnected, stop the client:  
  if (!client.connected()) {  
   endMicros = micros();  
   Serial.println();  
   Serial.println("disconnecting.");  
   client.stop();  
   Serial.print("Received ");  
   Serial.print(byteCount);  
   Serial.print(" bytes in ");  
   float seconds = (float)(endMicros - beginMicros) / 1000000.0;  
   Serial.print(seconds, 4);  
   float rate = (float)byteCount / seconds / 1000.0;  
   Serial.print(", rate = ");  
   Serial.print(rate);  
   Serial.print(" kbytes/second");  
   Serial.println();  
   // do nothing forevermore:  
   while (true) {  
    delay(1);  
   }  
  }  
 }  
If all goes well, this is the output:

Initialize Ethernet with DHCP:
  DHCP assigned IP 1.2.3.4
connecting to proxy.internal.mycorporation.something...
connected to 4.3.2.1
HTTP/1.0 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/rss+xml
ETag: "8b0e69972930952dc99c4d239b1c489402f5940392a0bbe16ed534b0b242787f"
expiry_extended_seconds: 0
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips
Cache-Control: no-transform, max-age=60
Date: Tue, 11 Jan 2022 18:37:23 GMT
Content-Length: 1660
X-Cache: MISS from proxy.internal.mycorporation.something
X-Cache-Lookup: MISS from proxy.internal.mycorporation.something:8080
Via: 1.0 proxy.internal.mycorporation.something:8080 (squid/2.6.STABLE21)
Proxy-Connection: close

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" version="2.0">
  <channel>
    <title>BBC Weather - Observations for  Heilbronn, DE</title>
    <link>https://www.bbc.co.uk/weather/2907669</link>
    <description>Latest observations for Heilbronn from BBC Weather, including weather, temperature and wind information</description>
    <language>en</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://www.bbc.co.uk/terms/additional_rss.shtml for more details</copyright>
    <pubDate>Tue, 11 Jan 2022 17:00:00 GMT</pubDate>
    <dc:date>2022-01-11T18:00:00Z</dc:date>
    <dc:language>en</dc:language>
    <dc:rights>Copyright: (C) British Broadcasting Corporation, see http://www.bbc.co.uk/terms/additional_rss.shtml for more details</dc:rights>
    <atom:link href="https://weather-service-thunder-broker.api.bbci.co.uk/en/observation/rss/2907669" type="application/rss+xml" rel="self" />
    <item>
      <title>Tuesday - 18:00 CET: Not available, 4°C (40°F)</title>
      <link>https://www.bbc.co.uk/weather/2907669</link>
      <description>Temperature: 4°C (40°F), Wind Direction: Easterly, Wind Speed: 4mph, Humidity: 66%, Pressure: 1033mb, Rising, Visibility: --</description>
      <pubDate>Tue, 11 Jan 2022 18:00:00 GMT</pubDate>
      <guid isPermaLink="false">https://www.bbc.co.uk/weather/2907669-2022-01-11T18:00:00.000+01:00</guid>
      <dc:date>2022-01-11T17:00:00Z</dc:date>
      <georss:point>49.1399 9.2205</georss:point>
    </item>
  </channel>
</rss>

disconnecting.
Received 2181 bytes in 0.4765, rate = 4.58 kbytes/second

Extracting values from that output is no big deal. I've done something similar here: Fritzmeter Project, but with TinyXML you should be able to do that with more style.
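As a rough idea of the "no style" approach, the following plain C++ sketch pulls the description of the first item out of the feed with simple string searches and then picks single fields from it. All function names are made up for illustration. It assumes the feed keeps the layout shown above and does no real XML parsing. std::string is used so it can be tried on a desktop; on an UNO you would work with char buffers instead.

```cpp
#include <string>

// Return the text between the first <tag> and </tag> found at or after
// position `from`, or an empty string if the pair is not found.
std::string textBetween(const std::string& xml, const std::string& tag,
                        std::size_t from = 0) {
  const std::string open = "<" + tag + ">";
  const std::string close = "</" + tag + ">";
  std::size_t start = xml.find(open, from);
  if (start == std::string::npos) return "";
  start += open.size();
  std::size_t end = xml.find(close, start);
  if (end == std::string::npos) return "";
  return xml.substr(start, end - start);
}

// Description of the first <item>; skips the channel-level description.
std::string firstItemDescription(const std::string& feed) {
  std::size_t item = feed.find("<item>");
  if (item == std::string::npos) return "";
  return textBetween(feed, "description", item);
}

// Pick one "Label: value" field out of the comma-separated description,
// e.g. fieldValue(desc, "Humidity"). Returns "" if the label is missing.
std::string fieldValue(const std::string& description, const std::string& label) {
  const std::string key = label + ": ";
  std::size_t start = description.find(key);
  if (start == std::string::npos) return "";
  start += key.size();
  std::size_t end = description.find(", ", start);
  if (end == std::string::npos) return description.substr(start);
  return description.substr(start, end - start);
}
```

Note that the response arrives in 80-byte chunks in loop(), so the body would have to be collected into one buffer (or scanned as a stream) before functions like these can be applied.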

The proxy log (access.log) entry shows:
1.2.3.4 TCP_MISS/200 2181 GET http://weather-broker-cdn.api.bbci.co.uk/en/observation/rss/2907669 - DIRECT/184.86.251.134 application/rss+xml


What about HTTPS?

The Arduino Uno is too anemic for HTTPS, but the experiments on this page have been done with a MKR ZERO and the matching MKR ETH Shield. The MKR ZERO can do HTTPS easily.
But:
HTTPS through an HTTP proxy works differently from HTTP: the client requests a tunneled connection from the proxy, using the "CONNECT" method (rather than "GET" in the HTTP example).
The otherwise brilliant SSLClient library does not provide an obvious way to support that:
I have not yet found a way to establish that first part of the connection request without encryption and switch to SSL/TLS after the tunnel is initiated.
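For reference, the unencrypted first step of such a tunnel is small. The following plain C++ sketch builds the CONNECT request and checks the proxy's status line. It only covers the plaintext part. How to hand the open connection over to a TLS library afterwards is exactly the unsolved part and is not shown. The function names are made up for illustration, and std::string is used so the snippet can be checked on a desktop.

```cpp
#include <string>

// Build the plaintext request that asks the proxy to open a tunnel
// to host:port (443 for HTTPS).
std::string buildConnectRequest(const std::string& host, unsigned port) {
  const std::string target = host + ":" + std::to_string(port);
  return "CONNECT " + target + " HTTP/1.1\r\n"
         "Host: " + target + "\r\n"
         "\r\n";
}

// Extract the three-digit status code from a status line such as
// "HTTP/1.1 200 Connection established". Returns -1 if the line is malformed.
int parseStatusCode(const std::string& statusLine) {
  if (statusLine.compare(0, 5, "HTTP/") != 0) return -1;
  std::size_t space = statusLine.find(' ');
  if (space == std::string::npos || space + 4 > statusLine.size()) return -1;
  int code = 0;
  for (std::size_t i = space + 1; i < space + 4; ++i) {
    char c = statusLine[i];
    if (c < '0' || c > '9') return -1;
    code = code * 10 + (c - '0');
  }
  return code;
}

// The tunnel is usable only if the proxy answered with a 2xx status.
bool tunnelEstablished(const std::string& statusLine) {
  int code = parseStatusCode(statusLine);
  return code >= 200 && code < 300;
}
```

After a 2xx reply and the empty line that ends the proxy's headers, everything sent on the connection is passed through to the target server, which is where the TLS handshake would have to start.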