Ideas for firewalling

Jan Engelhardt

revision 2, December 2006

 

“Be conservative in what you accept and be liberal
in what you do.”
-RFC 793[1] turned around

1   Kernel policies

Parts of the Linux kernel other than Netfilter also have switches that control packet flow. The routing layer has several, which can be changed in /proc/sys/net/ipv4/conf/*/. The important ones are "accept_redirects", "accept_source_route", "rp_filter", and, secondarily, "send_redirects". The first defines whether the routing code should act on ICMP Redirect messages; the second defines whether packets carrying an IP source-route option are accepted. "rp_filter" checks whether the packet can legitimately come from the interface it was received on (see [2]).
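As a minimal sketch of how these switches could be set, the helper below writes a value into each of the four files for every interface directory. The function name harden_routing and the chosen values (ignore redirects and source-routed packets, do not send redirects, enable the reverse-path check) are assumptions for illustration, not settings prescribed by this document.

```shell
#!/bin/sh
# Hypothetical helper: apply conservative routing-layer policies to every
# interface directory below a conf root (default: the real /proc tree).
# The values chosen here are an assumption, not taken from the text.
harden_routing() {
    conf_root=${1:-/proc/sys/net/ipv4/conf}
    for dir in "$conf_root"/*/; do
        [ -d "$dir" ] || continue             # skip if the glob matched nothing
        echo 0 > "${dir}accept_redirects"     # ignore ICMP Redirect
        echo 0 > "${dir}accept_source_route"  # refuse source-routed packets
        echo 0 > "${dir}send_redirects"       # do not emit ICMP Redirect
        echo 1 > "${dir}rp_filter"            # enable the reverse-path check
    done
}
```

Called as root without an argument, it acts on the live system; passing a scratch directory instead allows a dry run.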

2   Basic filtering

First, many common ICMP annoyances should be filtered out, such as ICMP Redirect, ICMP Router Advertisement, and ICMP Router Solicitation. Windows 98 in particular likes to send them out. A few packets related to routing can also be filtered out at the routing level, which happens to come before the Netfilter code[3]. Not all ICMP types are handled by the routing code, and a double check should hopefully not be too costly. These ICMP packets are best dropped, since replying to them may reveal that we are alive. The drop should be done for every interface (i.e. no -i or -o option), in the INPUT and FORWARD chains. The Linux kernel itself does not normally output these packets, so it should be safe to not have such rules in the OUTPUT chain. In fact, the OUTPUT chain should stay clear of them so that you can still do network hardening tests.

RFC 792[4] divides ICMP packets hierarchically into so-called types and codes. iptables can match all codes of a specific type, one code of one specific type, or any type.

-N icmp_drop;
-A icmp_drop -p icmp --icmp-type redirect -j DROP;
-A icmp_drop -p icmp --icmp-type router-advertisement -j DROP;
-A icmp_drop -p icmp --icmp-type router-solicitation -j DROP;
-A INPUT -j icmp_drop;
-A FORWARD -j icmp_drop;

The redirect type includes four codes, hence the rule blocks both network-redirect and host-redirect, plus the two TOS variants thereof. router-advertisement and router-solicitation are types without any subcodes.

2.1   Traceroute filtering

A remote host can send an ICMP Echo packet with the "IP Traceroute"[5] flag. This cannot reliably be blocked, since the ipv4options module does not have code to look at it. However, we can block the return packets passing the machine using:

-A icmp_drop -p icmp --icmp-type 30 -j DROP;

Type 30 must be specified numerically (as shown above), since iptables does not know the names for such extensions that do not seem to be widely deployed. Linux does not seem to support IP Traceroute as of this writing, so it is not necessary to block it in the OUTPUT chain.

Filtering UDP Traceroute and TCP Traceroute is practically impossible without disrupting other traffic, since a TTL of zero -- which is what commonly happens during traceroute -- is legitimate.

3   TCP Stealth Scan Detection

In the ideal case, a user makes a connection, does what needs to be done, and closes it. Standardized TCP connections begin with a SYN packet[6]; everything else can be considered anomalous. To match a scan, knowledge of the inner workings of the scan program is needed. Sometimes it also suffices to look at what traffic is coming in, because that is what can be matched. Since normal TCP connections always begin with a SYN, anything else must be forged and is therefore easy to match. The following rule will match most anomalies, including TCP NULL, TCP FIN, TCP XMAS, and possibly other strange combinations:

-p tcp ! --syn -m conntrack --ctstate INVALID

It does not match TCP ACK scans, because a "spurious" ACK may very well be part of an already-existing connection where our machine just does not know about its state (e.g. after a reboot or conntrack flush).
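For orientation, the scan types named above differ only in the TCP flags of the probe packet. The sketch below maps a flag set to the name nmap uses for the corresponding scan (-sN, -sF, -sX). The function name probe_type and the comma-separated flag notation are made up for this example.

```shell
#!/bin/sh
# Name the kind of probe that a lone first packet with the given TCP flags
# represents. Flags are a comma-separated list; "" stands for no flags.
probe_type() {
    case "$1" in
        "")            echo "NULL" ;;   # no flags at all (nmap -sN)
        FIN)           echo "FIN"  ;;   # only FIN set (nmap -sF)
        FIN,PSH,URG)   echo "XMAS" ;;   # FIN, PSH and URG set (nmap -sX)
        SYN)           echo "regular connection attempt" ;;
        *)             echo "other" ;;
    esac
}
```

None of the first three flag sets can open a connection, which is why a rule matching non-SYN packets in state INVALID catches them.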

An extra rule is required to be able to continue receiving "Connection refused" messages ourselves. If you run `telnet somehost someClosedPort`, for example, the returned RST and/or RST-ACK packets are not associated with any connection in Netfilter. Hence, we need an exclusion rule to the above:

-p tcp --tcp-flags SYN,FIN,RST RST

According to RFC 793 page 65[7], an RST will not be replied to, hence no information leak will occur as a result of accepting it. (RST-ACK is required as a response to a SYN to a closed port, see RFC 793 page 37[8] paragraph 3.)

You can incorporate this into your ruleset as follows. A user-defined chain is handy, but you can have it any way:

-N tcp_inval;
-A tcp_inval -p tcp --tcp-flags SYN,FIN,RST,ACK RST,ACK -j RETURN;
-A tcp_inval -j LOG --log-prefix "[INVALID] ";
-A tcp_inval -j CHAOS;
-A INPUT -p tcp ! --syn -m conntrack --ctstate INVALID -j tcp_inval;

This allows the use of more targets, such as LOG (shown here), without repeatedly matching non-SYN, and is the preferred way. This ruleset can also be used in the FORWARD chain without fear of killing already-running connections. Active connections that have not yet been seen by Netfilter will become NEW and ESTABLISHED after the next two packets, respectively, without making the connection INVALID.

The CHAOS target is discussed in chapter 8.

4   TCP SYN scan detection

A SYN scan half-opens a TCP connection and terminates the handshake in the middle. In other words, if a SYN is received, we send the obligatory SYN-ACK and then the scanner immediately sends an RST. Using a state machine (automaton) inside iptables, it is easy to match the third packet:


[Figure: state graph for SYN scan detection (svg, dot)]

When connecting to localhost, special attention needs to be given since we receive our own packets. When the socket is open, the server side sends its SYN-ACK. Under normal circumstances, this packet is only seen in the OUTPUT chain and hence is not of relevance for the state graph, which is modeled upon incoming packets. However, when the loopback interface is involved, we will see our own SYN-ACK packet again in the INPUT chain, so it is to be ignored. The ruleset can be modeled as follows with iptables:

SYN=$[0x401];
CLOSED=$[0x402];
SYNSCAN=$[0x403];
ESTAB=$[0x404];

-N mark_closed;
-A mark_closed -j CONNMARK --set-mark $CLOSED;

-N mark_estab;
-A mark_estab -j CONNMARK --set-mark $ESTAB;

-N tcp_new1;
-A tcp_new1 -i lo -p tcp --tcp-flags ALL SYN,ACK -j RETURN;
-A tcp_new1 -i lo -p tcp --tcp-flags ALL RST,ACK -g mark_closed;
-A tcp_new1 -p tcp --tcp-flags ALL ACK -g mark_estab;
-A tcp_new1 -j CONNMARK --set-mark $SYNSCAN;

-A INPUT -m connmark --mark $SYN -j tcp_new1;
-A INPUT -p tcp --syn -m conntrack --ctstate NEW -j CONNMARK --set-mark $SYN;

When a SYN packet in a new connection arrives, it does not have any mark set (assuming you did not set one), hence will only match the second rule in the INPUT chain (as shown here). The connection will then be marked with some integer that we define as the "SYN received" state. When the client then gives the third packet in the TCP handshake, the first rule in INPUT triggers and the second does not, because the connection is marked with $SYN and already has state ESTABLISHED. Note that the order of the rules is important, as marking it with $SYN first and then going to tcp_new1 will not work as intended.

If the third packet is an ACK, a goto (-g, note that this is different from -j) to the mark_estab chain is executed, the connection will be marked with $ESTAB and control is returned to the INPUT chain. Also note our special cases on the loopback interface: SYN-ACK, which is ignored (by returning from the tcp_new1 chain and leaving the mark as-is), and RST-ACK, which will trigger the connection to be marked with $CLOSED so that it does not match any other rule.
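The transitions described above can be traced with a toy model. The sketch below is a simplification under stated assumptions: it compares the exact flag set of each incoming packet (the real --syn match only inspects SYN, RST, ACK and FIN), and the function name next_mark, its argument order and the symbolic state names are made up for illustration.

```shell
#!/bin/sh
# Toy model of the INPUT rules and the tcp_new1 chain: given the current
# connection mark, the TCP flags of an incoming packet and the input
# interface, print the mark the connection carries afterwards.
next_mark() {
    mark=$1     # NONE, SYN, CLOSED, SYNSCAN or ESTAB
    flags=$2    # comma-separated flag set, e.g. "SYN,ACK"
    iface=$3    # input interface, e.g. "lo" or "eth0"
    case "$mark" in
        NONE)
            # models: -p tcp --syn -m conntrack --ctstate NEW ... --set-mark $SYN
            [ "$flags" = "SYN" ] && mark=SYN
            ;;
        SYN)
            # models the tcp_new1 chain, evaluated top to bottom
            if [ "$iface" = "lo" ] && [ "$flags" = "SYN,ACK" ]; then
                :                   # our own SYN-ACK on loopback: RETURN, mark unchanged
            elif [ "$iface" = "lo" ] && [ "$flags" = "RST,ACK" ]; then
                mark=CLOSED
            elif [ "$flags" = "ACK" ]; then
                mark=ESTAB
            else
                mark=SYNSCAN        # anything else as third packet, e.g. a bare RST
            fi
            ;;
    esac
    echo "$mark"
}
```

For a SYN scan arriving on eth0, `next_mark NONE SYN eth0` prints SYN and `next_mark SYN RST eth0` then prints SYNSCAN, whereas a regular handshake continues with `next_mark SYN ACK eth0`, which prints ESTAB.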

Blocking SYN scans is impossible, because you cannot tell in advance whether a SYN sent by the remote side is intended to be a real connection or a scan attempt. However, you could, for example, use the recent module to block all requests from the host that did the SYN scan for a while. Assuming handle_evil is a user-defined chain doing that, there are two ways to implement it, which differ in where handle_evil is called:

-A tcp_new1 -j handle_evil; (or)
-A INPUT -m connmark --mark $SYNSCAN -j handle_evil;
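The handle_evil chain itself is not shown in this document. One way it might look, as a sketch only: the list name synscan, the 60-second default and the function name install_handle_evil are assumptions, while --set, --rcheck and --seconds are options of the recent match.

```shell
#!/bin/sh
# IPT may be overridden for a dry run, e.g. IPT="echo iptables".
IPT=${IPT:-iptables}

# Hypothetical handle_evil: remember the source address in a "recent" list
# and drop the packet; a second rule drops everything from listed sources
# for block_secs seconds.
install_handle_evil() {
    block_secs=${1:-60}      # assumed block duration
    $IPT -N handle_evil
    $IPT -A handle_evil -m recent --name synscan --set -j DROP
    # inserted at the top of INPUT so it is checked before the scan logic
    $IPT -I INPUT -m recent --name synscan --rcheck --seconds "$block_secs" -j DROP
}
```

With IPT="echo iptables" the function only prints the three commands, which makes it easy to inspect them before running them as root.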

5   TCP connect scan detection

Extending the state graph to catch connect scans yields:


[Figure: state graph extended for connect scan detection (svg, dot)]

CNSCAN=$[0x406];
VALID=$[0x408];

-N mark_cnscan;
-A mark_cnscan -j CONNMARK --set-mark $CNSCAN;

-N tcp_new3;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST RST -g mark_cnscan;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST FIN -g mark_cnscan;
-A tcp_new3 -j CONNMARK --set-mark $VALID;

-A INPUT -m connmark --mark $ESTAB -j tcp_new3;

Note that the last rule in this code snippet must come before the rule to jump to tcp_new1, i.e.:

-A INPUT -m connmark --mark $ESTAB -j tcp_new3;
-A INPUT -m connmark --mark $SYN -j tcp_new1;

6   TCP Grab Scan Detection

There is yet another type of scan, the banner grab scan, where a client connects solely to read bytes and then disconnects. Note that such behavior may very well be part of a non-malicious action (FTP DATA connections, for example), so connections should be handled with care. Some services, such as SSH, are always bidirectional from a Layer 3 point of view, so it is safe to apply Grab Scan Detection to them. Speaking of SSH, it is a prominent service that presents its data before the client takes any action, e.g. it shows the banner "SSH-2.0-OpenSSH_4.4" voluntarily. I would like to show a way to match such "grab scans". As already mentioned, a grab scan is one where the client sends no data itself, hence its TCP packets have no payload.


[Figure: state graph extended for grab scan detection (svg, dot)]

TCP packets without any payload usually are 52 octets in size. However, this does not always need to be the case, since the IPv4 header itself may be up to 60 octets long (RFC 791). This problem cannot be solved with the regular iptables magic as of this writing, but the portscan kernel module as discussed in chapter 7 can do it. The following rules are an enhancement to chapter 5's block:

ESTAB2=$[0x405];
CNSCAN=$[0x406];
GRSCAN=$[0x407];
VALID=$[0x408];

-N mark_grscan;
-A mark_grscan -j CONNMARK --set-mark $GRSCAN;

-N mark_estab2;
-A mark_estab2 -j CONNMARK --set-mark $ESTAB2;

-N tcp_new3;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST RST -g mark_cnscan;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST FIN -g mark_cnscan;
-A tcp_new3 ! -i lo -p tcp --tcp-flags SYN,FIN,RST,ACK ACK -m length --length 52 -g mark_estab2;
-A tcp_new3 -j CONNMARK --set-mark $VALID;

-N tcp_new4;
-A tcp_new4 -p tcp --tcp-flags SYN,FIN,RST,ACK ACK -m length --length 52 -j RETURN;
-A tcp_new4 -p tcp --tcp-flags SYN,FIN,RST RST -g mark_grscan;
-A tcp_new4 -p tcp --tcp-flags SYN,FIN,RST FIN -g mark_grscan;
-A tcp_new4 -j CONNMARK --set-mark $VALID;

-A INPUT -m connmark --mark $ESTAB2 -j tcp_new4;
-A INPUT -m connmark --mark $ESTAB -j tcp_new3;

Note that the loopback interface is excluded again, because seeing our own packets makes it trigger early. There are possibly ways around this, but that is beyond the scope of this document. A full example iptables ruleset for use with iptables-restore can be found in the source distribution. If you load it, running, for example, the Grab Scan will look like this (kernel messages on master are the lines that start with a bracketed tag such as [ESTAB1]):

master# iptables-restore scan_detect.ipt
master# ssh vm6402
vm6402# telnet master 22
[ESTAB1] IN=vmnet2 OUT= MAC= SRC=192.168.64.2 DST=192.168.64.1 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=31040 DF PROTO=TCP SPT=3180 DPT=22 WINDOW=1460 RES=0x00 ACK URGP=0
Trying 192.168.64.1...
Connected to 192.168.64.1.
Escape character is '^]'.
[ESTAB2] IN=vmnet2 OUT= MAC= SRC=192.168.64.2 DST=192.168.64.1 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=31041 DF PROTO=TCP SPT=3180 DPT=22 WINDOW=1460 RES=0x00 ACK URGP=0
SSH-2.0-OpenSSH_4.4
^]
telnet> exit
[GRSCAN] IN=vmnet2 OUT= MAC= SRC=192.168.64.2 DST=192.168.64.1 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=31042 DF PROTO=TCP SPT=3180 DPT=22 WINDOW=1460 RES=0x00 ACK FIN URGP=0
Connection closed.
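The LEN=52 in the log lines above and the --length 52 in the rules can be reproduced as 20 octets of IPv4 header, 20 octets of TCP header and 12 octets of TCP options. That the options are timestamps plus padding is my assumption, based on common Linux defaults. The sketch below shows why a fixed length cannot catch every empty packet; the helper name tcp_payload_len is made up for this example.

```shell
#!/bin/sh
# Payload length of a TCP segment, computed from the IPv4 total length and
# the two header-length fields (both counted in 32-bit words, as on the wire).
tcp_payload_len() {
    total_len=$1   # IPv4 "Total Length" in octets
    ihl=$2         # IPv4 header length, 5..15 words
    doff=$3        # TCP data offset, 5..15 words
    echo $(( total_len - 4 * ihl - 4 * doff ))
}
```

Both `tcp_payload_len 52 5 8` and `tcp_payload_len 56 6 8` print 0: the two packets are equally empty, but the second one carries one word of IP options and would slip past a match on exactly 52 octets.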

7   portscan match

As the number of rules for all this portscan logic grows, it becomes a little hard to keep track of it. By putting it all into one kernel module, it can be nicely wrapped up into a single match that is listed in your iptables chains. Processing speed will also improve since the netfilter code is only run once, and repeated matches also only need to be done once in the new kernel module.

8   CHAOS target

Network scanners such as nmap have extra measures for operating systems that rate-limit the number of returned ICMP and/or RST packets, in essence the packets that signal "port closed". When this happens, nmap throttles its scan timing to accommodate the limit and thereby shows unusual behavior: unusual in the respect that you would not normally expect it or want it to happen.

As I have found a way to confuse and slow down network scans, as paraphrased in the subtitle of this document, I effectively use this target^1 for almost anything classified bad.

^1 It actually is a user-defined chain consisting of three rules.

Certain operating systems have their peculiarities when sending ICMP messages. I am referring to Solaris, which is said to send out at most 2 ICMP messages per second (see the nmap manual). There are two reasons for differing OS behavior: either the RFC allows it, or the OS indeed breaks the RFC. Network scanners adapt to such differing behavior, of course, to get the "best" results. And actually, we can just change our behavior too. ("Be liberal in what you do.") By simulating certain behavior, we force the network scanner to adjust, and so we can control it to a certain degree. This is what makes up the crème of jen_ipfw, as it is the first firewall to have it. It exploits nmap's adjustment to ICMP rate limits and acts accordingly.

For TCP, jen_ipfw randomly alternates between sending a host-unreachable packet back to the origin and putting the connection into a TARPIT. Together, these both rate-limit the scanner and make it see "open/filtered" for some ports, rather than just "filtered". For UDP, we randomly send host-unreachable or silently DROP the packet. DROP on UDP generally shows "open/filtered" because of the way UDP is designed. As for the rate limit, using -m hashlimit instead of -m random has about the same effect, but hashlimit's timing is determinable.

This is ludicrously cool :) To quote a netfilter mailing list subscriber:

"Interesting this use of random. I'll have to play with it when I get that rare bit of spare time for testing and fooling about with things not in prod or requirening immediate attention to fix! Which tend to be even more rare these days in our understaffed env. But, your reports of this random further confusing the scanner and slowing it down are extremely interesting..."[ref]

Even in its "Insane timing" (as fast as possible) mode, nmap slows down to at most 2 TCP ports per second if it recognizes an ICMP rate limit, and to even fewer ports per second on UDP. For the record, the Insane timing mode also knocks often, i.e. nmap sends multiple packets per port. Anyway, "random" matches every once in a while, as does "hashlimit".
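The behavior described above for TCP and UDP can be approximated with a user-defined chain of three rules. This is one plausible reading, not the actual jen_ipfw chain. It reuses the -m random --average syntax that this document uses elsewhere, and the function name install_chaos, the 50% default and the fallback DROP rule are assumptions.

```shell
#!/bin/sh
# IPT may be overridden for a dry run, e.g. IPT="echo iptables".
IPT=${IPT:-iptables}

# Hypothetical reconstruction of the chain: answer a random share of packets
# with host-unreachable, tarpit the remaining TCP, drop the rest.
install_chaos() {
    pct=${1:-50}        # assumed percentage answered with host-unreachable
    chain=${2:-CHAOS}   # chain name, matching the -j CHAOS used earlier
    $IPT -N "$chain"
    # with probability pct%: tell the origin the host is unreachable
    $IPT -A "$chain" -m random --average "$pct" -j REJECT --reject-with icmp-host-unreachable
    # remaining TCP packets: hold the connection in a tarpit
    $IPT -A "$chain" -p tcp -j TARPIT
    # remaining non-TCP packets (e.g. UDP): drop silently
    $IPT -A "$chain" -j DROP
}
```

Both the random match and the TARPIT target come from out-of-tree extensions. On kernels without the random match, -m statistic --mode random --probability 0.5 is the closest equivalent I am aware of.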

For experimenting with and analyzing the workings of CONFUSE, see section 8.1 below.

8.1   Tuning CONFUSE

Over various versions of jen_ipfw, I changed the percentage at which random ICMP packets are sent as part of the CONFUSE target, hoping to get better results. Instead, I got worse results on one try, and so was forced, but actually also encouraged, to do timing tests. Each value in the -sF/-sN/-sX columns was obtained from a single nmap run, because averaging three runs per value would have taken a lot longer. But since CONFUSE works randomly anyway and treats all three scan types the same, the average over -s[FNX] should be quite representative.

100% means: only -j REJECT
x% means: -m random --average x -j REJECT; -j TARPIT;
0% means: only -j TARPIT;

The nmap command used is: nmap <scantype> 127.0.0.1 -r -T Insane -p 1-1024 -P0; You can run the `bmtcp_confuse` program to run the benchmark on your machine.

The following table shows the time for scanning 1024 ports with the FIN, NULL or XMAS scan type (-p 1-1024 is not random at all, but the behavior is the same for any port number).

in seconds
R%         -sF       -sN       -sX   Average
100%     0.139     0.124     0.123     0.129
90%      4.380     5.304     4.413     4.699
80%      9.059    10.573    10.083     9.905
70%     19.088    29.564    29.754    26.135
60%     42.489    41.447    36.970    40.302
50%     66.457    57.967    62.549    62.324
40%     91.842    91.810    91.293    91.648
30%    143.038   137.313   142.253   140.868
20%    220.970   230.123   226.213   225.769
10%    403.269   374.713   402.954   393.645
5%     557.139   544.653   555.182   552.325
4%     581.988   580.353   566.433   576.258
3%     621.329   634.744   626.752   627.608
2%     649.849   646.244   648.853   648.315
1%     701.677   749.574   710.774   720.675
0%      54.633    54.625    54.642    54.633
 
Total gain over no firewall +558562%
Total gain over minimal/regular firewall +53243%
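The first total appears to compare the 1% average with the 100% average from the table above. That reading is mine. The baseline behind the "minimal/regular firewall" figure is not in the table, so it is not reproduced here. The helper name gain_percent is made up for this sketch.

```shell
#!/bin/sh
# Relative gain in percent of a slow scan time over a fast one,
# truncated to an integer: (slow / fast - 1) * 100.
gain_percent() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%d\n", (slow / fast - 1) * 100 }'
}
```

With the two averages, `gain_percent 720.675 0.129` prints 558562, which matches the figure given for "no firewall".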

Complete table

Note that I have reversed this table to show the increasing time. Values were recorded by nmapping 127.0.0.1. Using nmap over real networks (Ethernet and the like) and/or slower networks might yield even larger values than shown in this table.

I also took the time to do a UDP scan. The values were spread much more widely, so I redid the timing many times, actually more often than you can see here. (confuse.png, confuse.xls)

in seconds
R%         -sU       -sU       -sU       -sU       -sU   Average
100%     0.131     0.129     0.133     0.131     0.130     0.131
90%      5.340     5.229     5.506     5.294     5.315     5.337
80%      9.253     9.643     9.656     9.342     9.087     9.396
70%     17.130    17.455    17.223    15.495    18.050    17.071
60%    900.653   900.375   900.704   900.380   900.763   900.575
50%    900.370  (only run once from here)                900.370
40%    900.366                                           900.366
30%    900.473                                           900.473
20%    900.382                                           900.382
10%    900.824                                           900.824
5%     901.026                                           901.026
1%     900.463                                           900.463
0%      54.641                                            54.641
 
Total gain over no firewall +687276%
Total gain over minimal/regular firewall +65056%

Complete table UDP

References

[1] RFC 793: TCP
[2] linux/net/ipv4/fib_frontend.c from kernel 2.6.18
[3] Packet flow in the Linux kernel, by josh[at]imagestream.com
[4] RFC 792: ICMP
[5] RFC 1393: IP/ICMP Traceroute
[6] RFC 793: TCP, page 23: state diagram
[7] RFC 793: TCP, page 65: RST handling
[8] RFC 793: TCP, page 37: RST-ACK in response to SYN