P48-13


                            ==Phrack Magazine==

             Volume Seven, Issue Forty-Eight, File 13 of 18


                             [ Project Neptune ]

                        by daemon9 / route / infinity
                             for Phrack Magazine
                      July 1996 Guild Productions, kid

                       comments to route@infonexus.com


        This project is a comprehensive analysis of TCP SYN flooding.  You 
may be wondering, why such a copious treatment of TCP SYN flooding?  
Apparently, someone had to do it.  That someone turned out to be me (I need
a real hobby).  The SYNflood Project consists of this whitepaper, including 
annotated network monitor dumps and fully functional, robust Linux source code.


                --[ Introduction ]--


        TCP SYN flooding is a denial of service (DOS) attack.  Like most DOS
attacks, it does not exploit a software bug, but rather a shortcoming in the
implementation of a particular protocol.  For example, mail bombing DOS attacks
work because most SMTP agents are dumb and will accept whatever is sent their 
way.  ICMP_ECHO floods exploit the fact that most kernels will simply reply to
ICMP_ECHO request packets one after another, ad infinitum.  We will see that
TCP SYN flood DOS attacks work because of the current implementation of TCP's
connection establishment protocol.


                --[ Overview  ]--


        This whitepaper is intended as a complete introduction to TCP SYN 
flooding (referred to hereafter as SYN flooding).  It will cover the attack
in detail, including all necessary background information.  It is 
organized into sections:

        Section I.      TCP Background Information
        Section II.     TCP Memory Structures and the Backlog
        Section III.    TCP Input Processing
        Section IV.     The Attack
        Section V.      Network Trace
        Section VI.     Neptune.c
        Section VII.    Discussion and Prevention
        Section VIII.   References

(Note that readers unfamiliar with the TCP/IP protocol suite may wish to first
read ftp://ftp.infonexus.com/pub/Philes/NetTech/TCP-IP/tcipIp.intro.txt.gz)


                --[ The Players ]--


                A:      Target host
                X:      Unreachable host 
                Z:      Attacking host
             Z(x):      Attacker masquerading as the unreachable host


                --[ The Figures ]--
                

        There are a few network transaction figures in the paper, and
they are to be interpreted as per the following example:

       tick   host a      control     host b

tick:   
        A unit of time.  There is no distinction made as to *how* much time 
passes between ticks, just that time passes.  It's generally not going to be
a great deal. 
host a: 
        A machine participating in a TCP-based conversation.
control: 
        This field shows any relevant control bits set in the TCP header and 
the direction the data is flowing.
host b: 
        A machine participating in a TCP-based conversation.

For example:

        1       A       ---SYN--->      B       

        In this case, at the first referenced point in time, host a is sending
a TCP segment to host b with the SYN bit on.  Unless stated, we are generally 
not concerned with the data portion of the TCP segment.



                Section I.      TCP Background Information



        TCP is a connection-oriented, reliable transport protocol.  TCP is
responsible for hiding network intricacies from the upper layers.  A 
connection-oriented protocol implies that the two hosts participating in a 
discussion must first establish a connection before data may be exchanged.  In
TCP's case, this is done with the three-way handshake.  Reliability can be 
provided in a number of ways, but the only two we are concerned with are data 
sequencing and acknowledgement.  TCP assigns sequence numbers to every byte in
every segment and acknowledges all data bytes received from the other end.
(A bare ACK does not consume a sequence number and is not itself ACK'd.  That
would be ludicrous.)


                --[ TCP Connection Establishment ]--


        In order to exchange data using TCP, hosts must establish a connection.
TCP establishes a connection in a 3 step process called the 3-way handshake.
If machine A is running a client program and wishes to connect to a server
program on machine B, the process is as follows:

                        fig(1)
       
        1       A       ---SYN--->      B       

        2       A    <---SYN/ACK---     B

        3       A       ---ACK--->      B

                                
        At (1) the client is telling the server that it wants a connection.
This is the SYN flag's only purpose.  The client is telling the server that 
the sequence number field is valid, and should be checked.  The client will 
set the sequence number field in the TCP header to its ISN (initial sequence
number).  The server, upon receiving this segment, will respond (2) with its 
own ISN (therefore the SYN flag is on) and an ACKnowledgement of the client's 
first segment (which is the client's ISN+1).  The client then ACKs the 
server's ISN (3).  Now data transfer may take place.
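
        The sequence and acknowledgement numbers in fig(1) can be traced
with a few lines of Python.  This is only a sketch: the function name
three_way_handshake() and the ISN values are made up for illustration, and
no packets are sent anywhere.

```python
SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit counters, so they wrap


def three_way_handshake(client_isn, server_isn):
    """Return (sender, flags, seq, ack) for the three segments of fig(1).

    Hypothetical helper for illustration; not taken from any real stack.
    """
    # (1) The client's SYN carries its ISN; there is nothing to ACK yet.
    syn = ("A", "SYN", client_isn, None)
    # (2) The SYN consumes one sequence number, so the server ACKs ISN+1
    #     and announces its own ISN in the same segment.
    syn_ack = ("B", "SYN/ACK", server_isn, (client_isn + 1) % SEQ_MOD)
    # (3) The client ACKs the server's ISN+1; its own sequence number has
    #     advanced past the SYN.
    ack = ("A", "ACK", (client_isn + 1) % SEQ_MOD, (server_isn + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


for tick, segment in enumerate(three_way_handshake(1000, 5000), start=1):
    print(tick, segment)
```

The modulo keeps the arithmetic honest when an ISN sits at the top of the
32-bit range.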

 
              --[ TCP Control Flags  ]--


        There are six TCP control flags.  We are only concerned with 3 (marked
with a *), but the others are included for completeness:

*SYN:   Synchronize Sequence Numbers
        The sequence number field carries an initial sequence number.  This
flag is only valid during the 3-way handshake.  It tells the receiving TCP to
check the sequence number field and note its value as the
connection-initiator's 
(usually the client) initial sequence number.  TCP sequence numbers can 
simply be thought of as 32-bit counters.  They range from 0 to 4,294,967,295.
Every byte of data exchanged across a TCP connection (along with certain 
flags) is sequenced.  The sequence number field in the TCP header will contain
the sequence number of the *first* byte of data in the TCP segment.  

*ACK:   Acknowledgement
        The acknowledgement number field is valid.  This flag is almost always
set.   The acknowledgement number field in the TCP header holds the value of 
the next *expected* sequence number (from the other side), and also 
acknowledges *all* data (from the other side) up through this ACK number minus
one.

*RST:   Reset
        Destroy the referenced connection.  All memory structures are torn 
down.

URG:    Urgent 
        The urgent pointer is valid.  This is TCP's way of implementing out
of band (OOB) data.  For instance, in a telnet connection a `ctrl-c` on the 
client side is considered urgent and will cause this flag to be set. 

PSH:    Push
        The receiving TCP should not queue this data, but rather pass it to 
the application as soon as possible.  This flag should always be set in 
interactive connections, such as telnet and rlogin.

FIN:    Finish 
        The sending TCP is finished transmitting data, but is still open to 
accepting data.
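
        The six flags above each occupy one bit of the flags field in the
TCP header, so a flag combination such as SYN/ACK is just a bitmask.  A
minimal sketch follows; the bit values are the standard ones, while the
helper names encode_flags() and decode_flags() are invented for this
example.

```python
# Standard bit positions of the six control flags in the TCP header.
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def encode_flags(*names):
    """Combine flag names into the single flags value (hypothetical helper)."""
    value = 0
    for name in names:
        value |= TCP_FLAGS[name]
    return value


def decode_flags(value):
    """List the names of the flags set in a flags value (hypothetical helper)."""
    return [name for name, bit in TCP_FLAGS.items() if value & bit]


print(hex(encode_flags("SYN", "ACK")))  # second segment of the handshake
print(decode_flags(0x02))               # a bare connection request
```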


                --[ Ports ]--
                
        
        To grant simultaneous access to the TCP module, TCP provides a user 
interface called a port.  Ports are used by the kernel to identify network 
processes.  They are strictly transport layer entities.  Together with an 
IP address, a TCP port provides an endpoint for network 
communications.  In fact, at any given moment *all* Internet connections can 
be described by 4 numbers: the source IP address and source port and the 
destination IP address and destination port.  Servers are bound to 
'well-known' ports so that they may be located on a standard port on 
different systems.  For example, the telnet daemon sits on TCP port 23.
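
        The four-number description of a connection maps naturally onto a
lookup table keyed by that 4-tuple.  The sketch below assumes nothing
about how any kernel actually stores its connections; the names
Connection, connections and open_connection() are hypothetical, and the
addresses come from the documentation range 192.0.2.0/24.

```python
from collections import namedtuple

# One connection = source address/port plus destination address/port.
Connection = namedtuple("Connection", "src_ip src_port dst_ip dst_port")

connections = {}  # hypothetical table: 4-tuple -> whatever owns the connection


def open_connection(src_ip, src_port, dst_ip, dst_port, owner):
    """Record a connection under its 4-tuple and return the key."""
    key = Connection(src_ip, src_port, dst_ip, dst_port)
    connections[key] = owner
    return key


# Two telnet sessions from the same client to the same server port (23)
# remain distinct because the source ports differ.
first = open_connection("192.0.2.10", 1025, "192.0.2.1", 23, "telnet session 1")
second = open_connection("192.0.2.10", 1026, "192.0.2.1", 23, "telnet session 2")
print(first != second, len(connections))
```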
        


                Section II.     TCP Memory Structures and the Backlog

                

        For a copious treatment of the topic of SYN flooding, it is necessary
to look at the memory structures that TCP creates when a client SYN arrives
and the connection is pending (that is, a connection that is somewhere in 
the process of the three-way handshake, with TCP in the SYN_SENT or 
SYN_RCVD state).


                --[ BSD ]--             


        Under BSD style network code, for any given pending TCP connection 
there are three memory structures that are allocated (we do not discuss the 
process (proc) structure and file structure, but the reader should be aware 
that they exist as well):

Socket Structure (socket{}):    
        Holds the information related to the local end of the communications 
link: protocol used, state information, addressing information, connection 
queues, buffers, and flags.

Internet Protocol Control Block Structure (inpcb{}):
        PCB's are used at the transport layer by TCP (and UDP) to hold various
pieces of information needed by TCP.  They hold: TCP state information, IP 
address information, port numbers, IP header prototype and options and a 
pointer to the routing table entry for the destination address.  PCB's are 
created for a given TCP based server when the server calls listen().

TCP Control Block Structure (tcpcb{}):
        The TCP control block contains TCP specific information such as timer
information, sequence number information, flow control status, and OOB data.


                --[ Linux ]--


        Linux uses a different scheme of memory allocation to hold network
information.  The socket structure is still used, but instead of the inpcb{} 
and tcpcb{}, we have:

Sock Structure (sock{}):
        Protocol specific information; most of the data structures are TCP
related.  This is a huge structure.

SK Structure (sk_buff{}):
        Holds more protocol specific information, including packet header 
information.  Also points to a sock{}.

According to Alan Cox:
        The inode is the inode holding the socket (this may be a dummy inode 
for non file system sockets like IP), the socket holds generic high level
methods and the struct sock is the protocol specific object, although all but 
a few experimental high performance items use the same generic struct sock and
support code. That holds chains of linear buffers (struct sk_buff's).

[ struct inode -> struct socket -> struct sock -> chains of sk_buff's ]


                --[ The Backlog Queue ]--

        
        These are large memory structures.  Every time a client SYN arrives
on a valid port (a port where a TCP server is listen()ing), they must be 
allocated.  If there were no limit, a busy host could easily exhaust all of
its memory just trying to process TCP connections.  (This would be an even
simpler DOS attack.)  However, there is an upper limit to the number of 
concurrent connection requests a given TCP can have outstanding for a 
given socket.  This limit is the backlog and it is the length of the queue
where incoming (as yet incomplete) connections are kept.  This queue limit 
applies to both the number of incomplete connections (the 3-way handshake has
not been completed) and the number of completed connections that have not 
been pulled from the queue by the application by way of the accept() call.
If this backlog limit is reached, we will see that TCP will silently 
discard all incoming connection requests until the pending connections can 
be dealt with.  
        The backlog is not a large value.  It does not have to be.  Normally
TCP is quite expedient in connection establishment processing.  Even if a
connection arrived while the queue was full, in all likelihood, when the
client retransmits its connection request segment, the receiving TCP will
have room again in its queue.  Different TCP implementations have different
backlog sizes.  Under BSD style networking code, there is also a 'grace' margin 
of 3/2.  That is, TCP will allow up to backlog*3/2+1 connections.  This will
allow a socket one connection even if it calls listen() with a backlog of 0.  
Some common backlog values:
                        fig(2)

   OS           Backlog   BL+Grace  Notes       
---------------------------------------------------------------------------
SunOS 4.x.x:     5           8 
IRIX 5.2:        5           8
Solaris
Linux 1.2.x:    10          10   Linux does not have this grace margin.
FreeBSD 2.1.0:              32
FreeBSD 2.1.5:             128
Win NTs 3.5.1:   6           6   NT does not appear to have this margin.
Win NTw 4.0:     6           6   NT has a pathetic backlog.
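
        The BL+Grace column for the BSD-derived entries follows directly
from the backlog*3/2+1 formula, taken with integer arithmetic.  A small
sketch (the function name bsd_queue_limit() is invented here):

```python
def bsd_queue_limit(backlog):
    """Effective queue limit under the BSD 'grace' margin: backlog*3/2+1.

    Integer division is assumed, as in C.  Hypothetical helper name.
    """
    return backlog * 3 // 2 + 1


# A backlog of 5 gives the 8 shown for SunOS 4.x.x and IRIX 5.2 in fig(2);
# a backlog of 0 still leaves room for one connection.
for backlog in (0, 5):
    print(backlog, bsd_queue_limit(backlog))
```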



                Section III.    TCP Input Processing


        
        To see exactly where the attack works it is necessary to watch as 
the receiving TCP processes an incoming segment.  The following is true for
BSD style networking, and only processes relevant to this paper are 
discussed.

A packet arrives and is demultiplexed up the protocol stack to TCP.  The TCP
state is LISTEN:

Get header information:
        TCP retrieves the TCP and IP headers and stores the information in
memory.
Verify the TCP checksum:
        The standard Internet checksum is applied to the segment.  If it 
fails, no ACK is sent, and the segment is dropped, assuming the client will
retransmit it.
Locate the pcb{}:
        TCP locates the pcb{} associated with the connection.  If it is not
found, TCP drops the segment and sends a RST.  (Aside: This is how TCP
handles connections that arrive on ports with no server listen()ing.)  If
the pcb{} exists, but the state is CLOSED, the server has not called 
connect() or listen().  The segment is dropped, but no RST is sent.  The
client is expected to retransmit its connection request.  We will see this
occurrence when we discuss the 'Linux Anomaly'.
Create new socket:
        When a segment arrives for a listen()ing socket, a slave socket is
created.  This is where a socket{}, tcpcb{}, and another pcb{} are created.
TCP is not committed to the connection at this point, so a flag is set to
cause TCP to drop the socket (and destroy the memory structures) if an
error is encountered.  If the backlog limit is reached, TCP considers this 
an error, and the connection is refused.  We will see that this is exactly 
why the attack works.   Otherwise, the new socket's TCP state is LISTEN, and
the completion of the passive open is attempted.
Drop if RST, ACK, or no SYN:
        If the segment contains a RST, it is dropped.  If it contains an
ACK, it is dropped, a RST is sent and the memory structures torn down (the
ACK makes no sense for the connection at this point, and is considered an
error).  If the segment does not have the SYN bit on, it is dropped.  If
the segment contains a SYN, processing continues.
Address processing, etc:
        TCP then gets the client's address information into a buffer and 
connects its pcb{} to the client, processes any TCP options, and 
initializes its initial send sequence (ISS) number.
ACK the SYN:
        TCP sends a SYN, ISS and an ACK to the client.  The connection
establishment timer is set for 75 seconds at this point.  The state changes 
to SYN_RCVD.  Now TCP is committed to the socket.  We will see that this 
is the state the target TCP will be in when in the throes of the attack,
because the expected client response is never received.  The state remains
SYN_RCVD until the connection establishment timer expires, at which point all
the memory structures associated with the connection are destroyed, and the
socket returns to the LISTEN state.
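
        The decision steps above can be condensed into one function.  This
is a simplified model of the description in this section, not a
transcription of any BSD source: the function listen_input(), its
parameters and the returned action strings are all invented for
illustration.

```python
def listen_input(flags, checksum_ok, pcb_state, pending, limit):
    """Decide what a listening TCP does with one incoming segment.

    flags      -- set of flag names carried by the segment, e.g. {"SYN"}
    pcb_state  -- None if no pcb{} matches, else "CLOSED" or "LISTEN"
    pending    -- connections already queued on the listening socket
    limit      -- backlog limit for that socket
    All names here are hypothetical; this mirrors the prose, not real code.
    """
    if not checksum_ok:
        return "drop"            # no ACK; the client will retransmit
    if pcb_state is None:
        return "drop, send RST"  # nobody is listen()ing on this port
    if pcb_state == "CLOSED":
        return "drop"            # no RST; the client will retransmit
    if pending >= limit:
        return "drop"            # backlog full: the request is discarded
    if "RST" in flags:
        return "drop"
    if "ACK" in flags:
        return "drop, send RST"  # an ACK makes no sense at this point
    if "SYN" not in flags:
        return "drop"
    # TCP commits: state becomes SYN_RCVD and the 75 second timer starts.
    return "send SYN/ACK"


print(listen_input({"SYN"}, True, "LISTEN", pending=0, limit=8))
print(listen_input({"SYN"}, True, "LISTEN", pending=8, limit=8))
```

The second call is the interesting one: a perfectly valid SYN is discarded
simply because the queue is already full.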

                

                Section IV.     The Attack


        
        A TCP connection is initiated with a client issuing a request to a 
server with the SYN flag on in the TCP header.  Normally the server will 
issue a SYN/ACK back to the client identified by the 32-bit source address in 
the IP header.  The client will then send an ACK to the server (as we 
saw in figure 1 above) and data transfer can commence.  When the client IP 
address is spoofed to be that of an unreachable host, however, the targeted
TCP cannot complete the 3-way handshake and will keep trying until it times 
out.  That is the basis for the attack.
        The attacking host sends a few (we saw that as few as 6 are 
enough) SYN requests to the target TCP port (for example, the telnet daemon).
The attacking host also must make sure that the source IP-address is spoofed 
to be that of another, currently unreachable host (the target TCP will be 
sending its response to this address).  IP (by way of ICMP) will inform TCP 
that the host is unreachable, but TCP considers these errors to be transient 
and leaves the resolution of them up to IP (reroute the packets, etc), 
effectively ignoring them.  The IP-address must be unreachable because the 
attacker does not want *any* host to receive the SYN/ACKs that will be coming 
from the target TCP, which would elicit a RST from that host (as we saw in
TCP input above).  This would foil the attack.  The process is as follows:

                        fig(3)

        1       Z(x)    ---SYN--->      A

                Z(x)    ---SYN--->      A

                Z(x)    ---SYN--->      A

                Z(x)    ---SYN--->      A

                Z(x)    ---SYN--->      A

                Z(x)    ---SYN--->      A


        2       X    <---SYN/ACK---     A

                X    <---SYN/ACK---     A

                        ...

        3       X      <---RST---       A


At (1) the attacking host sends a multitude of SYN requests to the target 
to fill its backlog queue with pending connections.  (2) The target responds
with SYN/ACKs to what it believes is the source of the incoming SYNs.  During
this time all further requests to this TCP port will be ignored.  The target
port is flooded.  (3) When the target finally gives up on a pending
connection, it sends a RST toward X.
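
        The effect on the listening socket can be modelled entirely in
memory, with no network traffic at all.  The toy class below assumes the
75 second connection-establishment timer from Section III and a queue
limit of 8 (backlog 5 plus grace); the class name ListenQueue and its
methods are hypothetical and exist only for this illustration.

```python
class ListenQueue:
    """Toy model of one listening socket's queue of pending connections."""

    def __init__(self, limit, timeout=75):
        self.limit = limit      # backlog limit (including any grace margin)
        self.timeout = timeout  # connection-establishment timer, in seconds
        self.pending = {}       # source -> time its SYN was queued

    def _expire(self, now):
        # Entries whose timer has run out are torn down and free their slot.
        self.pending = {
            src: queued for src, queued in self.pending.items()
            if now - queued < self.timeout
        }

    def syn(self, src, now):
        """Offer a SYN; return True if queued, False if silently dropped."""
        self._expire(now)
        if len(self.pending) >= self.limit:
            return False
        self.pending[src] = now
        return True


queue = ListenQueue(limit=8)
# Eight requests that will never be completed fill the queue at t=0.
filled = [queue.syn(("X", port), now=0) for port in range(8)]
print(all(filled))
# A legitimate client is ignored while the queue is full...
print(queue.syn(("client", 1024), now=10))
# ...and gets through only once the timers have expired.
print(queue.syn(("client", 1024), now=75))
```

The model also shows why the backlog normally does not need to be large:
entries that complete the handshake or time out free their slots, and a
retransmitted SYN then finds room.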


                --[ Linux Anomaly ]--


        In doing my research for this project, I noticed a very strange
implementation error in the TCP module of Linux.   When a particular TCP 
server is flooded on a Linux host, strange things are afoot...  First, it 
appears that the connection-establishment timer is broken.  The 10 spoofed 
connection-requests keep the sockets in the SYN_RCVD state for just 
over 20 minutes (23 minutes to be exact.  Wonder what the significance of 
this is... Hmmm...), much longer than the 75 seconds it *should* be.  The 
next oddity is even stranger...  After that seemingly arbitrary time period
(I have to determine what the hell is going on there), TCP moves the flooded
sockets into the CLOSE state, where they *stay* until a connection-request
arrives on a *different* port.  If a connection-request arrives on the