                              ==Phrack Magazine==

                 Volume Seven, Issue Forty-Eight, File 13 of 18

                             [ Project Neptune ]

                        by daemon9 / route / infinity
                             for Phrack Magazine
                      July 1996 Guild Productions, kid

                       comments to route@infonexus.com


This project is a comprehensive analysis of TCP SYN flooding.  You may be
wondering, why such a copious treatment of TCP SYN flooding?  Apparently,
someone had to do it.  That someone turned out to be me (I need a real
hobby).  The SYNflood Project consists of this whitepaper, including
annotated network monitor dumps and fully functional, robust Linux source
code.


                        --[ Introduction ]--

TCP SYN flooding is a denial of service (DOS) attack.  Like most DOS
attacks, it does not exploit a software bug, but rather a shortcoming in
the implementation of a particular protocol.  For example, mail bombing DOS
attacks work because most SMTP agents are dumb and will accept whatever is
sent their way.  ICMP_ECHO floods exploit the fact that most kernels will
simply reply to ICMP_ECHO request packets one after another, ad infinitum.
We will see that TCP SYN flood DOS attacks work because of the current
implementation of TCP's connection establishment protocol.


                        --[ Overview ]--

This whitepaper is intended as a complete introduction to TCP SYN flooding
(referred to hereafter as SYN flooding).  It will cover the attack in
detail, including all the necessary background information.  It is
organized into sections:

        Section I.      TCP Background Information
        Section II.     TCP Memory Structures and the Backlog
        Section III.    TCP Input Processing
        Section IV.     The Attack
        Section V.      Network Trace
        Section VI.     Neptune.c
        Section VII.    Discussion and Prevention
        Section VIII.
                        References

(Note that readers unfamiliar with the TCP/IP protocol suite may wish to
first read
ftp://ftp.infonexus.com/pub/Philes/NetTech/TCP-IP/tcipIp.intro.txt.gz)


                        --[ The Players ]--

        A:      Target host
        X:      Unreachable host
        Z:      Attacking host
        Z(x):   Attacker masquerading as the unreachable host


                        --[ The Figures ]--

There are a few network transaction figures in the paper and they are to
be interpreted as per the following example:

        tick    host a          control         host b

tick:
        A unit of time.  There is no distinction made as to *how* much
        time passes between ticks, just that time passes.  It's generally
        not going to be a great deal.
host a:
        A machine participating in a TCP-based conversation.
control:
        This field shows any relevant control bits set in the TCP header
        and the direction the data is flowing.
host b:
        A machine participating in a TCP-based conversation.

For example:

        1       A       ---SYN--->      B

In this case, at the first referenced point in time, host a is sending a
TCP segment to host b with the SYN bit on.  Unless stated, we are generally
not concerned with the data portion of the TCP segment.


                Section I.      TCP Background Information

TCP is a connection-oriented, reliable transport protocol.  TCP is
responsible for hiding network intricacies from the upper layers.  A
connection-oriented protocol implies that the two hosts participating in a
discussion must first establish a connection before data may be exchanged.
In TCP's case, this is done with the three-way handshake.  Reliability can
be provided in a number of ways, but the only two we are concerned with are
data sequencing and acknowledgement.  TCP assigns sequence numbers to every
byte in every segment and acknowledges all data bytes received from the
other end.  (Pure ACKs do not themselves consume a sequence number, and
they are not ACK'd in turn.  That would be ludicrous.)


                --[ TCP Connection Establishment ]--

In order to exchange data using TCP, hosts must establish a connection.
TCP establishes a connection in a 3 step process called the 3-way handshake.
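As a rough preview of the exchange walked through next, here is a minimal
sketch in Python of the three segments and how their sequence and
acknowledgement numbers relate.  It is a toy model, not a TCP
implementation: the `Segment` class, the `three_way_handshake()` helper, and
the ISN values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit counters and wrap around


@dataclass
class Segment:
    """Hypothetical, stripped-down view of a TCP header (illustration only)."""
    src: str
    dst: str
    flags: frozenset
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client, server, client_isn, server_isn):
    """Return the three segments of a normal connection establishment."""
    # (1) The client announces its initial sequence number (ISN).
    syn = Segment(client, server, frozenset({"SYN"}), client_isn)
    # (2) The server sends its own ISN and ACKs the client's ISN + 1.
    syn_ack = Segment(server, client, frozenset({"SYN", "ACK"}),
                      server_isn, (syn.seq + 1) % SEQ_MOD)
    # (3) The client ACKs the server's ISN + 1.
    ack = Segment(client, server, frozenset({"ACK"}),
                  (client_isn + 1) % SEQ_MOD, (syn_ack.seq + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    # Assumed ISNs; real stacks pick these themselves.
    for tick, seg in enumerate(three_way_handshake("A", "B", 1000, 5000), 1):
        print(tick, seg.src, "->", seg.dst, sorted(seg.flags), seg.seq, seg.ack)
```

The modulo arithmetic is there only because the counters are 32 bits wide;
an ISN near the top of the range simply wraps to 0 when acknowledged.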
If machine A is running a client program and wishes to connect to a server
program on machine B, the process is as follows:

fig(1)

        1       A       ---SYN--->      B

        2       A       <---SYN/ACK---  B

        3       A       ---ACK--->      B

At (1) the client is telling the server that it wants a connection.  This
is the SYN flag's only purpose.  The client is telling the server that the
sequence number field is valid, and should be checked.  The client will set
the sequence number field in the TCP header to its ISN (initial sequence
number).  The server, upon receiving this segment (2), will respond with its
own ISN (therefore the SYN flag is on) and an ACKnowledgement of the
client's first segment (which is the client's ISN+1).  The client then
ACK's the server's ISN (3).  Now data transfer may take place.


                --[ TCP Control Flags ]--

There are six TCP control flags.  We are only concerned with 3 (marked with
an asterisk), but the others are included for posterity:

*SYN: Synchronize Sequence Numbers
        The synchronize sequence numbers field is valid.  This flag is only
        valid during the 3-way handshake.  It tells the receiving TCP to
        check the sequence number field, and note its value as the
        connection-initiator's (usually the client) initial sequence
        number.  TCP sequence numbers can simply be thought of as 32-bit
        counters.  They range from 0 to 4,294,967,295.  Every byte of data
        exchanged across a TCP connection (along with certain flags) is
        sequenced.  The sequence number field in the TCP header will
        contain the sequence number of the *first* byte of data in the TCP
        segment.

*ACK: Acknowledgement
        The acknowledgement number field is valid.  This flag is almost
        always set.  The acknowledgement number field in the TCP header
        holds the value of the next *expected* sequence number (from the
        other side), and also acknowledges *all* data (from the other side)
        up through this ACK number minus one.

*RST: Reset
        Destroy the referenced connection.  All memory structures are torn
        down.

URG: Urgent
        The urgent pointer is valid.
        This is TCP's way of implementing out of band (OOB) data.  For
        instance, in a telnet connection a `ctrl-c` on the client side is
        considered urgent and will cause this flag to be set.

PSH: Push
        The receiving TCP should not queue this data, but rather pass it to
        the application as soon as possible.  This flag should always be
        set in interactive connections, such as telnet and rlogin.

FIN: Finish
        The sending TCP is finished transmitting data, but is still open to
        accepting data.


                        --[ Ports ]--

To grant simultaneous access to the TCP module, TCP provides a user
interface called a port.  Ports are used by the kernel to identify network
processes.  They are strictly transport layer entities.  Together with an
IP address, a TCP port provides an endpoint for network communications.  In
fact, at any given moment *all* Internet connections can be described by 4
numbers: the source IP address and source port and the destination IP
address and destination port.  Servers are bound to 'well-known' ports so
that they may be located on a standard port on different systems.  For
example, the telnet daemon sits on TCP port 23.


        Section II.     TCP Memory Structures and the Backlog

For a copious treatment of the topic of SYN flooding, it is necessary to
look at the memory structures that TCP creates when a client SYN arrives
and the connection is pending (that is, a connection that is somewhere in
the process of the three-way handshake and TCP is in the SYN_SENT or
SYN_RCVD state).


                        --[ BSD ]--

Under BSD style network code, for any given pending TCP connection there
are three memory structures that are allocated (we do not discuss the
process (proc) structure and file structure, but the reader should be aware
that they exist as well):

Socket Structure (socket{}):
        Holds the information related to the local end of the
        communications link: protocol used, state information, addressing
        information, connection queues, buffers, and flags.

Internet Protocol Control Block Structure (inpcb{}):
        PCB's are used at the transport layer by TCP (and UDP) to hold
        various pieces of information needed by TCP.  They hold: TCP state
        information, IP address information, port numbers, IP header
        prototype and options, and a pointer to the routing table entry for
        the destination address.  PCB's are created for a given TCP based
        server when the server calls listen().

TCP Control Block Structure (tcpcb{}):
        The TCP control block contains TCP specific information such as
        timer information, sequence number information, flow control
        status, and OOB data.


                        --[ Linux ]--

Linux uses a different scheme of memory allocation to hold network
information.  The socket structure is still used, but instead of the pcb{}
and tcpcb{}, we have:

Sock Structure (sock{}):
        Protocol specific information; most of the data structures are TCP
        related.  This is a huge structure.

SK Structure (sk_buff{}):
        Holds more protocol specific information including packet header
        information; also contains a sock{}.

According to Alan Cox:

        The inode is the inode holding the socket (this may be a dummy
        inode for non file system sockets like IP), the socket holds
        generic high level methods and the struct sock is the protocol
        specific object, although all but a few experimental high
        performance items use the same generic struct sock and support
        code.  That holds chains of linear buffers (struct sk_buff's).

        [ struct inode -> struct socket -> struct sock -> chains of
          sk_buff's ]


                        --[ The Backlog Queue ]--

These are large memory structures.  Every time a client SYN arrives on a
valid port (a port where a TCP server is listen()ing), they must be
allocated.  If there were no limit, a busy host could easily exhaust all of
its memory just trying to process TCP connections.  (This would be an even
simpler DOS attack.)  However, there is an upper limit to the number of
concurrent connection requests a given TCP can have outstanding for a given
socket.
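To make that cap concrete, here is a minimal sketch in Python, assuming a
toy model that tracks nothing but a count of outstanding requests per
listening socket.  `PendingQueue`, its method names, and the limit of 5 are
hypothetical and for illustration only; a real stack allocates the
structures described above for each entry rather than a single list item.

```python
from collections import deque


class PendingQueue:
    """Toy model (hypothetical names) of one listening socket's pending queue."""

    def __init__(self, limit):
        self.limit = limit       # upper bound on outstanding requests
        self.pending = deque()   # one entry per connection still pending
        self.dropped = 0         # requests discarded because the queue was full

    def on_syn(self, source):
        """Queue a connection request if there is room; otherwise discard it."""
        if len(self.pending) >= self.limit:
            self.dropped += 1    # no reply is sent; the sender must retransmit
            return False
        self.pending.append(source)
        return True

    def on_complete(self, source):
        """Free a slot once a pending connection has been dealt with."""
        self.pending.remove(source)


if __name__ == "__main__":
    queue = PendingQueue(limit=5)   # assumed limit, illustration only
    for i in range(7):
        accepted = queue.on_syn("10.0.0.%d" % i)
        print("SYN from 10.0.0.%d queued: %s" % (i, accepted))
    print("dropped:", queue.dropped)
```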
This limit is the backlog and it is the length of the queue where incoming
(as yet incomplete) connections are kept.  This queue limit applies to both
the number of incomplete connections (the 3-way handshake has not been
completed) and the number of completed connections that have not been
pulled from the queue by the application by way of the accept() call.  If
this backlog limit is reached, we will see that TCP will silently discard
all incoming connection requests until the pending connections can be dealt
with.

The backlog is not a large value.  It does not have to be.  Normally TCP is
quite expedient in connection establishment processing.  Even if a
connection arrived while the queue was full, in all likelihood, when the
client retransmits its connection request segment, the receiving TCP will
have room again in its queue.  Different TCP implementations have different
backlog sizes.  Under BSD style networking code, there is also a 'grace'
margin of 3/2.  That is, TCP will allow up to backlog*3/2+1 connections.
This will allow a socket one connection even if it calls listen() with a
backlog of 0.

Some common backlog values:

fig(2)

OS               Backlog  BL+Grace  Notes
---------------------------------------------------------------------------
SunOS 4.x.x:        5        8
IRIX 5.2:           5        8
Solaris
Linux 1.2.x:       10       10      Linux does not have this grace margin.
FreeBSD 2.1.0:              32
FreeBSD 2.1.5:             128
Win NTs 3.5.1:      6        6      NT does not appear to have this margin.
Win NTw 4.0:        6        6      NT has a pathetic backlog.


                Section III.    TCP Input Processing

To see exactly where the attack works it is necessary to watch as the
receiving TCP processes an incoming segment.  The following is true for BSD
style networking, and only processes relevant to this paper are discussed.

A packet arrives and is demultiplexed up the protocol stack to TCP.  The
TCP state is LISTEN:

Get header information:
        TCP retrieves the TCP and IP headers and stores the information in
        memory.

Verify the TCP checksum:
        The standard Internet checksum is applied to the segment.  If it
        fails, no ACK is sent, and the segment is dropped, assuming the
        client will retransmit it.

Locate the pcb{}:
        TCP locates the pcb{} associated with the connection.  If it is not
        found, TCP drops the segment and sends a RST.  (Aside: This is how
        TCP handles connections that arrive on ports with no server
        listen()ing.)  If the pcb{} exists, but the state is CLOSED, the
        server has not called connect() or listen().  The segment is
        dropped, but no RST is sent.  The client is expected to retransmit
        its connection request.  We will see this occurrence when we
        discuss the 'Linux Anomaly'.

Create new socket:
        When a segment arrives for a listen()ing socket, a slave socket is
        created.  This is where a socket{}, tcpcb{}, and another pcb{} are
        created.  TCP is not committed to the connection at this point, so
        a flag is set to cause TCP to drop the socket (and destroy the
        memory structures) if an error is encountered.  If the backlog
        limit is reached, TCP considers this an error, and the connection
        is refused.  We will see that this is exactly why the attack works.
        Otherwise, the new socket's TCP state is LISTEN, and the completion
        of the passive open is attempted.

Drop if RST, ACK, or no SYN:
        If the segment contains a RST, it is dropped.  If it contains an
        ACK, it is dropped, a RST is sent and the memory structures torn
        down (the ACK makes no sense for the connection at this point, and
        is considered an error).  If the segment does not have the SYN bit
        on, it is dropped.  If the segment contains a SYN, processing
        continues.

Address processing, etc:
        TCP then gets the client's address information into a buffer and
        connects its pcb{} to the client, processes any TCP options, and
        initializes its initial send sequence (ISS) number.

ACK the SYN:
        TCP sends a SYN, ISS and an ACK to the client.  The connection
        establishment timer is set for 75 seconds at this point.  The state
        changes to SYN_RCVD.  Now,
        TCP is committed to the socket.  We will see that this is the state
        the target TCP will be in when in the throes of the attack because
        the expected client response is never received.  The state remains
        SYN_RCVD until the connection establishment timer expires, at which
        point all the memory structures associated with the connection are
        destroyed, and the socket returns to the LISTEN state.


                Section IV.     The Attack

A TCP connection is initiated with a client issuing a request to a server
with the SYN flag on in the TCP header.  Normally the server will issue a
SYN/ACK back to the client identified by the 32-bit source address in the
IP header.  The client will then send an ACK to the server (as we saw in
figure 1 above) and data transfer can commence.  When the client IP address
is spoofed to be that of an unreachable host, however, the targeted TCP
cannot complete the 3-way handshake and will keep trying until it times
out.  That is the basis for the attack.

The attacking host sends a few (we saw that as few as 6 is enough) SYN
requests to the target TCP port (for example, the telnet daemon).  The
attacking host also must make sure that the source IP-address is spoofed to
be that of another, currently unreachable host (the target TCP will be
sending its response to this address).  IP (by way of ICMP) will inform TCP
that the host is unreachable, but TCP considers these errors to be
transient and leaves the resolution of them up to IP (reroute the packets,
etc), effectively ignoring them.  The IP-address must be unreachable
because the attacker does not want *any* host to receive the SYN/ACKs that
will be coming from the target TCP, which would elicit a RST from that host
(as we saw in TCP input above).  This would foil the attack.  The process
is as follows:

fig(3)

        1       Z(x)    ---SYN--->      A
                Z(x)    ---SYN--->      A
                Z(x)    ---SYN--->      A
                Z(x)    ---SYN--->      A
                Z(x)    ---SYN--->      A
                Z(x)    ---SYN--->      A

        2       X       <---SYN/ACK---  A
                X       <---SYN/ACK---  A
                        ...

        3       X       <---RST---      A

At (1) the attacking host sends a multitude of SYN requests to the target
to fill its backlog queue with pending connections.  (2) The target
responds with SYN/ACKs to what it believes is the source of the incoming
SYNs.  During this time all further requests to this TCP port will be
ignored.  The target port is flooded.


                        --[ Linux Anomaly ]--

In doing my research for this project, I noticed a very strange
implementation error in the TCP module of Linux.  When a particular TCP
server is flooded on a Linux host, strange things are afoot...  First, it
appears that the connection-establishment timer is broken.  The 10 spoofed
connection-requests keep the sockets in the SYN_RCVD state for just over 20
minutes (23 minutes to be exact.  Wonder what the significance of this
is...  Hmmm...).  Much longer than the 75 seconds it *should* be.  The next
oddity is even more odd...  After that seemingly arbitrary time period (I
have to determine what the hell is going on there), TCP moves the flooded
sockets into the CLOSE state, where they *stay* until a connection-request
arrives on a *different* port.  If a connection-request arrives on the