Developing A BGP Speaker - Part 1


I have started writing an implementation of BGP in Odin. Odin can be thought of as C, but with some of the pain points addressed. For instance, C does not have strings as a first-class type, nor does it have slices or dynamic arrays as a built-in type. Slices end up being particularly useful when it comes to parsing binary formats.

nice_sprite / bgpimpl

The goal of this project is to implement a fully functional BGP router according to RFC 4271. Practically speaking, this means that my program will be able to establish a TCP session with a Cisco IOS router, exchange OPEN and UPDATE messages, and maintain the session. To keep things reasonably simple, I am only aiming to support the IPv4 unicast address family for the time being.

The first order of business is parsing BGP messages and testing the parser implementation. My initial test corpus is a set of Wireshark captures from working BGP sessions. I used containerlab to set up two Cisco IOS nodes configured as eBGP peers of each other, and then used Wireshark to capture all the messages the IOS devices exchanged to reach the Established state. I used these packet captures together with RFC 4271 to make sure the parser supports core RFC 4271 along with common modern additions.

What’s in a BGP Session?

The first step I took in the process of making the parser was to try to identify common patterns in the protocol to inform the design of the parser. I also wanted to compare what I see in the packet capture to what the RFC dictates and create a mental mapping between the two. RFC 4271 defines the following messages:

OPEN
UPDATE
KEEPALIVE
NOTIFICATION

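All BGP messages begin with a 19-byte header consisting of a Marker, Length, and Type.

Marker is 16 bytes and is only included for compatibility; RFC 4271 requires it to be set to all ones.
Length is 2 bytes indicating the total message length in bytes, including the header. Valid values range from 19 to 4096.
Type is 1 byte indicating the kind of message.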

Refer to https://datatracker.ietf.org/doc/html/rfc4271#section-4.1 for more info

At a high level, two routers form a BGP relationship by first establishing a TCP connection to port 179. Next, they exchange OPEN messages containing each router's capabilities, ASN, and BGP Identifier. If a router finds its peer's OPEN acceptable, it replies with a KEEPALIVE. Receipt of a KEEPALIVE from the remote peer moves the BGP finite state machine (FSM) from OpenConfirm into the Established state. Once established, the peers send UPDATEs containing the path attributes and prefixes they are advertising. To maintain the session, KEEPALIVEs are exchanged periodically.
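The happy path above can be sketched as a tiny state machine. This is a deliberately simplified, hypothetical model: RFC 4271 section 8 defines more states (such as Active) plus many more events and timers, and the names Session_State, Session_Event, and next_state are illustrations of my own rather than part of the parser.

```odin
package main

// Simplified subset of the RFC 4271 FSM states (Active is omitted).
Session_State :: enum {
	idle,
	connect,
	open_sent,
	open_confirm,
	established,
}

// Hypothetical, coarse-grained events for the happy path only.
Session_Event :: enum {
	start,
	tcp_connected,
	open_received,
	keepalive_received,
	notification_received,
}

// Returns the next state. Any unexpected event drops the session back to
// idle, which is a big simplification of the real FSM.
next_state :: proc(state: Session_State, event: Session_Event) -> Session_State {
	if event == .notification_received {
		return .idle
	}
	switch state {
	case .idle:
		if event == .start { return .connect }
	case .connect:
		// TCP to port 179 is up: send our OPEN and wait for the peer's.
		if event == .tcp_connected { return .open_sent }
	case .open_sent:
		// The peer's OPEN was acceptable: send a KEEPALIVE and wait for theirs.
		if event == .open_received { return .open_confirm }
	case .open_confirm:
		// The peer's KEEPALIVE confirms the session.
		if event == .keepalive_received { return .established }
	case .established:
		// Periodic KEEPALIVEs keep the session up.
		if event == .keepalive_received { return .established }
	}
	return .idle
}
```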

This capture shows this process in action. If you want to follow along, I recommend using the bgp filter in Wireshark so we can focus only on the BGP messages being exchanged between 192.168.12.1 (R1) and 192.168.12.2 (R2). The number on the far left is the packet number, which I will use to reference specific packets.

43	 192.168.12.1	192.168.12.2	BGP	OPEN Message
45	 192.168.12.2	192.168.12.1	BGP	OPEN Message
46	 192.168.12.2	192.168.12.1	BGP	KEEPALIVE Message
48	 192.168.12.1	192.168.12.2	BGP	KEEPALIVE Message
49	 192.168.12.2	192.168.12.1	BGP	KEEPALIVE Message
50	 192.168.12.2	192.168.12.1	BGP	UPDATE Message, UPDATE Message
51	 192.168.12.1	192.168.12.2	BGP	KEEPALIVE Message
52	 192.168.12.1	192.168.12.2	BGP	UPDATE Message, UPDATE Message
66	 192.168.12.2	192.168.12.1	BGP	KEEPALIVE Message
71	 192.168.12.1	192.168.12.2	BGP	KEEPALIVE Message
82	 192.168.12.2	192.168.12.1	BGP	KEEPALIVE Message
87	 192.168.12.1	192.168.12.2	BGP	KEEPALIVE Message
100	 192.168.12.2	192.168.12.1	BGP	KEEPALIVE Message
105	 192.168.12.1	192.168.12.2	BGP	KEEPALIVE Message

OPEN

Packet 43 is R1 initiating the peer relationship with R2. The OPEN message contains the BGP Version, the Autonomous System Number (ASN) of the sender, the Hold Time, the BGP Identifier of the sender, and optional capabilities such as support for 4-byte ASNs or for specific address families. Here is packet 43:

Border Gateway Protocol - OPEN Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 57
    Type: OPEN Message (1)
    Version: 4
    My AS: 1
    Hold Time: 180
    BGP Identifier: 1.1.1.1
    Optional Parameters Length: 28
    Optional Parameters
        Optional Parameter: Capability
            Parameter Type: Capability (2)
            Parameter Length: 6
            Capability: Multiprotocol extensions capability
                Type: Multiprotocol extensions capability (1)
                Length: 4
                AFI: IPv4 (1)
                Reserved: 00
                SAFI: Unicast (1)
        Optional Parameter: Capability
            Parameter Type: Capability (2)
            Parameter Length: 2
            Capability: Route Refresh Capability (Cisco)
                Type: Route Refresh Capability (Cisco) (128)
                Length: 0
        Optional Parameter: Capability
            Parameter Type: Capability (2)
            Parameter Length: 2
            Capability: Route refresh capability
                Type: Route refresh capability (2)
                Length: 0
        Optional Parameter: Capability
            Parameter Type: Capability (2)
            Parameter Length: 2
            Capability: Enhanced route refresh capability
                Type: Enhanced route refresh capability (70)
                Length: 0
        Optional Parameter: Capability
            Parameter Type: Capability (2)
            Parameter Length: 6
            Capability: Support for 4-octet AS number capability
                Type: Support for 4-octet AS number capability (65)
                Length: 4
                AS Number: 1

The output is mostly unsurprising and matches the configuration on the device. This OPEN message indicates that R1 is using BGP Version 4, is in AS 1, has a Hold Time of 180 seconds, and is using its loopback address of 1.1.1.1 to uniquely identify itself. It also includes some Optional Parameters, but before digging into those I want to compare what is seen in Wireshark with what the RFC says.

RFC 4271 defines the OPEN message as:

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
|    Version    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     My Autonomous System      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Hold Time           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         BGP Identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Opt Parm Len  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|             Optional Parameters (variable)                    |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Source: https://datatracker.ietf.org/doc/html/rfc4271#section-4.2
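The fixed part of the OPEN body maps onto a packed struct the same way the header does. Here is a minimal sketch, assuming the 19-byte header has already been consumed; BGP_Open and parse_open are hypothetical names, not the parser's actual API.

```odin
package main

// Fixed-size portion of an OPEN body (RFC 4271 section 4.2): the 10 bytes
// that follow the 19-byte header. Multi-byte fields are big-endian on the wire.
BGP_Open :: struct #packed {
	version:       u8,
	my_as:         u16be, // 2-byte ASN (AS_TRANS, 23456, if the real ASN needs 4 bytes)
	hold_time:     u16be, // seconds
	bgp_id:        [4]u8, // usually displayed as a dotted quad, e.g. 1.1.1.1
	opt_param_len: u8,    // length in bytes of the Optional Parameters that follow
}
#assert(size_of(BGP_Open) == 10)

// Copies the fixed part out of `body` and returns the Optional Parameter
// bytes that follow it. ok is false if `body` is too short.
parse_open :: proc(body: []byte) -> (fixed: BGP_Open, opt_params: []byte, ok: bool) {
	if len(body) < size_of(BGP_Open) {
		return
	}
	fixed = (cast(^BGP_Open)raw_data(body))^
	rest := body[size_of(BGP_Open):]
	opt_len := int(fixed.opt_param_len)
	if len(rest) < opt_len {
		return
	}
	return fixed, rest[:opt_len], true
}
```

For packet 43, the 19-byte header plus the 10 fixed bytes plus 28 bytes of Optional Parameters accounts for the total Length of 57.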

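What we see in Wireshark matches what RFC 4271 defines. However, while RFC 4271 defines Optional Parameters generically as (Parameter Type, Parameter Length, Parameter Value) triplets, it does not define the Capabilities parameter (Parameter Type 2) shown here. That comes from RFC 3392 (since obsoleted by RFC 5492, which keeps the same encoding). According to RFC 3392, each Capability inside that parameter is encoded as a TLV (Type-Length-Value) tuple: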

+------------------------------+
| Capability Code (1 octet)    |
+------------------------------+
| Capability Length (1 octet)  |
+------------------------------+
| Capability Value (variable)  |
+------------------------------+

Source: https://datatracker.ietf.org/doc/html/rfc3392#section-4

The capability advertisements we see in Wireshark do match RFC 3392. By default, my Cisco IOS-XE 17.12.1 nodes advertise support for 4-byte ASNs and several different Route Refresh flavors. The Multiprotocol Extensions capability also indicates which Address Family Identifier (AFI) and Subsequent Address Family Identifier (SAFI) the sender wants to use in this session, which in this case is the IPv4 AFI and the Unicast SAFI.

The parser will need to remember some of these capabilities in order to properly interpret UPDATE messages. For example, if both peers advertise the 4-byte ASN capability, the UPDATE messages received from this peer will contain 4-byte ASNs instead of the legacy 2-byte ASNs.

In packet 45, R2 responds with an OPEN message containing its own AS, Hold Time, and identifier, along with matching route refresh capabilities and matching AFI/SAFI support. R1 and R2 each signal that they accept the other's OPEN by sending a KEEPALIVE. Once a router receives the KEEPALIVE from its new peer, it moves into the Established state.
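Here is a hedged sketch of walking the Optional Parameters and the capability TLVs nested inside them, recording the ones this parser cares about. The Peer_Capabilities struct, the constant names, and parse_capabilities are hypothetical; the capability codes (1, 2, 65, 128) are the values shown in packet 43. Unrecognized capabilities, such as Enhanced Route Refresh (70), are skipped using their length field.

```odin
package main

OPT_PARAM_CAPABILITY :: 2

CAP_MULTIPROTOCOL :: 1
CAP_ROUTE_REFRESH :: 2
CAP_FOUR_OCTET_AS :: 65
CAP_CISCO_REFRESH :: 128

// Hypothetical summary of what one peer advertised in its OPEN.
Peer_Capabilities :: struct {
	four_octet_as:  bool,
	four_octet_asn: u32, // the peer's real ASN when four_octet_as is set
	ipv4_unicast:   bool,
	route_refresh:  bool,
}

// Each Optional Parameter is (type, length, value). A Capabilities
// parameter (type 2) holds one or more (code, length, value) capabilities.
parse_capabilities :: proc(opt_params: []byte) -> (caps: Peer_Capabilities, ok: bool) {
	params := opt_params
	for len(params) > 0 {
		if len(params) < 2 { return }
		param_type, param_len := params[0], int(params[1])
		if len(params) < 2 + param_len { return }
		value := params[2:2 + param_len]
		params = params[2 + param_len:]
		if param_type != OPT_PARAM_CAPABILITY { continue }

		for len(value) > 0 {
			if len(value) < 2 { return }
			code, cap_len := value[0], int(value[1])
			if len(value) < 2 + cap_len { return }
			data := value[2:2 + cap_len]
			value = value[2 + cap_len:]
			switch code {
			case CAP_MULTIPROTOCOL:
				// AFI (2 bytes), reserved (1 byte), SAFI (1 byte); IPv4 = 1, unicast = 1
				if cap_len == 4 && data[0] == 0 && data[1] == 1 && data[3] == 1 {
					caps.ipv4_unicast = true
				}
			case CAP_ROUTE_REFRESH, CAP_CISCO_REFRESH:
				caps.route_refresh = true
			case CAP_FOUR_OCTET_AS:
				if cap_len == 4 {
					caps.four_octet_as = true
					caps.four_octet_asn = u32(data[0]) << 24 | u32(data[1]) << 16 | u32(data[2]) << 8 | u32(data[3])
				}
			}
		}
	}
	return caps, true
}
```

A session would only switch to 4-byte ASNs once the OPEN messages from both sides report four_octet_as.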

KEEPALIVE

KEEPALIVE messages consist only of the 19-byte header and don’t contain any variable length data, making them trivial to parse.

Border Gateway Protocol - KEEPALIVE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 19
    Type: KEEPALIVE Message (4)
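Because a KEEPALIVE is nothing but the header, building one only requires filling in 19 bytes. A minimal sketch with a hypothetical make_keepalive procedure:

```odin
package main

BGP_HEADER_LEN     :: 19
BGP_TYPE_KEEPALIVE :: 4

// Builds a KEEPALIVE: a 16-byte all-ones marker, a 2-byte big-endian
// length (always 19 for a KEEPALIVE) and the type byte (4).
make_keepalive :: proc() -> (msg: [BGP_HEADER_LEN]byte) {
	for i in 0..<16 {
		msg[i] = 0xff
	}
	msg[16] = 0x00 // length, high byte
	msg[17] = BGP_HEADER_LEN // length, low byte
	msg[18] = BGP_TYPE_KEEPALIVE
	return
}
```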

NOTIFICATION

NOTIFICATION messages are used to notify peers about errors, and the BGP connection is closed immediately after one is sent. To create this packet capture, I administratively reset the BGP session between R1 and R2 with clear ip bgp *, which generated the Cease notification seen in packet 35:

Border Gateway Protocol - NOTIFICATION Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 21
    Type: NOTIFICATION Message (3)
    Major error Code: Cease (6)
    Minor error Code (Cease): Administratively Reset (4)

Here is the RFC 4271 definition of a NOTIFICATION:

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Error code    | Error subcode |   Data (variable)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

One pitfall of working only from a packet capture is handling only the cases that happened to be captured. This particular NOTIFICATION message doesn't have any variable-length data, and the message body has no explicit length field hinting that there even could be. To compute the length of the Data, RFC 4271 provides the formula Message Length = 21 + Data Length, which rearranges to Data Length = Message Length - 21. In other words, the presence of Data can be inferred by checking whether the NOTIFICATION message length is greater than 21.
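The rearranged formula translates directly into code. Here is a minimal sketch, assuming it receives the whole message including the 19-byte header; Notification and parse_notification are hypothetical names.

```odin
package main

// Hypothetical view of a NOTIFICATION (RFC 4271 section 4.5).
Notification :: struct {
	code:    u8,
	subcode: u8,
	data:    []byte, // slice into the original message; empty if there is no Data
}

// The Data length is never sent explicitly, so derive it from the header's
// Length field: Data Length = Message Length - 21.
parse_notification :: proc(msg: []byte) -> (n: Notification, ok: bool) {
	if len(msg) < 21 { return }
	msg_len := int(msg[16]) << 8 | int(msg[17])
	if msg_len < 21 || msg_len > len(msg) { return }
	n.code = msg[19]
	n.subcode = msg[20]
	n.data = msg[21:msg_len]
	return n, true
}
```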

UPDATE
Once the BGP session is established, the routers exchange UPDATE messages. UPDATE messages contain withdrawn routes, path attributes, and network layer reachability information (NLRI).

Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 54
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 27
    Path attributes
        Path Attribute - ORIGIN: IGP
            Flags: 0x40, Transitive, Well-known, Complete
                0... .... = Optional: Not set
                .1.. .... = Transitive: Set
                ..0. .... = Partial: Not set
                ...0 .... = Extended-Length: Not set
                .... 0000 = Unused: 0x0
            Type Code: ORIGIN (1)
            Length: 1
            Origin: IGP (0)
        Path Attribute - AS_PATH: 2 
            Flags: 0x40, Transitive, Well-known, Complete
                0... .... = Optional: Not set
                .1.. .... = Transitive: Set
                ..0. .... = Partial: Not set
                ...0 .... = Extended-Length: Not set
                .... 0000 = Unused: 0x0
            Type Code: AS_PATH (2)
            Length: 6
            AS Path segment: 2
                Segment type: AS_SEQUENCE (2)
                Segment length (number of ASN): 1
                AS4: 2
        Path Attribute - NEXT_HOP: 192.168.12.2 
            Flags: 0x40, Transitive, Well-known, Complete
                0... .... = Optional: Not set
                .1.. .... = Transitive: Set
                ..0. .... = Partial: Not set
                ...0 .... = Extended-Length: Not set
                .... 0000 = Unused: 0x0
            Type Code: NEXT_HOP (3)
            Length: 4
            Next hop: 192.168.12.2
        Path Attribute - MULTI_EXIT_DISC: 0
            Flags: 0x80, Optional, Non-transitive, Complete
                1... .... = Optional: Set
                .0.. .... = Transitive: Not set
                ..0. .... = Partial: Not set
                ...0 .... = Extended-Length: Not set
                .... 0000 = Unused: 0x0
            Type Code: MULTI_EXIT_DISC (4)
            Length: 4
            Multiple exit discriminator: 0
    Network Layer Reachability Information (NLRI)
        2.2.2.0/24
            NLRI prefix length: 24
            NLRI prefix: 2.2.2.0

Withdrawn Routes Length is 0, indicating there are no withdrawn routes. Next, Total Path Attribute Length indicates that the next 27 bytes of the message contain BGP path attributes. Each path attribute is encoded as flags, type, length, and the attribute value. Parsing most of these attributes is straightforward: check the type of the attribute, then pull out however many bytes that field must be. For example, LOCAL_PREF, MED, and NEXT_HOP are always 4 bytes, so no external context is required to parse them. One important exception is the AS_PATH attribute, which contains a list of either 4-byte or 2-byte ASNs depending on whether both peers advertised the 4-byte ASN capability in their OPEN messages during session establishment. The UPDATE message itself does not carry this information, so the parser needs to store session-specific context to parse this attribute correctly.
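Here is a hedged sketch of what that session context means for AS_PATH decoding. The four_octet flag stands in for whatever per-session state the parser keeps, and parse_as_path is a hypothetical name. Each segment is a type byte (AS_SET = 1, AS_SEQUENCE = 2), a count of ASNs, and then the ASNs themselves.

```odin
package main

AS_SET      :: 1
AS_SEQUENCE :: 2

// Decodes an AS_PATH value into a flat list of ASNs. `four_octet` must come
// from session state (both peers advertised the 4-octet AS capability).
// The caller owns, and must delete, the returned array in every case.
parse_as_path :: proc(value: []byte, four_octet: bool) -> (asns: [dynamic]u32, ok: bool) {
	asn_size := 4 if four_octet else 2
	rest := value
	for len(rest) > 0 {
		if len(rest) < 2 { return }
		seg_type, count := rest[0], int(rest[1])
		if seg_type != AS_SET && seg_type != AS_SEQUENCE { return }
		need := 2 + count * asn_size
		if len(rest) < need { return }
		for i in 0..<count {
			b := rest[2 + i * asn_size:]
			asn: u32
			if four_octet {
				asn = u32(b[0]) << 24 | u32(b[1]) << 16 | u32(b[2]) << 8 | u32(b[3])
			} else {
				asn = u32(b[0]) << 8 | u32(b[1])
			}
			append(&asns, asn)
		}
		rest = rest[need:]
	}
	return asns, true
}
```

Decoding the 6-byte AS_PATH from the capture (02 01 00 00 00 02) with four_octet = false reads AS 0 and then hits an invalid segment type, which illustrates why the UPDATE cannot be parsed correctly without the capabilities negotiated in the OPEN messages.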

After path attributes comes the NLRI, which are the network prefixes being advertised. In this particular UPDATE we can see the 2.2.2.0/24 prefix being advertised. The way network prefixes are encoded in BGP messages is pretty interesting to me, and it shows how BGP was designed with scale in mind. Here is how 2.2.2.0/24 is encoded on the wire:

0x18 0x02 0x02 0x02

0x18 is hexadecimal for 24, the prefix length in bits. The next 3 bytes are the significant octets of the advertised prefix. In general, a prefix length of L bits is followed by ceil(L / 8) octets, so a /24 carries only its first 3 octets and the last octet is implicitly zero.

In isolation this may seem like a small optimization, but the savings add up when you consider that the number of IPv4 prefixes in the global routing table is 1,038,646 as of November 30, 2025 (source: CIDR Report). If all of these routes were /24s, this encoding would save 1 byte per prefix compared with always sending all 4 address octets, or about 1 MB when advertising the entire table. Scaled across all routers and peering sessions on the Internet, these savings become significant over time.
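Decoding one of these prefixes is a small loop. A minimal sketch for IPv4 only; IPv4_Prefix and decode_prefix are hypothetical names.

```odin
package main

IPv4_Prefix :: struct {
	addr:   [4]u8, // octets not present on the wire stay zero
	length: u8,    // prefix length in bits, 0..32
}

// Decodes one prefix and returns the bytes after it. Only ceil(length / 8)
// address octets are sent.
decode_prefix :: proc(b: []byte) -> (p: IPv4_Prefix, rest: []byte, ok: bool) {
	if len(b) < 1 { return }
	p.length = b[0]
	if p.length > 32 { return }
	n := (int(p.length) + 7) / 8
	if len(b) < 1 + n { return }
	for i in 0..<n {
		p.addr[i] = b[1 + i]
	}
	return p, b[1 + n:], true
}
```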

RFC 4271 defines the format of a BGP update as:

+-----------------------------------------------------+
|   Withdrawn Routes Length (2 octets)                |
+-----------------------------------------------------+
|   Withdrawn Routes (variable)                       |
+-----------------------------------------------------+
|   Total Path Attribute Length (2 octets)            |
+-----------------------------------------------------+
|   Path Attributes (variable)                        |
+-----------------------------------------------------+
|   Network Layer Reachability Information (variable) |
+-----------------------------------------------------+

Source: https://datatracker.ietf.org/doc/html/rfc4271#section-4.3

Withdrawn Routes and Network Layer Reachability Information share the exact same prefix encoding described above.

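Path attributes are TLV encoded. The Attribute Type is a 2-byte field: the high-order byte contains the Attribute Flags, and the low-order byte contains the Attribute Type Code (for example, 1 for ORIGIN or 2 for AS_PATH). The flag bits indicate whether the attribute is optional (a cleared Optional bit means well-known), transitive, partial, or uses an extended length. The Attribute Length that follows is 1 byte, or 2 bytes when the Extended Length bit is set.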

More detailed information about the encoding of path attributes can be found in https://datatracker.ietf.org/doc/html/rfc4271#section-4.3.
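Here is a hedged sketch of reading one path attribute, including the Extended Length case. Path_Attribute and read_path_attribute are hypothetical names; the flag values are the bit masks from RFC 4271 section 4.3.

```odin
package main

ATTR_OPTIONAL        :: 0x80
ATTR_TRANSITIVE      :: 0x40
ATTR_PARTIAL         :: 0x20
ATTR_EXTENDED_LENGTH :: 0x10

Path_Attribute :: struct {
	flags:     u8,
	type_code: u8,
	value:     []byte, // slice into the message
}

// Reads flags, type code and a 1-byte (or, with Extended Length, 2-byte
// big-endian) length, then slices out the value. Returns the bytes after it.
read_path_attribute :: proc(b: []byte) -> (attr: Path_Attribute, rest: []byte, ok: bool) {
	if len(b) < 3 { return }
	attr.flags = b[0]
	attr.type_code = b[1]
	hdr_len := 3
	value_len := int(b[2])
	if (attr.flags & ATTR_EXTENDED_LENGTH) != 0 {
		if len(b) < 4 { return }
		hdr_len = 4
		value_len = int(b[2]) << 8 | int(b[3])
	}
	if len(b) < hdr_len + value_len { return }
	attr.value = b[hdr_len:hdr_len + value_len]
	return attr, b[hdr_len + value_len:], true
}
```

Looping over the 27 attribute bytes from the UPDATE above with this procedure yields ORIGIN, AS_PATH, NEXT_HOP, and MULTI_EXIT_DISC in turn.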

Implementing The Parser

Now that I am familiar with the BGP message format and have identified the edge cases, it's time to actually start parsing BGP messages! To load the pcapng file I am using libpcap. Since BGP is an application layer protocol, getting to the BGP message requires parsing through the Ethernet header, IPv4 header, and TCP header first.

Sidenote: pcapng files store packet data exactly as it appeared on the wire, and the multi-byte fields of Ethernet, IPv4, TCP, and BGP are all transmitted in network byte order (big-endian), so all my protocol structs use the be variant of whatever integer type is needed.

Ethernet Header
There are two Ethernet header variants to expect: IEEE 802.3 and Ethernet II. The practical difference is that IEEE 802.3 treats the 2-byte value after the Destination and Source MAC addresses as a length instead of an Ethertype. If the value is 0x0600 (1536) or greater, the frame should be treated as Ethernet II; if it is 0x05DC (1500) or less, it is an IEEE 802.3 length. The frames in my test captures are Ethernet II frames, and they set the Ethertype field to 0x0800 to indicate the payload is IPv4. I included the Ethertype values for common protocols in the Ethertype enum below.
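The length-versus-Ethertype rule can be checked before reinterpreting the header. A minimal sketch; Frame_Kind and classify_frame are hypothetical names.

```odin
package main

Frame_Kind :: enum {
	ethernet_ii,
	ieee_802_3,
	invalid,
}

// Inspects the 2-byte field at offset 12 (after the two MAC addresses):
// >= 0x0600 is an Ethertype, <= 0x05DC (1500) is an 802.3 length, and the
// values in between are undefined.
classify_frame :: proc(frame: []byte) -> Frame_Kind {
	if len(frame) < 14 { return .invalid }
	v := u16(frame[12]) << 8 | u16(frame[13])
	if v >= 0x0600 { return .ethernet_ii }
	if v <= 0x05DC { return .ieee_802_3 }
	return .invalid
}
```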

Here is how I am representing an Ethernet frame in the parser:

Ethertype :: enum (u16be) {
	loopback = 0x9000,
	ipv4     = 0x0800,
	arp      = 0x0806,
	rarp     = 0x8035,
	ipv6     = 0x86dd,
	dot1q    = 0x8100,
	lacp     = 0x8809,
}

MAC :: distinct [6]u8

Ethernet_Header :: struct #packed {
	dst_mac:   MAC,
	src_mac:   MAC,
	ethertype: Ethertype,
}

To parse the header, I simply reinterpret the bytes in pkt as an Ethernet_Header:

parse_ethernet_header :: proc(pkt: []byte) -> ([]byte, Ethernet_Header, Parse_Error) {
	// Pass read_struct's error through instead of silently dropping it
	return read_struct(pkt, Ethernet_Header)
}

The read_struct function is used a lot throughout the parser, so I think it's worth explaining a little bit.

Parse_Error :: enum {
	none,
	type_larger_than_buffer,
	invalid_partial_read,
}

// Re-interprets `size_of(T)` bytes of `pkt` as `T` and advances the cursor
// forward by `size_of(T)`. Returns a slice pointing to the beginning of unread
// bytes, T, and a Parse_Error.
read_struct :: proc "contextless" (pkt: []byte, $T: typeid) -> ([]byte, T, Parse_Error) {
	if len(pkt) >= size_of(T) {
		// This dereference results in a copy
		s := (cast(^T)raw_data(pkt))^
		return pkt[size_of(T):], s, .none
	}
	return pkt, {}, .type_larger_than_buffer
}

This helper function takes a slice of bytes containing the packet, and a generic type parameter T. After checking that the copy won’t exceed the size of the packet, it reinterprets the bytes in the packet as T and then shifts our “window” into the packet bytes forward by the size_of(T) and returns. If the type is larger than the number of bytes left in the packet, then it does nothing and returns an error to indicate the problem.

By printing out the header and comparing the result with what we see in Wireshark, we can confirm we successfully parsed the Ethernet header:

/* Wireshark */
Ethernet II, Src: aa:bb:cc:00:02:10 (aa:bb:cc:00:02:10), Dst: aa:bb:cc:00:01:10 (aa:bb:cc:00:01:10)
    Destination: aa:bb:cc:00:01:10 (aa:bb:cc:00:01:10)
    Source: aa:bb:cc:00:02:10 (aa:bb:cc:00:02:10)
    Type: IPv4 (0x0800)
    [Stream index: 3]

/* Parser Output */
Ethernet_Header{
    dst_mac = AA:BB:CC:0:1:10, 
    src_mac = AA:BB:CC:0:2:10, 
    ethertype = "ipv4"
}

IPv4 Header

The IPv4 Header is a bit more interesting. The ancient RFC 791 (born in 1981) defines the format:

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Source: https://datatracker.ietf.org/doc/html/rfc791#section-3.1

A few fields are smaller than a byte, but adjacent sub-byte fields always add up to whole bytes (Version and IHL share one byte; Flags and Fragment Offset share two). I decided to store each group as a whole integer and use helper functions to pull out a useful interpretation. A perfect example of this is the Version and IHL fields: both are 4 bits and adjacent to each other, so they are stored together as a u8, and ipv4_version() and ipv4_header_length() use bit operations to extract the values. IHL counts 32-bit words, so ipv4_header_length() multiplies it by 4 to get bytes.

IP_Protocol :: enum (u8) {
	icmp  = 0x01,
	igmp  = 0x02,
	tcp   = 0x06,
	igp   = 0x09,
	udp   = 0x11,
	eigrp = 0x58,
	ospf  = 0x59,
	vrrp  = 0x70,
}

IPv4_Header :: struct #packed {
	// The lower 4 bits are the header length
	// The upper 4 bits are the version
	// [version][header_length]
	version_header_len: u8,
	tos:                DSCP,
	// This includes the header and the data
	total_len:          u16be,
	identification:     u16be,
	// the upper 3 bits are flags, the rest are the fragment offset
	flags_fragoffset:   u16be,
	ttl:                u8,
	protocol:           IP_Protocol,
	checksum:           u16be,
	src:                net.IP4_Address,
	dst:                net.IP4_Address,
}
#assert(size_of(IPv4_Header) == 20)

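// The header length in bytes. IHL counts 32-bit words, hence the multiply by 4.
ipv4_header_length :: proc(hdr: IPv4_Header) -> u8 {
	return 4 * (hdr.version_header_len & 0x0f)
}

// The version lives in the upper 4 bits, so shift it down rather than only
// masking (masking alone would return 0x40 instead of 4).
ipv4_version :: proc(hdr: IPv4_Header) -> u8 {
	return hdr.version_header_len >> 4
}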

Similar to how the Ethernet header was parsed, we can use read_struct again to just reinterpret the bytes:

parse_ipv4 :: proc(pkt: []byte) -> ([]byte, IPv4_Header, Parse_Error) {
	// Pass read_struct's error through instead of silently dropping it
	return read_struct(pkt, IPv4_Header)
}

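Comparing the output of the parser with the output of Wireshark shows matching values. The one oddity is tos: Wireshark splits the byte 0xc0 into DSCP CS6 (48) and ECN 0, while the parser prints %!(BAD ENUM VALUE=192), most likely because the raw byte still includes the two ECN bits and so does not match any DSCP enum value.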


/* Wireshark */
Internet Protocol Version 4, Src: 192.168.12.1, Dst: 192.168.12.2
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0xc0 (DSCP: CS6, ECN: Not-ECT)
        1100 00.. = Differentiated Services Codepoint: Class Selector 6 (48)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 40
    Identification: 0x8c9d (35997)
    010. .... = Flags: 0x2, Don't fragment
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 1
    Protocol: TCP (6)
    Header Checksum: 0x531f [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 192.168.12.1
    Destination Address: 192.168.12.2
    [Stream index: 0]



/* Parser Output */
IPv4_Header{
    version_header_len = 69,
    tos = %!(BAD ENUM VALUE=192),
    total_len = 40,
    identification = 35997,
    flags_fragoffset = 16384,
    ttl = 1,
    protocol = "tcp",
    checksum = 21279,
    src = 192.168.12.1,
    dst = 192.168.12.2
}
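Because parse_ipv4 advances by size_of(IPv4_Header), it implicitly assumes a 20-byte header with no options and a payload that runs to the end of the buffer. Here is a hedged sketch of a more defensive approach that honours IHL and Total Length, and rejects fragments (which a simple parser like this cannot reassemble). ipv4_payload and the flag constants are hypothetical names; the bit positions follow RFC 791.

```odin
package main

IPV4_FLAG_MF          :: 0x2000 // More Fragments
IPV4_FRAG_OFFSET_MASK :: 0x1fff // lower 13 bits of flags_fragoffset

// Returns the IPv4 payload, skipping any options via IHL and trimming
// anything after Total Length (such as Ethernet padding).
ipv4_payload :: proc(pkt: []byte) -> (payload: []byte, ok: bool) {
	if len(pkt) < 20 { return }
	if (pkt[0] >> 4) != 4 { return }
	ihl_bytes := int(pkt[0] & 0x0f) * 4
	total_len := int(pkt[2]) << 8 | int(pkt[3])
	if ihl_bytes < 20 || total_len < ihl_bytes || total_len > len(pkt) { return }
	flags_frag := u16(pkt[6]) << 8 | u16(pkt[7])
	if (flags_frag & IPV4_FLAG_MF) != 0 || (flags_frag & IPV4_FRAG_OFFSET_MASK) != 0 { return }
	return pkt[ihl_bytes:total_len], true
}
```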

TCP Header
The payload of this IPv4 packet is a TCP segment. RFC 9293 defines the TCP header format as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |       |C|E|U|A|P|R|S|F|                               |
| Offset| Rsrvd |W|C|R|C|S|S|Y|I|            Window             |
|       |       |R|E|G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           [Options]                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               :
:                             Data                              :
:                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Source: https://datatracker.ietf.org/doc/html/rfc9293#section-3.1

Mapping that to an Odin struct definition yields:

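// A bit_set maps enum value n to bit n, and on the wire FIN is the least
// significant bit of the flags byte (0x01) while CWR is the most significant
// (0x80), so the enumerators are listed from FIN up to CWR.
TCP_Flag :: enum (u8) {
	fin,
	syn,
	reset,
	push,
	ack,
	urgent,
	ecn_echo,
	congestion_window_reduced,
}

// Explicitly backed by a u8 so it occupies exactly the 1-byte flags field
TCP_Flags :: bit_set[TCP_Flag; u8]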

// FIXME: does not currently support TCP Options
TCP :: struct #packed {
	src_port:        u16be,
	dst_port:        u16be,
	seq_num:         u32be,
	ack_num:         u32be,
	// 0           3 4       7  8     15
	// [data_offset] [reserved] [flags]
	data_offset_rsv: u8,
	flags:           TCP_Flags,
	window:          u16be,
	checksum:        u16be,
	urgent_ptr:      u16be,
}
#assert(size_of(TCP) == 20)

Similar to the IPv4 header, there are fields that are less than 1 byte, so I represented them as 1 byte and used helper functions to extract the correct interpretation. This is definitely not a complete implementation of TCP, but this handles the most basic and common case for now.
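The FIXME on the struct matters as soon as a segment carries options (TCP timestamps, for example), because the payload then starts after the options rather than at byte 20. Here is a minimal sketch of locating the payload via the Data Offset field instead; tcp_payload is a hypothetical name.

```odin
package main

// Data Offset (upper 4 bits of byte 12) is the header length in 32-bit
// words, so options are skipped by slicing at data_offset * 4.
tcp_payload :: proc(seg: []byte) -> (payload: []byte, ok: bool) {
	if len(seg) < 20 { return }
	hdr_len := int(seg[12] >> 4) * 4
	if hdr_len < 20 || hdr_len > len(seg) { return }
	return seg[hdr_len:], true
}
```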

Once I have the packet slice, I have to get through the layer 2, layer 3, and layer 4 headers to reach the TCP payload and call parse_bgp_header(tcp_payload). First, let's look at how to represent the BGP header as a struct:

BGP_Message_Type :: enum (u8) {
    invalid      = 0,
    open         = 1,
    update       = 2,
    notification = 3,
    keepalive    = 4,
}

BGP_Header :: struct #packed {
    marker: [16]byte `fmt:"-"`,
    length: u16be,
    type:   BGP_Message_Type,
}

This defines a structure called BGP_Header containing a 16-byte array for the marker, the length as a big-endian 2-byte unsigned integer, and the type of message, which is an enum backed by 1 byte. The struct is marked #packed because by default the compiler inserts padding so that each field sits at its natural alignment and the struct's size is a multiple of its alignment, which is more efficient for the CPU to access. Without #packed, BGP_Header would be padded from 19 to 20 bytes. #packed turns off this behavior and ensures the size and layout of the struct exactly match the format we see on the wire. This is desirable because it keeps the representations 1:1, meaning parsing the header is as simple as a memcpy.

// It's possible that this is not actually a BGP message (or that `pkt` is too
// short to hold a header). If so, return the unmodified input `pkt` and false.
parse_bgp_header :: proc(pkt: []byte) -> ([]byte, BGP_Header, bool) {
	// Guard the raw read below against buffers shorter than 19 bytes
	if len(pkt) < size_of(BGP_Header) {
		return pkt, {}, false
	}
	bgp_header := (transmute(^BGP_Header)raw_data(pkt))^
	if bgp_header.marker == BGP_MARKER {
		return pkt[size_of(BGP_Header):], bgp_header, true
	}
	return pkt, {}, false
}

Instead of going byte by byte, this is all that is needed to “parse” the header:

bgp_header := (transmute(^BGP_Header)raw_data(pkt))^

This line takes a pointer to the beginning of the packet with raw_data, then transmutes it into a pointer to a BGP_Header, then finally dereferences that pointer, effectively copying 19 bytes into the bgp_header variable. Next, to sanity check the input, I check if the marker bytes are equal to 16 bytes of 0xff to confirm if these bytes actually are a BGP header or not. A common “idiom” I use throughout the codebase is “eating” the bytes the parser just parsed with return pkt[size_of(thing):], which just advances a pointer to the beginning of un-parsed data.
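One more wrinkle: a TCP payload is not guaranteed to hold exactly one BGP message. Packet 50 carries two UPDATEs in a single segment, and a long message can also continue into the next segment. Here is a hedged sketch of splitting a payload using each header's Length field, with the RFC 4271 bounds of 19 to 4096 bytes. split_bgp_messages is a hypothetical name, and the caller-provided out slice is just one possible design.

```odin
package main

BGP_HEADER_LEN  :: 19
BGP_MAX_MSG_LEN :: 4096

// Fills `out` with slices of complete messages and returns how many were
// found plus any trailing bytes of a message that continues in the next
// segment. ok is false on a bad marker or an out-of-range Length.
split_bgp_messages :: proc(payload: []byte, out: [][]byte) -> (count: int, leftover: []byte, ok: bool) {
	rest := payload
	for len(rest) >= BGP_HEADER_LEN && count < len(out) {
		for i in 0..<16 {
			if rest[i] != 0xff { return }
		}
		msg_len := int(rest[16]) << 8 | int(rest[17])
		if msg_len < BGP_HEADER_LEN || msg_len > BGP_MAX_MSG_LEN { return }
		if msg_len > len(rest) { break } // continues in the next segment
		out[count] = rest[:msg_len]
		count += 1
		rest = rest[msg_len:]
	}
	return count, rest, true
}
```

Keeping the leftover bytes and prepending them to the next segment's payload is what lets a reader reassemble messages that straddle segment boundaries.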
