I have started writing an implementation of BGP in Odin. Odin can be thought of as C, but with some of the pain points addressed. For instance, C does not have strings as a first-class type, nor does it have slices or dynamic arrays as a built-in type. Slices end up being particularly useful when it comes to parsing binary formats.
The goal of this project is to implement a fully-functional BGP router according to RFC 4271. Practically speaking, this means that my program will be able to start a TCP session with a Cisco IOS router, exchange OPEN messages, exchange UPDATE messages, and maintain the session. To keep things reasonably simple, I am aiming to support only the IPv4 unicast address family for the time being.
The first order of business is parsing the BGP messages and testing the parser implementation. My initial test corpus is a set of Wireshark captures from working BGP sessions. I used containerlab to set up two Cisco IOS nodes configured as eBGP peers of each other, and then used Wireshark to dump this capture containing all the messages the IOS devices exchanged to reach the Established state. I used the packet captures in conjunction with RFC 4271 to ensure the parser supports core RFC 4271 along with common modern additions.
What’s in a BGP Session?
The first step I took in the process of making the parser was to try to identify common patterns in the protocol to inform the design of the parser. I also wanted to compare what I see in the packet capture to what the RFC dictates and create a mental mapping between the two. RFC 4271 defines the following messages:
- OPEN
- UPDATE
- KEEPALIVE
- NOTIFICATION
All BGP messages also include a 19-byte header consisting of a Marker, Length, and Type.
- Marker is 16 bytes, all set to one, and is only included for compatibility
- Length is 2 bytes indicating the total message length, including the header
- Type is 1 byte indicating the kind of message
Refer to https://datatracker.ietf.org/doc/html/rfc4271#section-4.1 for more info
At a high level, two routers form a BGP relationship by first establishing a TCP connection on port 179. Next they exchange OPEN messages containing each router’s capabilities, ASN, and BGP ID. If each router accepts the other’s OPEN, they send KEEPALIVEs to each other. Receipt of a KEEPALIVE from the remote peer transitions the BGP FSM into the Established state. Once established, the BGP peers send UPDATEs containing the path attributes and prefixes they are advertising. To maintain the session, KEEPALIVEs are exchanged periodically.
This capture shows this process in action. If you
want to follow along, I recommend using the bgp filter in Wireshark so we can
focus only on the BGP messages being exchanged between 192.168.12.1 (R1) and
192.168.12.2 (R2). The number on the far left is the packet number, which I
will use to reference specific packets.
43 192.168.12.1 192.168.12.2 BGP OPEN Message
45 192.168.12.2 192.168.12.1 BGP OPEN Message
46 192.168.12.2 192.168.12.1 BGP KEEPALIVE Message
48 192.168.12.1 192.168.12.2 BGP KEEPALIVE Message
49 192.168.12.2 192.168.12.1 BGP KEEPALIVE Message
50 192.168.12.2 192.168.12.1 BGP UPDATE Message, UPDATE Message
51 192.168.12.1 192.168.12.2 BGP KEEPALIVE Message
52 192.168.12.1 192.168.12.2 BGP UPDATE Message, UPDATE Message
66 192.168.12.2 192.168.12.1 BGP KEEPALIVE Message
71 192.168.12.1 192.168.12.2 BGP KEEPALIVE Message
82 192.168.12.2 192.168.12.1 BGP KEEPALIVE Message
87 192.168.12.1 192.168.12.2 BGP KEEPALIVE Message
100 192.168.12.2 192.168.12.1 BGP KEEPALIVE Message
105 192.168.12.1 192.168.12.2 BGP KEEPALIVE Message
OPEN
Packet 43 is R1 initiating the peer relationship with R2. The OPEN message
contains the BGP Version, the Autonomous System Number (ASN) of the sender,
the Hold Time, the BGP Identifier of the sender, and optional capabilities
such as 4-byte ASNs or support for specific address families. Here is packet 43:
Border Gateway Protocol - OPEN Message
Marker: ffffffffffffffffffffffffffffffff
Length: 57
Type: OPEN Message (1)
Version: 4
My AS: 1
Hold Time: 180
BGP Identifier: 1.1.1.1
Optional Parameters Length: 28
Optional Parameters
Optional Parameter: Capability
Parameter Type: Capability (2)
Parameter Length: 6
Capability: Multiprotocol extensions capability
Type: Multiprotocol extensions capability (1)
Length: 4
AFI: IPv4 (1)
Reserved: 00
SAFI: Unicast (1)
Optional Parameter: Capability
Parameter Type: Capability (2)
Parameter Length: 2
Capability: Route Refresh Capability (Cisco)
Type: Route Refresh Capability (Cisco) (128)
Length: 0
Optional Parameter: Capability
Parameter Type: Capability (2)
Parameter Length: 2
Capability: Route refresh capability
Type: Route refresh capability (2)
Length: 0
Optional Parameter: Capability
Parameter Type: Capability (2)
Parameter Length: 2
Capability: Enhanced route refresh capability
Type: Enhanced route refresh capability (70)
Length: 0
Optional Parameter: Capability
Parameter Type: Capability (2)
Parameter Length: 6
Capability: Support for 4-octet AS number capability
Type: Support for 4-octet AS number capability (65)
Length: 4
AS Number: 1
The output is mostly unsurprising and matches the configuration on the device. This OPEN message indicates R1 is using BGP Version 4, is in AS1, has a Hold Time of 180 seconds, and is using its loopback address of 1.1.1.1 to uniquely identify itself. It also included some Optional Parameters, but before digging into that I want to compare what is seen in Wireshark with what the RFC says.
RFC 4271 defines the OPEN message as:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
|    Version    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     My Autonomous System      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Hold Time           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         BGP Identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Opt Parm Len  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|             Optional Parameters (variable)                    |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Source: https://datatracker.ietf.org/doc/html/rfc4271#section-4.2
What we see in Wireshark matches what RFC 4271 defines. However, RFC 4271 does not specify anything about the Capability Advertisement. The Capabilities Advertisement shown here is defined by RFC 3392 (since obsoleted by RFC 5492, which keeps the same encoding). According to RFC 3392, each Capability is encoded as a TLV (Type-Length-Value) tuple:
+------------------------------+
| Capability Code (1 octet) |
+------------------------------+
| Capability Length (1 octet) |
+------------------------------+
| Capability Value (variable) |
+------------------------------+
Source: https://datatracker.ietf.org/doc/html/rfc3392#section-4
The capability advertisements we see in Wireshark match RFC 3392. By default, my Cisco IOS-XE 17.12.1 nodes advertise support for 4-byte ASNs and several flavors of Route Refresh. The Multiprotocol Extensions capability also indicates which Address Family Identifier (AFI) and Subsequent Address Family Identifier (SAFI) the router wants to use in this session, which in this case is IPv4 (AFI 1) and Unicast (SAFI 1).
The parser will need to remember some of these Capabilities in order to properly interpret UPDATE messages. For example, when both peers advertise the 4-byte ASN capability, the UPDATE messages exchanged on the session carry 4-byte ASNs instead of the legacy 2-byte ASNs.
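To make the nesting concrete, here is a minimal sketch of walking the Optional Parameters and collecting the capability TLVs inside them. The `Capability` struct and `parse_capabilities` procedure are hypothetical names for illustration, not the parser's actual definitions, and the sketch assumes the caller has already sliced out exactly `Optional Parameters Length` bytes.

```odin
package main

import "core:fmt"

// Hypothetical type for illustration only.
Capability :: struct {
    code:  u8,
    value: []byte, // view into the original buffer, no copy
}

// Walks the Optional Parameters of an OPEN message and appends every
// capability TLV found inside parameters of type 2 (Capability) to `out`.
// Returns false if any length field runs past the end of its container.
parse_capabilities :: proc(opt_params: []byte, out: ^[dynamic]Capability) -> bool {
    params := opt_params
    for len(params) > 0 {
        if len(params) < 2 { return false }
        param_type := params[0]
        param_len  := int(params[1])
        if len(params) < 2 + param_len { return false }
        body := params[2:2 + param_len]
        params = params[2 + param_len:]

        if param_type != 2 { continue } // not a Capability parameter

        // One parameter may carry several capability TLVs back to back.
        for len(body) > 0 {
            if len(body) < 2 { return false }
            cap_len := int(body[1])
            if len(body) < 2 + cap_len { return false }
            append(out, Capability{code = body[0], value = body[2:2 + cap_len]})
            body = body[2 + cap_len:]
        }
    }
    return true
}

main :: proc() {
    // Two of the parameters from packet 43:
    // Multiprotocol (IPv4 unicast) and 4-octet AS number (AS 1).
    opt_params := []byte{
        0x02, 0x06, 0x01, 0x04, 0x00, 0x01, 0x00, 0x01,
        0x02, 0x06, 0x41, 0x04, 0x00, 0x00, 0x00, 0x01,
    }
    caps: [dynamic]Capability
    defer delete(caps)
    ok := parse_capabilities(opt_params, &caps)
    fmt.println(ok, len(caps))
    for c in caps {
        fmt.println(c.code, c.value)
    }
}
```

Keeping `value` as a slice into the original buffer avoids copies, at the cost of the capabilities only being valid while the packet bytes are alive. A session-level structure that outlives the packet would need to copy out the fields it cares about.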
In packet 45 we see R2 respond with an OPEN message containing its own AS, hold time, identifier, matching route refresh capabilities, and matching AFI/SAFI support. R1 and R2 each signal that they accept the other’s OPEN by sending a KEEPALIVE message. Once a router receives the KEEPALIVE from its new peer, it can move into the Established state.
KEEPALIVE
KEEPALIVE messages consist only of the 19-byte header and don’t contain any variable length data, making them trivial to parse.
Border Gateway Protocol - KEEPALIVE Message
Marker: ffffffffffffffffffffffffffffffff
Length: 19
Type: KEEPALIVE Message (4)
NOTIFICATION
NOTIFICATION messages are used to notify peers about errors. To create this
packet capture, I administratively reset the BGP session between R1 and R2 with
clear ip bgp * which generated the Cease notification seen in Packet 35:
Border Gateway Protocol - NOTIFICATION Message
Marker: ffffffffffffffffffffffffffffffff
Length: 21
Type: NOTIFICATION Message (3)
Major error Code: Cease (6)
Minor error Code (Cease): Administratively Reset (4)
Here is the RFC 4271 definition of a NOTIFICATION:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Error code    | Error subcode |   Data (variable)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
One pitfall I wanted to avoid was relying only on the packet capture and
therefore handling only the cases I happened to capture. This particular
NOTIFICATION message doesn’t have any variable-length data, and the format has
no explicit length field to hint that there could be any. To compute the length
of the Data, RFC 4271 provides the formula Message Len = 21 + Data Len,
which rearranges to Data Len = Message Len - 21. In other words, Data is
present whenever the NOTIFICATION message length is greater than 21.
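The length arithmetic can be sketched as follows. `parse_notification` is a hypothetical helper, not the parser's actual procedure; it assumes `message_len` is the Length field from the BGP header and `body` is everything after the 19-byte header.

```odin
package main

import "core:fmt"

// 19-byte header + 1-byte error code + 1-byte error subcode
NOTIFICATION_MIN_LEN :: 21

// Hypothetical helper: returns the error code, subcode, and the optional
// Data slice, whose length is derived from the header's Length field.
parse_notification :: proc(message_len: int, body: []byte) -> (code: u8, subcode: u8, data: []byte, ok: bool) {
    if message_len < NOTIFICATION_MIN_LEN { return }
    data_len := message_len - NOTIFICATION_MIN_LEN // Data Len = Message Len - 21
    if len(body) < 2 + data_len { return }
    return body[0], body[1], body[2:2 + data_len], true
}

main :: proc() {
    // The Cease / Administratively Reset message from the capture:
    // Length 21, so there is no Data.
    code, subcode, data, ok := parse_notification(21, []byte{6, 4})
    fmt.println(code, subcode, len(data), ok)
}
```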
UPDATE
Now that the BGP session is established, the routers exchange UPDATE messages. UPDATE messages contain withdrawn routes, path attributes, and network layer reachability information (NLRI).
Border Gateway Protocol - UPDATE Message
Marker: ffffffffffffffffffffffffffffffff
Length: 54
Type: UPDATE Message (2)
Withdrawn Routes Length: 0
Total Path Attribute Length: 27
Path attributes
Path Attribute - ORIGIN: IGP
Flags: 0x40, Transitive, Well-known, Complete
0... .... = Optional: Not set
.1.. .... = Transitive: Set
..0. .... = Partial: Not set
...0 .... = Extended-Length: Not set
.... 0000 = Unused: 0x0
Type Code: ORIGIN (1)
Length: 1
Origin: IGP (0)
Path Attribute - AS_PATH: 2
Flags: 0x40, Transitive, Well-known, Complete
0... .... = Optional: Not set
.1.. .... = Transitive: Set
..0. .... = Partial: Not set
...0 .... = Extended-Length: Not set
.... 0000 = Unused: 0x0
Type Code: AS_PATH (2)
Length: 6
AS Path segment: 2
Segment type: AS_SEQUENCE (2)
Segment length (number of ASN): 1
AS4: 2
Path Attribute - NEXT_HOP: 192.168.12.2
Flags: 0x40, Transitive, Well-known, Complete
0... .... = Optional: Not set
.1.. .... = Transitive: Set
..0. .... = Partial: Not set
...0 .... = Extended-Length: Not set
.... 0000 = Unused: 0x0
Type Code: NEXT_HOP (3)
Length: 4
Next hop: 192.168.12.2
Path Attribute - MULTI_EXIT_DISC: 0
Flags: 0x80, Optional, Non-transitive, Complete
1... .... = Optional: Set
.0.. .... = Transitive: Not set
..0. .... = Partial: Not set
...0 .... = Extended-Length: Not set
.... 0000 = Unused: 0x0
Type Code: MULTI_EXIT_DISC (4)
Length: 4
Multiple exit discriminator: 0
Network Layer Reachability Information (NLRI)
2.2.2.0/24
NLRI prefix length: 24
NLRI prefix: 2.2.2.0
Withdrawn Routes Length is 0, indicating there are no withdrawn routes. Next,
Total Path Attribute Length indicates the next 27 bytes of the message contain
BGP Path Attributes. Each path attribute is encoded with flags, a type code, a
length, and the attribute value. Parsing the majority of these attributes is
straightforward: check the type of the attribute, then read however many bytes
that attribute requires. For example, LOCAL_PREF, MED, and NEXT_HOP are always
4 bytes, so no external context is required to properly parse these path
attributes. One important exception to this is the AS_PATH attribute, which can
contain a list of either 4-byte ASNs or 2-byte ASNs depending on whether both
peers advertised the 4-byte ASN capability in their OPEN messages during
session establishment. The UPDATE message itself does not contain this
information, so the parser needs to store session-specific context to properly
parse this attribute.
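As a rough sketch of what that session context buys us, here is one way a single AS_PATH segment could be decoded with the ASN width chosen at runtime. The `Session` struct and `parse_as_path_segment` are hypothetical names, and the sketch assumes the segment layout from RFC 4271 (1-byte segment type, 1-byte ASN count, then the ASNs).

```odin
package main

import "core:fmt"

// Hypothetical session context; a real parser would likely track more.
Session :: struct {
    four_octet_asn: bool, // true when both peers advertised capability 65
}

// Parses one AS_PATH segment from the front of `buf`, appends its ASNs to
// `out`, and returns the remaining bytes along with the segment type.
parse_as_path_segment :: proc(s: Session, buf: []byte, out: ^[dynamic]u32) -> (rest: []byte, seg_type: u8, ok: bool) {
    if len(buf) < 2 { return buf, 0, false }
    seg_type = buf[0]
    count := int(buf[1])
    width := 4 if s.four_octet_asn else 2 // bytes per ASN on the wire
    if len(buf) < 2 + count * width { return buf, 0, false }

    body := buf[2:]
    for i in 0..<count {
        asn: u32
        for b in body[i * width:(i + 1) * width] {
            asn = (asn << 8) | u32(b) // accumulate big-endian bytes
        }
        append(out, asn)
    }
    return buf[2 + count * width:], seg_type, true
}

main :: proc() {
    // AS_PATH value from the UPDATE above: AS_SEQUENCE (2), one ASN, AS 2.
    value := []byte{0x02, 0x01, 0x00, 0x00, 0x00, 0x02}
    asns: [dynamic]u32
    defer delete(asns)
    _, seg_type, ok := parse_as_path_segment(Session{four_octet_asn = true}, value, &asns)
    fmt.println(seg_type, asns[:], ok)
}
```

Note that the same six bytes parsed with `four_octet_asn = false` would fail the length check here, because one 2-byte ASN only accounts for four of them. In general a wrong width can also silently yield wrong ASNs, which is why the capability has to be remembered per session.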
After path attributes comes the NLRI, which are the network prefixes
being advertised. In this particular UPDATE we can see the 2.2.2.0/24 prefix
being advertised. The way network prefixes are encoded in BGP messages is
pretty interesting to me, and it shows how BGP was designed with scale in mind.
Here is how 2.2.2.0/24 is encoded on the wire:
0x18 0x02 0x02 0x02
0x18 is 24 in hex, the prefix length in bits. The next 3 bytes represent the
significant octets of the advertised prefix. Since a /24 has 3 significant
octets, only those are included in the UPDATE and the last octet is implicitly
zero.
In isolation this may seem like a small optimization, but these savings add up when you consider that the number of IPv4 prefixes in the global routing table is 1,038,646 as of November 30, 2025 (source: CIDR Report). If all of these routes were /24s, we’d save 1 byte per prefix compared to always sending all four octets, resulting in about 1 MB saved when advertising the entire table. Scaled across all routers and peering sessions on the Internet, these savings become significant over time.
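The prefix encoding described above can be decoded with very little code. This is a minimal sketch using a hypothetical `Prefix` struct and `parse_prefix` procedure; the key step is rounding the bit length up to whole octets with `(bits + 7) / 8`.

```odin
package main

import "core:fmt"

// Hypothetical type for illustration only.
Prefix :: struct {
    addr:   [4]u8,
    length: u8, // prefix length in bits
}

// Decodes one IPv4 prefix from the front of `buf` and returns the
// remaining bytes, following the "eat what was parsed" idiom.
parse_prefix :: proc(buf: []byte) -> (rest: []byte, p: Prefix, ok: bool) {
    if len(buf) < 1 { return buf, {}, false }
    bits := int(buf[0])
    if bits > 32 { return buf, {}, false }
    nbytes := (bits + 7) / 8 // number of significant octets on the wire
    if len(buf) < 1 + nbytes { return buf, {}, false }
    p.length = u8(bits)
    copy(p.addr[:], buf[1:1 + nbytes]) // omitted octets stay zero
    return buf[1 + nbytes:], p, true
}

main :: proc() {
    // 2.2.2.0/24 as it appears on the wire
    wire := []byte{0x18, 0x02, 0x02, 0x02}
    rest, p, ok := parse_prefix(wire)
    fmt.println(p.addr, p.length, len(rest), ok)
}
```

Because the procedure returns the unread remainder, it can be called in a loop until the NLRI (or Withdrawn Routes) bytes are exhausted.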
RFC 4271 defines the format of a BGP update as:
+-----------------------------------------------------+
| Withdrawn Routes Length (2 octets) |
+-----------------------------------------------------+
| Withdrawn Routes (variable) |
+-----------------------------------------------------+
| Total Path Attribute Length (2 octets) |
+-----------------------------------------------------+
| Path Attributes (variable) |
+-----------------------------------------------------+
| Network Layer Reachability Information (variable) |
+-----------------------------------------------------+
Source: https://datatracker.ietf.org/doc/html/rfc4271#section-4.3
Withdrawn Routes and Network Layer Reachability Information share the exact
same prefix encoding described above.
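Path attributes are TLV encoded. The Attribute Type is actually a 2-byte field. The high-order byte contains flags indicating whether the attribute is optional or well-known, transitive, partial, and whether it uses an extended length. The low-order byte is the Attribute Type Code. The attribute length follows the type: it is 1 byte normally, or 2 bytes when the Extended Length flag is set.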
More detailed information about the encoding of path attributes can be found in https://datatracker.ietf.org/doc/html/rfc4271#section-4.3.
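A minimal sketch of reading one path attribute, following the layout in RFC 4271 section 4.3: a flags byte, a type code byte, a 1- or 2-byte length, then the value. `Path_Attr`, `parse_path_attr`, and the `ATTR_FLAG_*` constants are hypothetical names chosen for this example.

```odin
package main

import "core:fmt"

// Flag bits in the first byte of every path attribute (RFC 4271, 4.3)
ATTR_FLAG_OPTIONAL        :: 0x80
ATTR_FLAG_TRANSITIVE      :: 0x40
ATTR_FLAG_PARTIAL         :: 0x20
ATTR_FLAG_EXTENDED_LENGTH :: 0x10

// Hypothetical type for illustration only.
Path_Attr :: struct {
    flags:     u8,
    type_code: u8,
    value:     []byte, // view into the original buffer
}

// Parses one attribute from the front of `buf` and returns the rest.
parse_path_attr :: proc(buf: []byte) -> (rest: []byte, attr: Path_Attr, ok: bool) {
    if len(buf) < 3 { return buf, {}, false }
    attr.flags     = buf[0]
    attr.type_code = buf[1]

    hdr_len   := 3
    value_len := int(buf[2])
    if (attr.flags & ATTR_FLAG_EXTENDED_LENGTH) != 0 {
        // Extended Length: the length field is 2 bytes, big-endian
        if len(buf) < 4 { return buf, {}, false }
        hdr_len   = 4
        value_len = (int(buf[2]) << 8) | int(buf[3])
    }
    if len(buf) < hdr_len + value_len { return buf, {}, false }
    attr.value = buf[hdr_len:hdr_len + value_len]
    return buf[hdr_len + value_len:], attr, true
}

main :: proc() {
    // The ORIGIN attribute from the UPDATE above: flags 0x40, type 1,
    // length 1, value IGP (0).
    origin := []byte{0x40, 0x01, 0x01, 0x00}
    rest, attr, ok := parse_path_attr(origin)
    fmt.println(attr.type_code, attr.value, len(rest), ok)
}
```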
Implementing The Parser
Now that I am familiar with the BGP message format and have identified edge cases, it’s time to actually start parsing the BGP messages! To load the pcapng I am using libpcap. Since BGP is an application layer protocol, getting to the BGP message requires parsing through the Ethernet header, IPv4 header, and TCP header first.
Sidenote: pcapng files store packet bytes exactly as they appeared on the wire, and these protocols use network byte order (Big Endian), so all my protocol structs use the be variant of whatever integer type is needed (for example, u16be).
Ethernet Header
There are actually two expected variants of Ethernet header: IEEE 802.3 and
Ethernet II. The practical difference is that IEEE 802.3 treats the 2-byte value
after the Destination and Source MAC addresses as a length instead of an
Ethertype. If the value is greater than or equal to 0x0600 (1536), the frame
needs to be treated as Ethernet II and the value is an Ethertype; if it is 1500
(0x05DC) or less, it is an IEEE 802.3 length. The frames in my test captures
are Ethernet II frames, and they set the Ethertype field to 0x0800 to indicate
the payload is IPv4. I included the Ethertype values for common protocols in
the Ethertype enum below.
Here is how I am representing an Ethernet frame in the parser:
Ethertype :: enum (u16be) {
    loopback = 0x9000,
    ipv4     = 0x0800,
    arp      = 0x0806,
    rarp     = 0x8035,
    ipv6     = 0x86dd,
    dot1q    = 0x8100,
    lacp     = 0x8809,
}

MAC :: distinct [6]u8

Ethernet_Header :: struct #packed {
    dst_mac:   MAC,
    src_mac:   MAC,
    ethertype: Ethertype,
}
To parse the header, I simply reinterpret the bytes in pkt as an
Ethernet_Header:
parse_ethernet_header :: proc(pkt: []byte) -> ([]byte, Ethernet_Header) {
    // NOTE: err is ignored here; a short packet yields a zeroed header
    pkt, eth_header, err := read_struct(pkt, Ethernet_Header)
    return pkt, eth_header
}
The read_struct function is used a lot throughout the parser, so I think it’s
worth explaining a little bit.
Parse_Error :: enum {
    none,
    type_larger_than_buffer,
    invalid_partial_read,
}

// Re-interprets `size_of(T)` bytes of `pkt` as `T` and advances the cursor
// forward by `size_of(T)`. Returns a slice pointing to the beginning of unread
// bytes, T, and a Parse_Error.
read_struct :: proc "contextless" (pkt: []byte, $T: typeid) -> ([]byte, T, Parse_Error) {
    if len(pkt) >= size_of(T) {
        // This dereference results in a copy
        s := (cast(^T)raw_data(pkt))^
        return pkt[size_of(T):], s, .none
    }
    return pkt, {}, .type_larger_than_buffer
}
This helper function takes a slice of bytes containing the packet, and a generic
type parameter T. After checking that the copy won’t exceed the size of the
packet, it reinterprets the bytes in the packet as T and then shifts our
“window” into the packet bytes forward by the size_of(T) and returns. If the
type is larger than the number of bytes left in the packet, then it does nothing
and returns an error to indicate the problem.
By printing out the header and comparing the result with what we see in Wireshark, we can confirm we successfully parsed the Ethernet header:
/* Wireshark */
Ethernet II, Src: aa:bb:cc:00:02:10 (aa:bb:cc:00:02:10), Dst: aa:bb:cc:00:01:10 (aa:bb:cc:00:01:10)
Destination: aa:bb:cc:00:01:10 (aa:bb:cc:00:01:10)
Source: aa:bb:cc:00:02:10 (aa:bb:cc:00:02:10)
Type: IPv4 (0x0800)
[Stream index: 3]
/* Parser Output */
Ethernet_Header{
dst_mac = AA:BB:CC:0:1:10,
src_mac = AA:BB:CC:0:2:10,
ethertype = "ipv4"
}
IPv4 Header
The IPv4 Header is a bit more interesting. The ancient RFC 791 (born in 1981) defines the format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Source: https://datatracker.ietf.org/doc/html/rfc791#section-3.1
There are a few fields that are only 4 bits, but pairs of them line up nicely
on 8-bit boundaries, so I decided to just represent each pair as 1 byte and use
helper functions to pull out a useful interpretation. A perfect example of this
is the version and IHL fields. Both are 4 bits and they are adjacent to each
other, so they are stored as a single u8, and the ipv4_version() and
ipv4_header_length() functions use shifts and bitmasks to extract the values.
IP_Protocol :: enum (u8) {
    icmp  = 0x01,
    igmp  = 0x02,
    tcp   = 0x06,
    igp   = 0x09,
    udp   = 0x11,
    eigrp = 0x58,
    ospf  = 0x59,
    vrrp  = 0x70,
}

IPv4_Header :: struct #packed {
    // The lower 4 bits are the header length
    // The upper 4 bits are the version
    // [version][header_length]
    version_header_len: u8,
    tos:                DSCP,
    // This includes the header and the data
    total_len:          u16be,
    identification:     u16be,
    // the upper 3 bits are flags, the rest are the fragment offset
    flags_fragoffset:   u16be,
    ttl:                u8,
    protocol:           IP_Protocol,
    checksum:           u16be,
    src:                net.IP4_Address,
    dst:                net.IP4_Address,
}
#assert(size_of(IPv4_Header) == 20)

// The header length in bytes (IHL counts 32-bit words)
ipv4_header_length :: proc(hdr: IPv4_Header) -> u8 {
    return 4 * (hdr.version_header_len & 0x0f)
}

// The version lives in the upper 4 bits, so shift it down.
// Masking with 0xf0 alone would return 0x40 (64) instead of 4.
ipv4_version :: proc(hdr: IPv4_Header) -> u8 {
    return hdr.version_header_len >> 4
}
Similar to how the Ethernet header was parsed, we can use read_struct again to
just reinterpret the bytes:
parse_ipv4 :: proc(pkt: []byte) -> ([]byte, IPv4_Header) {
    pkt, ipv4, err := read_struct(pkt, IPv4_Header)
    return pkt, ipv4
}
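Comparing the output of the parser with the output of Wireshark shows matching values. The one exception is tos, which prints as a bad enum value, presumably because the raw byte 0xc0 (192) packs the DSCP (CS6, 48) into the upper six bits and the ECN into the lower two: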
/* Wireshark */
Internet Protocol Version 4, Src: 192.168.12.1, Dst: 192.168.12.2
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0xc0 (DSCP: CS6, ECN: Not-ECT)
1100 00.. = Differentiated Services Codepoint: Class Selector 6 (48)
.... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
Total Length: 40
Identification: 0x8c9d (35997)
010. .... = Flags: 0x2, Don't fragment
...0 0000 0000 0000 = Fragment Offset: 0
Time to Live: 1
Protocol: TCP (6)
Header Checksum: 0x531f [validation disabled]
[Header checksum status: Unverified]
Source Address: 192.168.12.1
Destination Address: 192.168.12.2
[Stream index: 0]
/* Parser Output */
IPv4_Header{
version_header_len = 69,
tos = %!(BAD ENUM VALUE=192),
total_len = 40,
identification = 35997,
flags_fragoffset = 16384,
ttl = 1,
protocol = "tcp",
checksum = 21279,
src = 192.168.12.1,
dst = 192.168.12.2
}
TCP Header
The payload of the IPv4 packet is a TCP segment. RFC 9293 defines the TCP header format as follows:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |       |C|E|U|A|P|R|S|F|                               |
| Offset| Rsrvd |W|C|R|C|S|S|Y|I|            Window             |
|       |       |R|E|G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           [Options]                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               :
:                             Data                              :
:                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Source https://datatracker.ietf.org/doc/html/rfc9293#section-3.1
Mapping that to an Odin struct definition yields:
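// A bit_set maps enum value N to bit N, and bit 0 is the least significant
// bit. On the wire FIN is the least significant bit of the flags byte and CWR
// is the most significant, so the enum is declared starting from FIN.
TCP_Flag :: enum (u8) {
    fin,
    syn,
    reset,
    push,
    ack,
    urgent,
    ecn_echo,
    congestion_window_reduced,
}
TCP_Flags :: bit_set[TCP_Flag; u8]

// FIXME: does not currently support TCP Options
TCP :: struct #packed {
    src_port:        u16be,
    dst_port:        u16be,
    seq_num:         u32be,
    ack_num:         u32be,
    // 0           3 4        7 8      15
    // [data_offset] [reserved] [flags]
    data_offset_rsv: u8,
    flags:           TCP_Flags,
    window:          u16be,
    checksum:        u16be,
    urgent_ptr:      u16be,
}
#assert(size_of(TCP) == 20)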
Similar to the IPv4 header, there are fields that are less than 1 byte, so I represented them as 1 byte and used helper functions to extract the correct interpretation. This is definitely not a complete implementation of TCP, but this handles the most basic and common case for now.
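The TCP helper functions are not shown above, so here is a sketch of what one for the Data Offset field might look like. `tcp_header_length` and `tcp_options_length` are hypothetical names; they take the raw `data_offset_rsv` byte so the example stays self-contained. Knowing the real header length matters because the BGP payload starts after any TCP options, not necessarily after the fixed 20 bytes.

```odin
package main

import "core:fmt"

// Data Offset is the upper 4 bits of the byte and counts 32-bit words,
// so the header length in bytes is that value multiplied by 4.
tcp_header_length :: proc(data_offset_rsv: u8) -> int {
    return int(data_offset_rsv >> 4) * 4
}

// Number of option bytes that follow the fixed 20-byte header.
tcp_options_length :: proc(data_offset_rsv: u8) -> int {
    return max(tcp_header_length(data_offset_rsv) - 20, 0)
}

main :: proc() {
    // Data Offset 5: no options
    fmt.println(tcp_header_length(0x50), tcp_options_length(0x50))
    // Data Offset 8: 12 bytes of options to skip before the payload
    fmt.println(tcp_header_length(0x80), tcp_options_length(0x80))
}
```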
Once I have the packet_slice, I have to get through the layer 2, layer 3,
and layer 4 headers to reach the TCP payload and call
parse_bgp_header(tcp_payload). First, let’s look at how to represent the BGP
header as a struct:
BGP_Message_Type :: enum (u8) {
    invalid      = 0,
    open         = 1,
    update       = 2,
    notification = 3,
    keepalive    = 4,
}

BGP_Header :: struct #packed {
    marker: [16]byte `fmt:"-"`,
    length: u16be,
    type:   BGP_Message_Type,
}
This defines a structure called BGP_Header containing an array of 16 bytes for
the marker, the length as a Big Endian 2-byte unsigned integer, and the type of
message. The message type is defined as an enum backed by 1 byte. The struct is
defined as #packed because, by default, the compiler may insert padding so that
each field sits at its natural alignment, which is more efficient for the CPU to
access. Here that would likely round the struct up from 19 to 20 bytes, since
length is a 2-byte field. #packed turns off this behavior and ensures the size
and layout of the struct exactly match the format we see on the wire. This is
desirable because it keeps the representations 1:1, meaning parsing the header
is as simple as a memcpy.
// It's possible that this is not actually a BGP message. If it is not, return
// the unmodified input `pkt` and false.
parse_bgp_header :: proc(pkt: []byte) -> ([]byte, BGP_Header, bool) {
    // Make sure there are enough bytes before reinterpreting them
    if len(pkt) < size_of(BGP_Header) {
        return pkt, {}, false
    }
    bgp_header := (transmute(^BGP_Header)raw_data(pkt))^
    if bgp_header.marker == BGP_MARKER {
        return pkt[size_of(BGP_Header):], bgp_header, true
    } else {
        return pkt, {}, false
    }
}
Instead of going byte by byte, this is all that is needed to “parse” the header:
bgp_header := (transmute(^BGP_Header)raw_data(pkt))^
This line takes a pointer to the beginning of the packet with raw_data, then
transmutes it into a pointer to a BGP_Header, then finally dereferences that
pointer, effectively copying 19 bytes into the bgp_header variable. Next, to
sanity check the input, I check if the marker bytes are equal to 16 bytes of
0xff to confirm if these bytes actually are a BGP header or not. A common
“idiom” I use throughout the codebase is “eating” the bytes the parser just
parsed with return pkt[size_of(thing):], which just advances a pointer to the
beginning of un-parsed data.