Mail client from scratch: Part 1 - Setting up

An introductory post about how IMAP works and what it means to connect to an IMAP server.

## Introduction

E-mail is a genuinely fascinating technology that has stood the test of time. It is old, messy, and complex, but it works, and whether you like it or not, e-mail is the backbone of all things business. Billions upon billions of e-mails are exchanged daily, all to ensure that your local Starbucks has every ingredient for your favorite iced pecan caramel crunch soy milk latte in stock.

At Mandel AI, we deal with e-mail and lots of it. E-mail is basically my life now, and I wouldn't say I love it. Weirdly enough, the complexities that arise from working with e-mails sparked my interest in taking a closer look at what it would take to get to a working e-mail client from zero without any* libraries to help me out. Hopefully, this way, I’ll better understand why I hate e-mail so much.

Also, since I like being unnecessarily hard on myself, I'll be doing this entire project in a language I don't know and in an ecosystem I have no experience with - Gleam. For context, Gleam is a type-safe language built on top of BEAM, Erlang's virtual machine. I could rant about how Erlang has all these cool concepts, but I don't understand them well enough to even begin. Regardless, I want to try Gleam because I got nerd-sniped by a YouTube video, so there's that.

This project's scope is to create a website where I can add my e-mail accounts, view messages and their attachments in real-time, and reply to them. I'll have to implement many things along the way, starting from a basic IMAP library, a MIME parser, a threading algorithm, etc., all bundled together in a pretty little web page. I'll allow myself to cut some corners, as I don't want this to turn into a year-long project, but I'll do my best to stick to the relevant RFCs while preserving my sanity. I'll document the entire process in the form of a series of blog posts, hopefully serving as a somewhat friendly resource for understanding how e-mail works.

This blog post will likely be the least interesting of the series, as it’s mostly me wrestling with Gleam and discovering how IMAP works. Still, the topic will eventually shift toward more interesting technical stuff once we dive deeper into the details. Once I consider the project “complete,” I intend to synthesize the most critical learnings in a final blog post, which will probably be a more suitable read for those looking for answers rather than the story.

To see all blog posts related to this project, see #mail-client.

## What is IMAP anyway?

IMAP, or Internet Message Access Protocol, is the "modern" way of interfacing with inbound e-mail. It is a text-based protocol that runs over TCP and lets users access and manage their e-mails directly on the e-mail server, providing a more efficient and flexible way of handling e-mails than older protocols like POP. With IMAP, users can view, organize, and synchronize their e-mails across multiple devices, ensuring that changes made to e-mails are reflected consistently.

All IMAP sessions start with a greeting message from the server, which indicates that the server is ready to accept commands. From there, the IMAP process typically follows these steps:

- Authentication: The client sends login credentials to the server using commands like LOGIN or AUTHENTICATE (we’ll talk more about those later).
- Mailbox selection: Once authenticated, the client selects a mailbox to work with using the SELECT command.
- Message retrieval: The client can fetch message headers, bodies, or specific parts using commands like FETCH.
- Message management: Clients can perform various actions, such as moving messages between folders (COPY followed by delete), deleting messages (STORE +FLAGS \Deleted and EXPUNGE), or creating new folders (CREATE). IMAP doesn’t cover e-mail sending, hence why I called it an interface for inbound e-mail.
- Synchronization: IMAP keeps the server and client in sync, reflecting changes made on one device across all others. This is achieved through commands like IDLE, which tell the server to push updates to the client as they come.
- Disconnecting: When finished, the client can close the selected mailbox (CLOSE) and end the session (LOGOUT).
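To make the ordering of these steps a bit more concrete, here is a minimal sketch of the session as a state machine. The four states come from RFC 3501; the type names, the tiny command subset, and the next_state function are all made up for illustration and are not part of the library I end up building.

```gleam
import gleam/io

// The four connection states described in RFC 3501.
pub type State {
  NotAuthenticated
  Authenticated
  Selected
  Logout
}

// A deliberately tiny subset of commands (hypothetical names).
pub type Command {
  Login
  Select
  Fetch
  Close
  SendLogout
}

// Returns the state the session would be in once the command succeeds,
// or an error if the command makes no sense in the current state.
pub fn next_state(state: State, command: Command) -> Result(State, String) {
  case state, command {
    Logout, _ -> Error("the session is already ending")
    _, SendLogout -> Ok(Logout)
    NotAuthenticated, Login -> Ok(Authenticated)
    Authenticated, Select -> Ok(Selected)
    Selected, Select -> Ok(Selected)
    Selected, Fetch -> Ok(Selected)
    Selected, Close -> Ok(Authenticated)
    _, _ -> Error("command not valid in this state")
  }
}

pub fn main() {
  // FETCH before SELECT should be rejected.
  case next_state(Authenticated, Fetch) {
    Ok(_) -> io.println("FETCH accepted")
    Error(reason) -> io.println("FETCH rejected: " <> reason)
  }
}
```

A real client would not need to enforce this itself, since the server rejects out-of-order commands anyway, but it helps to keep the picture in mind when reading the example session.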

Here’s an example IMAP session showing login, mailbox selection, and fetching message data:

C: a001 LOGIN username password
S: a001 OK LOGIN completed
C: a002 SELECT INBOX
S: * 18 EXISTS
S: * FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
S: a002 OK [READ-WRITE] SELECT completed
C: a003 FETCH 12 FULL
S: * 12 FETCH (FLAGS (\Seen) ...)
S: a003 OK FETCH completed

As you may have noticed, all IMAP commands look like this:

<tag> <command> [<arg1> <arg2> ...]

The tag is a unique identifier for each command, allowing the client to match responses to specific commands. This is about as much as you need to know about IMAP. Of course, there are more details, but we’ll get to them as we need them.
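As a rough sketch of what that format means in code, the snippet below builds a tagged command line and sorts response lines into untagged ("* ..."), continuation ("+ ...") and tagged ones. The function and type names are hypothetical, the tag is a simple counter without the zero padding used in the example session, and arguments that need quoting (spaces, special characters) are not handled at all.

```gleam
import gleam/int
import gleam/io
import gleam/string

// Builds "<tag> <command> [<arg1> <arg2> ...]" followed by CRLF.
// Any string that is unique per command works as a tag.
pub fn build_command(counter: Int, command: String, args: List(String)) -> String {
  let tag = "a" <> int.to_string(counter)
  string.join([tag, command, ..args], " ") <> "\r\n"
}

pub type Line {
  // "* ..." lines are not tied to one specific command.
  Untagged(text: String)
  // "+ ..." lines mean the server is waiting for more input.
  Continuation(text: String)
  // "<tag> ..." lines complete the command sent with that tag.
  Tagged(tag: String, text: String)
}

// Classifies a single response line (without its trailing CRLF).
pub fn classify(line: String) -> Result(Line, Nil) {
  case string.split_once(line, " ") {
    Ok(#("*", rest)) -> Ok(Untagged(rest))
    Ok(#("+", rest)) -> Ok(Continuation(rest))
    Ok(#(tag, rest)) -> Ok(Tagged(tag, rest))
    Error(Nil) -> Error(Nil)
  }
}

pub fn main() {
  io.print(build_command(1, "LOGIN", ["username", "password"]))
  case classify("a002 OK [READ-WRITE] SELECT completed") {
    Ok(Tagged(tag, text)) -> io.println(tag <> " finished with: " <> text)
    Ok(_) -> io.println("not a tagged response")
    Error(Nil) -> io.println("not a valid response line")
  }
}
```

Matching the tag of a tagged response against the tag of the command that was sent is how the client knows which command just finished.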

## Connecting to IMAP from Gleam

Connecting to IMAP is simple. All you need to do is make a TCP connection to the correct port (usually 143 for unencrypted connections or 993 for SSL), and you're in. For now, I'll assume we'll be using SSL, as most e-mail providers require it nowadays, but it'd be nice to support unencrypted IMAP connections for completeness eventually. Ironically, even this simple step is a bit of a chore in Gleam, as it has no standard SSL socket library. Thankfully, we can use Erlang's SSL library via Gleam's FFI.

Calling into Erlang code from Gleam is pretty straightforward, for the most part. You can write Erlang code next to your Gleam code, define a function in Gleam with the same signature, annotate it with @external, and you are done. For example, let’s make a file called ffi.erl in our Gleam project like so:

-module(ffi).
-export([hello/0]).

hello() ->
    io:fwrite("Hello, world!~n", []).

Then, in our main Gleam file, let’s define the corresponding @external function and call it:

@external(erlang, "ffi", "hello")
fn hello_ffi() -> Nil

pub fn main() {
  hello_ffi()
}

Now, if we run our project, we’ll see exactly what we expect:

$ gleam run
Downloading packages
 Downloaded 2 packages in 0.00s
  Compiling gleam_stdlib
  Compiling gleeunit
  Compiling invamail
   Compiled in 0.89s
    Running invamail.main
Hello, world!

To avoid turning this blog post into a full-on Gleam tutorial, I recommend looking at the Gleam tour and checking out this blog post by Jonas Hietala on Gleam FFI for further details. From now on, I’ll try to avoid such diversions unless they are relevant to the story. But since we’re just starting, and this is my first time writing Gleam code (and Erlang code, really), I’d say it’s as relevant as it gets.

Either way, to establish the initial connection to the IMAP server, we'll need three functions from ssl:

- ssl:start - to start the SSL application¹
- ssl:connect - to create the socket and establish a connection
- ssl:recv - to receive data from the socket

All IMAP connections start with an initial greeting message from the server, so as the first step, we'll try retrieving said message and logging it to stdout, after which we’ll see if we can make sense of it.

Now comes the mildly annoying part: unlike Gleam, Erlang has no Result type. Instead, Erlang functions typically return a tuple of the form {ok, Result} on success or {error, Reason} on failure. Thankfully, this is exactly how Gleam represents Ok and Error values at runtime, so tuples of this form map directly to a Result value, but this doesn’t work when the success return value is just the bare atom ok. Unfortunately for us, this is the case with ssl:start.

What this means for us is that we need to write extra boilerplate code to map the result to the tuple form that Gleam does understand. Initially, it may not seem too bad, but the annoyance stacks as you add more FFI functions. Still, Gleam’s FFI is one of the better ones I’ve seen, so I’ll try not to complain too much. Regardless, let’s ditch the hello function and make a wrapper for ssl:start:

-module(ffi).
-export([ssl_start/0]).

ssl_start() ->
    case ssl:start() of
        ok ->
            {ok, nil};
        {error, Reason} ->
            {error, Reason}
    end.

In Gleam terms, the signature of ssl_start looks like this:

import gleam/dynamic.{type Dynamic}

@external(erlang, "ffi", "ssl_start")
fn ssl_start_ffi() -> Result(Nil, Dynamic)

You may notice the Dynamic type used for the error value. Since we don’t know what the error could be and are too lazy to look it up, we’ll leave it as Dynamic for now. In practice, ssl:start would likely never fail either way, so it doesn’t make much of a difference from a practical standpoint.

Starting the SSL application doesn’t do anything that we can observe, so let’s proceed with ssl:connect:

-module(ffi).

-export([ssl_start/0, ssl_connect/2]).

ssl_start() ->
    case ssl:start() of
        ok ->
            {ok, nil};
        {error, Reason} ->
            {error, Reason}
    end.

ssl_connect(Host, Port) ->
    ssl:connect(Host,
                Port,
                [{verify, verify_peer},
                 {cacerts, public_key:cacerts_get()},
                 {active, false},
                 {mode, binary}]).

There are some things to unpack here, namely the third argument of ssl:connect, which is the options for our socket. Let’s go through what each of the options means:

- {verify, verify_peer}: instructs the client to verify the server's certificate during the SSL handshake
- {cacerts, public_key:cacerts_get()}: specifies the list of trusted root CA certificates to be the default system CA certificates
- {active, false}: sets the socket to passive mode, which means we’ll be able to call ssl:recv on it²
- {mode, binary}: sets the data transfer mode to binary, which is generally considered more efficient than the default list mode

Now, we can see what happens when we create a socket:

import gleam/dynamic.{type Dynamic}
import gleam/erlang/charlist.{type Charlist}
import gleam/io

pub type SslSocket

@external(erlang, "ffi", "ssl_start")
fn ssl_start_ffi() -> Result(Nil, Dynamic)

fn ssl_start() {
  ssl_start_ffi()
}

@external(erlang, "ffi", "ssl_connect")
fn ssl_connect_ffi(host: Charlist, port: Int) -> Result(SslSocket, Dynamic)

fn ssl_connect(host: String, port: Int) {
  ssl_connect_ffi(charlist.from_string(host), port)
}

pub fn main() {
  let assert Ok(_) = ssl_start()
  let assert Ok(socket) =
    ssl_connect(
      // The IMAP server for my personal e-mail
      "mail.riseup.net",
      993,
    )

  io.debug(socket)
}

The code is not too different from what we had initially, except for the introduction of Charlist from the gleam_erlang library. It took me longer than I’d like to admit to figure this out, but a Gleam String is a binary under the hood, whereas many Erlang functions (ssl:connect included) expect strings as lists of characters, which is what Charlist represents. I also introduced wrapper functions for both FFI functions. While it’s not strictly necessary for ssl_start_ffi, I generally like the idea of having wrapper functions for FFI code, and I’d prefer to be consistent in that regard.

When we run the project, we should see something like this:

$ gleam run
  Compiling invamail
   Compiled in 0.24s
    Running invamail.main
Sslsocket(GenTcp(//erl(#Port<0.5>), TlsConnection, Undefined), [//erl(<0.124.0>), //erl(<0.123.0>)])

Looks like we have a socket! Thankfully, ssl:recv is the most straightforward operation so far, as we can make an @external function for it directly without the need for boilerplate wrapper code:

// ...

@external(erlang, "ssl", "recv")
fn ssl_receive_ffi(
  socket: SslSocket,
  size: Int,
  timeout: Int,
) -> Result(BitArray, Dynamic)

fn ssl_receive(socket: SslSocket, size: Int, timeout_milliseconds timeout: Int) {
  ssl_receive_ffi(socket, size, timeout)
}

pub fn main() {
  let assert Ok(_) = ssl_start()
  let assert Ok(socket) =
    ssl_connect(
      // The IMAP server for my personal e-mail
      "mail.riseup.net",
      993,
    )

  let assert Ok(greeting) = ssl_receive(socket, 0, 1000)

  io.debug(greeting)
}

By passing 0 for the size, we effectively ask for all the available data. The timeout should be self-explanatory. Running the code, we see the following:

$ gleam run
  Compiling invamail
   Compiled in 0.25s
    Running invamail.main
"* OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE LITERAL+ AUTH=PLAIN AUTH=LOGIN] howdy, ready.\r\n"

Great success! What does any of this mean, though?
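One practical detail before picking the greeting apart: what ssl_receive hands back is a BitArray, not a String. Below is a small sketch of turning those bytes into a line of text. The decode_line name is made up, and the sample bytes stand in for the real socket data. It also assumes the bytes contain exactly one complete line, which a real client can't rely on, since a receive with size 0 simply returns whatever happens to be available.

```gleam
import gleam/bit_array
import gleam/io
import gleam/string

// Converts raw bytes into a String and strips surrounding whitespace,
// which includes the trailing CRLF. Fails if the bytes are not valid UTF-8.
pub fn decode_line(bytes: BitArray) -> Result(String, Nil) {
  case bit_array.to_string(bytes) {
    Ok(text) -> Ok(string.trim(text))
    Error(Nil) -> Error(Nil)
  }
}

pub fn main() {
  // Sample bytes standing in for what the socket returned.
  let greeting = <<"* OK howdy, ready.\r\n":utf8>>
  case decode_line(greeting) {
    Ok(line) -> io.println(line)
    Error(Nil) -> io.println("greeting was not valid UTF-8")
  }
}
```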

## Understanding the IMAP greeting message

Let’s break down the greeting message we received into its components:

* OK: an untagged response (the leading *) with an OK status, meaning the server accepted the connection and is ready for commands
[CAPABILITY ...]: the capabilities supported by the server, in this case:
- IMAP4rev1: IMAP version supported by the server, in this case, version 4 revision 1
- SASL-IR: supports initial client response in SASL authentication
- LOGIN-REFERRALS: the server can provide login referrals
- ID: supports the ID extension for server/client identification
- ENABLE: allows enabling optional features
- IDLE: supports the IDLE command for push e-mail
- LITERAL+: allows non-synchronizing literals
- AUTH=PLAIN: supports PLAIN authentication method
- AUTH=LOGIN: supports LOGIN authentication method
howdy, ready.: custom greeting message from the server
\r\n: carriage return and line feed characters, indicating the end of the response
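The breakdown above can be sketched as a tiny parser that pulls the capability names out of the [CAPABILITY ...] part of a line. This is only an illustration with made-up function names, not a proper IMAP response parser: it does plain substring matching, and it compares names case-sensitively even though IMAP treats them as case-insensitive.

```gleam
import gleam/io
import gleam/list
import gleam/string

// Extracts the names listed inside a "[CAPABILITY ...]" response code.
// Returns an empty list when the line carries no such code.
pub fn capabilities(line: String) -> List(String) {
  case string.split_once(line, "[CAPABILITY ") {
    Ok(#(_, rest)) ->
      case string.split_once(rest, "]") {
        Ok(#(names, _)) -> string.split(names, " ")
        Error(Nil) -> []
      }
    Error(Nil) -> []
  }
}

// Checks whether a line advertises the given capability.
pub fn supports(line: String, capability: String) -> Bool {
  list.contains(capabilities(line), capability)
}

pub fn main() {
  let greeting =
    "* OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE LITERAL+ AUTH=PLAIN AUTH=LOGIN] howdy, ready.\r\n"
  case supports(greeting, "IDLE") {
    True -> io.println("IDLE is advertised")
    False -> io.println("IDLE is not advertised")
  }
}
```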

If none of this makes sense to you, don’t worry; it’s not supposed to. The greeting message informs you of the IMAP server's capabilities. Not all servers have all IMAP capabilities, so a correct IMAP library would need to take those into consideration. As you’ll see, most of this information isn’t strictly relevant to the end goal of this project, as I do not intend to implement every single IMAP quirk, but some of it is worth taking into consideration:

For example, the supported IMAP version capability looks relevant at first glance. There is indeed a colossal difference in the supported commands across different IMAP versions, but the good news is that IMAP4rev1 is far from new: RFC 3501, which defines it, was published in 2003, yet it has all the functionality we’d need. Hence, it’s a safe assumption that most (if not all) IMAP servers in the wild support it. This is one of the good things about e-mail: it is so mature that you can dig up code from >15 years ago, and it would still be perfectly relevant, which is a concept that may be hard for JavaScript developers to swallow.

Then, there are the authentication-related capabilities. There are way too many ways to authenticate yourself in IMAP, but what’s important is that RFC 3501 states the following:

Client and server implementations MUST implement the STARTTLS, LOGINDISABLED, and AUTH=PLAIN (described in [IMAP-TLS]) capabilities. See the Security Considerations section for important information.

This means we can assume that AUTH=PLAIN will always be available, so we don’t have to worry about more complicated authentication methods. It’s important to mention that, as the name suggests, AUTH=PLAIN quite literally involves sending the password as plain text (it is base64-encoded, but that is an encoding, not encryption), but as long as we’re behind SSL, it’s no big deal. Websites do it all the time, so why couldn’t we? It would be a completely different story if I ever added support for non-SSL IMAP servers, but that sounds like a problem for future me.

Then there’s SASL-IR, an IMAP capability that is an excellent example of IMAP’s age. It stands for “Simple Authentication and Security Layer Initial Response,” which is a very cryptic way of saying “you are allowed to send your credentials as part of the authentication request.” I won’t go into too much detail, but AUTH=PLAIN didn’t exist back in the olden times. Before that, authentication methods except for LOGIN (which is also plain text) involved a back-and-forth between the client and the server, where the server would provide a “challenge,” and the client needed to “solve” it to log in. AUTH=PLAIN does not have a challenge step, but it still needed to happen in two separate round trips due to how the AUTHENTICATE command worked, so SASL-IR was added as a means to avoid the unnecessary second round trip.
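To show what that round trip looks like, here is a hedged sketch of the PLAIN exchange with and without SASL-IR. The PLAIN payload is an empty authorization identity, the username, and the password, separated by NUL bytes and then base64-encoded (RFC 4616). The function names are my own invention, and the credentials are placeholders.

```gleam
import gleam/bit_array
import gleam/io

// Encodes credentials for the PLAIN mechanism:
// base64(NUL <> username <> NUL <> password).
pub fn plain_response(username: String, password: String) -> String {
  bit_array.base64_encode(<<0, username:utf8, 0, password:utf8>>, True)
}

// With SASL-IR, the credentials ride along with the command: one round trip.
pub fn authenticate_with_sasl_ir(
  tag: String,
  username: String,
  password: String,
) -> String {
  tag <> " AUTHENTICATE PLAIN " <> plain_response(username, password) <> "\r\n"
}

// Without SASL-IR, the client sends the bare command first, waits for the
// server's "+" continuation line, and only then sends the credentials.
pub fn authenticate_without_sasl_ir(
  tag: String,
  username: String,
  password: String,
) -> List(String) {
  [tag <> " AUTHENTICATE PLAIN\r\n", plain_response(username, password) <> "\r\n"]
}

pub fn main() {
  io.print(authenticate_with_sasl_ir("a001", "username", "password"))
}
```

The second function returns the two lines in the order they would be sent; actually waiting for the continuation line in between is left out of the sketch.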

It’s definitely worth mentioning that the IMAP spec does not require servers to list their capabilities in the greeting message. There is a separate CAPABILITY command, which is the proper way for an IMAP client to retrieve the server’s capabilities. I guess it’s a nice coincidence that my e-mail provider’s IMAP server included them in the greeting, as it allowed me to talk about what’s next. For instance, the greeting from Gmail’s IMAP server looks nothing like the one shown above:

$ openssl s_client -crlf -connect imap.gmail.com:993 -quiet
Connecting to 108.177.96.108
depth=2 C=US, O=Google Trust Services LLC, CN=GTS Root R1
verify return:1
depth=1 C=US, O=Google Trust Services, CN=WR2
verify return:1
depth=0 CN=imap.gmail.com
verify return:1
* OK Gimap ready for requests from xxx.xxx.xxx.xxx in7mb8112347edb
a001 CAPABILITY
* CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 XYZZY SASL-IR AUTH=XOAUTH2 AUTH=PLAIN AUTH=PLAIN-CLIENTTOKEN AUTH=OAUTHBEARER
a001 OK Thats all she wrote! in7mb8112347edb
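Since the authentication mechanisms are the part of this response we care about most, here is a small sketch that picks them out of an untagged CAPABILITY line like the one above. The auth_mechanisms name is hypothetical, and the function expects the plain "* CAPABILITY ..." form, not the bracketed response code from a greeting.

```gleam
import gleam/io
import gleam/list
import gleam/string

// Lists the mechanisms advertised as "AUTH=<mechanism>" entries
// in an untagged CAPABILITY response line.
pub fn auth_mechanisms(line: String) -> List(String) {
  line
  |> string.trim
  |> string.split(" ")
  |> list.filter_map(fn(entry) {
    case string.split_once(entry, "=") {
      Ok(#("AUTH", mechanism)) -> Ok(mechanism)
      _ -> Error(Nil)
    }
  })
}

pub fn main() {
  let line =
    "* CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 XYZZY SASL-IR AUTH=XOAUTH2 AUTH=PLAIN AUTH=PLAIN-CLIENTTOKEN AUTH=OAUTHBEARER"
  io.println(string.join(auth_mechanisms(line), ", "))
}
```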

Also, some IMAP servers tell you their capabilities after you authenticate without you having to ask for them, which shows how much thought has gone into optimizing the IMAP protocol over the years:

A server MAY include a CAPABILITY response code in the tagged OK response of a successful AUTHENTICATE command in order to send capabilities automatically. It is unnecessary for a client to send a separate CAPABILITY command if it recognizes these automatic capabilities.

I wouldn’t concern myself with such optimizations in the early stages of the project, as I want to keep it as simple as possible, but it’s worth keeping in mind regardless.

## Conclusion & next steps

While I haven’t exactly achieved much in this blog post, I believe it’s a good primer for what’s about to come. Publishing this also makes it more likely that I won’t ditch this project halfway, so it’s more of an accountability thing than anything else. Still, I hope I got some of you invested in the story.

In the next blog post, I intend to design the base of the IMAP library in a way that enables me to extend it quickly as the project evolves, so expect lots of parsing and serialization. I also intend to open-source the project, which I haven’t done so far, as it doesn’t have much to show for itself yet.

¹ Applications in Erlang are components that can be started and stopped as a unit, as well as reused in other systems. For more details, read the Erlang documentation.

² To elaborate a bit on the difference between active and passive mode: in active mode, the socket automatically delivers incoming data to the controlling process as messages, whereas in passive mode, the user must explicitly request data from the socket. The latter makes more sense for our use case (at least for now), as it keeps our control flow simple and clear.