An In-Depth Look at the Mail Classifier

Building the Mail Classifier - How we Automatically Sync Email to the RMS at Scale

We recently introduced a new feature for the Bipsync RMS (see release notes 164 here). It’s called the Mail Classifier, and it allows Bipsync to automatically ingest and classify incoming email within an external mailbox. 

Email is an essential communication tool for all our clients, but the way it fits into the research process varies from client to client. It was therefore important for us to develop a highly customizable tool that allows each fund to tailor email classification to its own workflow.

We enable this, in part, through the application of Rules. Through the Bipsync Rules engine, clients can specify which emails Bipsync automatically imports, what to tag these emails with, and which Bipsync user should be set as the author.

Emails can be routed into Bipsync via the application of rules.

Connecting to a mailbox

The first step in using the Mail Classifier is to link a mailbox with Bipsync via our Setup app. A client must provide essential details about their mailbox so that Bipsync can access it over IMAP.

Configuring a mailbox

Once a mailbox has been configured, it is possible to switch over to a preview of the mailbox.

The preview shows the ten most recent emails received, and any of them can be used to jump-start the creation of a rule. For example, if one of the ten recent emails has come from a domain of interest (e.g. morganstanley.com), a user can quickly create a rule which will act on every email sent from that domain. This is a neat shortcut, but we’ll look at rules in more detail in a little while.

Once a mailbox has been set up with the Mail Classifier enabled, the magic starts to happen.

By default, the Mail Classifier will fetch mail every ten seconds. For each mailbox, we store the ID of the last processed email to use as a starting point for the next fetch. This ensures that the classifier does not accidentally process the same email twice, which could lead to duplicate content in the system.

We impose a configurable limit on the number of emails that are processed in each request; by default this is 100. IMAP is slow, especially when attachments are involved, so the limit exists for both performance and user-experience reasons: work is done in manageable batches and users aren’t left waiting for something to happen.
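To make the checkpoint-and-batch idea concrete, here is a minimal Python sketch. It is an illustration rather than Bipsync's actual code: it assumes message IDs are integers that only ever increase (as IMAP UIDs do within a mailbox), and the names MailboxState, next_batch and mark_processed are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MailboxState:
    """Hypothetical per-mailbox checkpoint; 0 means nothing processed yet."""
    last_processed_id: int = 0


def next_batch(available_ids: Iterable[int], state: MailboxState,
               limit: int = 100) -> List[int]:
    """Return up to `limit` unprocessed message IDs, oldest first."""
    pending = sorted(i for i in available_ids if i > state.last_processed_id)
    return pending[:limit]


def mark_processed(state: MailboxState, batch: List[int]) -> None:
    """Advance the checkpoint so the next fetch starts after this batch."""
    if batch:
        state.last_processed_id = batch[-1]


# Pretend the mailbox holds 250 messages with IDs 1..250.
state = MailboxState()
inbox = range(1, 251)
first = next_batch(inbox, state)    # IDs 1..100
mark_processed(state, first)
second = next_batch(inbox, state)   # IDs 101..200, no overlap with `first`
```

Because the checkpoint only moves forward once a batch has been handled, a later fetch never returns an ID that was already processed, which is the property that prevents duplicate notes.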

The mail fetch process

Once a mailbox has been configured the next step is to set up some rules. These decide which emails should be converted into notes, and the format of those notes. Each rule consists of conditions and actions.

One Does Not Simply Get Imported Into Bipsync

Once we have fetched a set of emails we need to process them.

Taking the set of rules that have been configured for that mailbox and all the fetched emails, we apply one rule at a time, to each email in the list.

An email matches a rule if it passes all the rule’s conditions. Conditions are set up through an intuitive, additive interface, making it possible to create very targeted rules. We were careful to design the interface and our code in such a way as to make it easy to introduce additional condition types in future.

The condition types we have included are relatively simple. For example, if we wish to check if an email has come from a specific domain it’s a matter of taking the ‘from’ field out of the email and then comparing the string after the @ symbol with the domain name(s) specified for that condition in the setup app. If there are any matches the Mail Classifier knows that the email does come from one of the specified domains. This process is repeated for each of the conditions enabled on that specific rule and if all of the enabled conditions pass, the email has met the requirements for that rule.
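The domain check described above can be sketched in a few lines of Python. This is a hedged illustration, not the Classifier's real code: the function names are hypothetical, the email is represented as a plain dict, and comparing domains case-insensitively is our assumption. The only library call used is the standard library's email.utils.parseaddr.

```python
from email.utils import parseaddr
from typing import Callable, Iterable, List


def sender_domain(from_header: str) -> str:
    """Extract the lower-cased text after the @ in a 'from' field."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return ""  # malformed or missing address: no domain to compare
    return address.rpartition("@")[2].lower()


def from_domain_condition(from_header: str, allowed_domains: Iterable[str]) -> bool:
    """Pass if the sender's domain equals any of the configured domains."""
    domain = sender_domain(from_header)
    return bool(domain) and domain in {d.lower() for d in allowed_domains}


def matches_rule(email: dict, conditions: List[Callable[[dict], bool]]) -> bool:
    """A rule matches only when every enabled condition passes."""
    return all(condition(email) for condition in conditions)


email = {"from": "Jane Doe <jane@morganstanley.com>", "subject": "Weekly Report"}
conditions = [
    lambda e: from_domain_condition(e["from"], ["morganstanley.com"]),
    lambda e: "report" in e["subject"].lower(),
]
rule_matched = matches_rule(email, conditions)
```

Representing each condition as a callable that takes an email and returns a boolean is one simple way to keep new condition types easy to add.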

When an email matches all the conditions on a rule, it is added to a list of filtered messages. This list contains only emails that have passed at least one rule; in this way we filter out email that does not match any rule. We then carry on until every rule has been checked against every email.

The rule application process

Once this process has finished we have a list containing the emails that matched each rule, so an email that has passed more than one rule will appear in the list multiple times. We transform the list so that each email has only one entry, and that entry contains the IDs of every rule the email matched. We need this information to continue with the classifying process, and it also helps with diagnostics.
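The matching and flattening steps could be sketched as follows. This is a simplified Python illustration under our own assumptions: emails are dicts with an "id" key, each rule is a hypothetical (rule_id, predicate) pair, and the function names are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[dict], bool]]


def apply_rules(emails: List[dict], rules: List[Rule]) -> List[Tuple[int, str]]:
    """Apply one rule at a time to every email; record one entry per match."""
    matches = []
    for rule_id, predicate in rules:
        for email in emails:
            if predicate(email):
                matches.append((email["id"], rule_id))
    return matches


def group_by_email(matches: List[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Collapse repeated emails into one entry holding all matched rule IDs."""
    grouped: Dict[int, List[str]] = {}
    for email_id, rule_id in matches:
        grouped.setdefault(email_id, []).append(rule_id)
    return grouped


emails = [
    {"id": 1, "from": "jane@morganstanley.com", "subject": "Performance Report"},
    {"id": 2, "from": "bob@example.org", "subject": "Performance Report"},
    {"id": 3, "from": "carol@example.org", "subject": "Lunch?"},
]
rules = [
    ("from-domain", lambda e: e["from"].endswith("@morganstanley.com")),
    ("is-report", lambda e: "report" in e["subject"].lower()),
]
matches = apply_rules(emails, rules)   # email 1 appears twice here
grouped = group_by_email(matches)      # email 1 appears once, with two rule IDs
```

Email 3 matches no rule, so it never enters either structure and is effectively filtered out.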

Finally we have everything exactly how we want it. We have a flattened list of emails and each email in this list has a record of which rule(s) it has passed. Now the Classifier converts these emails into content in Bipsync. The email subject will become the note title, the body will become the note body, all attachments will be processed and attached to the note, and the notes will have a record of which rules enabled their creation.

All that remains is to process these notes according to the rules that they have matched with.

Lights, Camera, Process Notes

The notes that have been created by the classifier are passed to the Action Processor, along with the rule data. For each note, the Action Processor applies the actions for each matched rule.

As with checking the conditions, the code for executing these actions is quite simple. For example, if you wanted all notes created by a specific rule to be assigned to the Bipsync user ‘Constance Markievicz’ then all the Action Processor has to do is change the ID of the note author to that of Constance Markievicz, and ensure that the note sharing/locking permissions are set to match her default settings.

These actions have also been implemented using an interface so that additional actions can be added in the future with ease, helping ensure the feature set of Mail Classifier is expandable.

If a rule has multiple actions, then these will all be executed. If a note has matched with multiple rules, then the actions of each rule will be executed. We continue until every piece of research that has been created by the Mail Classifier has been processed, and then the Classifier can take a deserved break. Until the next batch of emails is fetched…
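For readers who like to see the shape of such a design, here is a minimal Python sketch of actions behind a shared interface. It is an assumption-laden illustration, not the real Action Processor: notes are plain dicts, and the Action, SetAuthor, ApplyTags and process_notes names, along with the "sharing" field, are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Action(ABC):
    """Common interface, so new action types can be added without changing the processor."""

    @abstractmethod
    def apply(self, note: dict) -> None:
        ...


class SetAuthor(Action):
    def __init__(self, author_id: str, default_sharing: str) -> None:
        self.author_id = author_id
        self.default_sharing = default_sharing

    def apply(self, note: dict) -> None:
        # Change the author and align permissions with that user's defaults.
        note["author_id"] = self.author_id
        note["sharing"] = self.default_sharing


class ApplyTags(Action):
    def __init__(self, tags: List[str]) -> None:
        self.tags = tags

    def apply(self, note: dict) -> None:
        for tag in self.tags:
            if tag not in note["tags"]:  # avoid duplicates across rules
                note["tags"].append(tag)


def process_notes(notes: List[dict], actions_by_rule: Dict[str, List[Action]]) -> None:
    """Run every action of every rule that each note matched."""
    for note in notes:
        for rule_id in note["matched_rule_ids"]:
            for action in actions_by_rule.get(rule_id, []):
                action.apply(note)


notes = [{"title": "Performance Report", "tags": [], "author_id": None,
          "sharing": None, "matched_rule_ids": ["assign", "tag-apple"]}]
actions_by_rule = {
    "assign": [SetAuthor("user-constance", "private")],
    "tag-apple": [ApplyTags(["Apple", "Report"])],
}
process_notes(notes, actions_by_rule)
```

The processor loop never needs to know which concrete actions exist, which is what makes the set of actions easy to extend.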

Classifier Use-Cases

Everything detailed above comes together to give users a lot of power when it comes to deciding how Bipsync deals with incoming email.

If a client wishes to create an email whitelist, a list of addresses from which every email is automatically converted into a Bipsync note and tagged accordingly, then they need only create a rule, enable the ‘sent from address’ condition, and specify the email addresses they wish to whitelist.

If a client receives a weekly report on the performance of a company, e.g. Apple, that they wish to automatically import into Bipsync and tag appropriately, then they can create a rule with the Attachment Name Contains condition and specify a matching string such as “Performance Report”. Finally, they would enable the Apply Tags action and choose the “Apple” and “Report” tags.

Finally, if a client is working with an external expert (a consultant, a sell-side analyst or otherwise) who does not have access to Bipsync, then they can create a rule which takes every email sent from the expert’s email address and runs Bipsync’s Autotagger on it, so that their research is imported and tagged appropriately as soon as it is received; no users need lift a finger.

These are deliberately simple examples, of course; rules can be a lot more specific and can also be combined, enabling clients to use this feature in a way that suits their unique investment or diligence processes. We’ll be diving further into client-specific use-cases in future blog posts, so stay tuned.

Future Improvements

There are a few areas that we’ve identified for further development and a handful of additional features that we would like to add.

Fetching emails via IMAP can be quite slow, especially when the emails have attachments. In the future we’d like to investigate alternative connection types. The APIs for Microsoft Exchange and Gmail offer faster interactions than those we can achieve with IMAP, which could lead to improved performance.

This is not an immediate concern, as we have load tested the Mail Classifier with both a large number of rules and a large influx of emails, and it performed extremely well in those tests.

We’d love to hear from clients who have additional requirements and ideas for this feature too.

If you’d like to start using the Mail Classifier at your fund, or would like to learn more, please get in touch.