As engineers at Bipsync we’re often asked the question “What exactly is it that you do?”. More often than not this question comes from another engineer or software developer, in which case it actually means “What technology do you use and how does everything fit together?”. As Bipsync has grown, this question has become more and more difficult to answer succinctly so we thought we’d write about our architecture here, where anyone can read it.
Way back in November 2014 I wrote about how the Bipsync web app was structured. The underlying architecture hasn’t changed dramatically since, which is a sign of successful design. What has changed is the number of ancillary systems that complement the main RMS web app: back then we didn’t have a native iOS app, nor native desktop apps for Windows and OS X. We didn’t offer plugins for Microsoft Excel and other apps in the Office suite. We also didn’t have a web clipper, and our AutoTagging engine was yet to be developed.
In short: back then we had an application; now we have a platform.
Through a set of web apps and APIs we’ve incorporated these additional systems into our overall architecture in a consistent way that respects the specific needs of each element of the platform.
What do we mean by web app?
Before we go any further, we need to clarify what we consider a ‘web app’. We have a rich browser-based Research Management System (RMS) application that is accessible over the internet, anywhere in the world, but that’s just one example; it’s complemented by several other web applications such as an admin app and a reporting app that’s popular with compliance teams. We also have a number of distinct APIs that service other systems, such as the iOS app. I’ve illustrated this arrangement below:
From the diagram we can see that the RMS web app is but one of a few server-side applications that use PHP as their primary language. We’ve used elements of the Slim and Symfony frameworks in places, but the majority of our PHP code is domain-specific and is mostly concerned with manipulating data as it flows between a data store and a client. We use a modern version of PHP, which is 7.3 at the time of writing.
[A word on frameworks: As a rule of thumb we try to avoid ending up in a position where removing, replacing, or upgrading a framework or library will necessitate months of work. That way risk is minimised, we all have a good understanding of how the system works, and we only tend to implement features we absolutely need (since we don’t have time to write extraneous code, as is tempting when someone else has done much of the work for you). Generally we try to keep our dependencies to a minimum.]
An API for Consumers
Like most modern applications Bipsync offers a public API which third-parties can use to read and modify their data. The API covers the full range of Bipsync content types (research, contacts, events and so on) as well as specific areas of functionality, such as the ability to request and retrieve exports. It’s a standard HTTPS affair which uses JSON to send data back and forth, so people find it very easy to work with.
Through the API, we deliver many of our more popular services such as data archives, custom reports, and migrations from third-party solutions. Our clients also use the API to arrange their own integrations — for example, some use it to maintain their own universe of tags within Bipsync, e.g. from a security master record. And recently other companies in the FinTech space have used it to integrate their products with ours to offer a holistic solution to our clients.
We use Swagger to generate API documentation in OpenAPI format. If you’re interested in that, check out our documentation portal.
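To give a feel for what "a standard HTTPS affair which uses JSON" means from a client's point of view, here is a minimal Python sketch that builds (but deliberately doesn't send) a request. The host, route, bearer-token header and field names are all invented for illustration; the real endpoints are the ones described in the API documentation.

```python
import json
import urllib.request

# Hypothetical values for illustration only; not Bipsync's real host or routes.
BASE_URL = "https://rms.example.com/api"


def build_create_request(resource, payload, token):
    """Build an HTTPS POST request that carries a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{resource}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


request = build_create_request(
    "research", {"title": "Q3 earnings call", "tags": ["Apple"]}, token="secret"
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Because the payload is plain JSON over HTTPS, the same request could be made from any language or tool that speaks HTTP.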
Data Stores and Microservices
Bipsync’s data is stored within MongoDB, and we also use Elasticsearch for anything that needs to be found by full text queries or more complex search strategies. No application queries our data stores directly. Instead they communicate with the “Data API” which in turn talks to the data layer. This is a microservice-style approach which allows us to localise our database queries and re-use them among all our applications.
We’ve written about this in more detail here, but essentially the main advantage of this approach is that our point of integration for data access and control isn’t the database, but an API – so we don’t need to fuss with stored procedures or foreign key constraints at the database level, and instead enforce them through code. We’ve found this to be a much simpler way to work.
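As a rough illustration of enforcing a relationship in code rather than in the database, here is a small sketch. It's written in Python for brevity (our own services are mostly PHP), the class and method names are hypothetical, and in-memory dictionaries stand in for the real data stores.

```python
class DataAPI:
    """Stand-in for a data service that owns all reads and writes."""

    def __init__(self):
        # In-memory dictionaries stand in for database collections.
        self._companies = {}
        self._notes = {}

    def create_company(self, company_id, name):
        if company_id in self._companies:
            raise ValueError(f"company {company_id!r} already exists")
        self._companies[company_id] = {"id": company_id, "name": name}

    def create_note(self, note_id, title, company_id):
        # The "foreign key" check lives here, in code, not in the database.
        if company_id not in self._companies:
            raise ValueError(f"unknown company {company_id!r}")
        self._notes[note_id] = {
            "id": note_id,
            "title": title,
            "company_id": company_id,
        }

    def delete_company(self, company_id):
        # Refuse to orphan notes that still reference the company.
        if any(n["company_id"] == company_id for n in self._notes.values()):
            raise ValueError(f"company {company_id!r} is still referenced by notes")
        self._companies.pop(company_id, None)

    def notes_for_company(self, company_id):
        return [n for n in self._notes.values() if n["company_id"] == company_id]


api = DataAPI()
api.create_company("AAPL", "Apple")
api.create_note("n1", "Conference call", company_id="AAPL")

try:
    api.create_note("n2", "Orphaned note", company_id="MSFT")
except ValueError as error:
    print(error)  # unknown company 'MSFT'
```

Every application that goes through a service like this gets the same checks for free, which is the point of making the API, rather than the database, the place where applications integrate.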
While we tend to use PHP as it’s familiar to most of us, this design makes it trivial to write applications in any language that is able to communicate over HTTP – a Ruby app or a Python app or a Go app could message the Data API just the same. In fact, we’ve used Node.js for several of our new applications without missing a beat. These applications are less complex as a result of not needing to worry about database access. We’ve found that we’re able to lean on smart APIs that are designed specifically for, and can grow with, their clients.
What about asynchronous tasks?
For asynchronous or long-running tasks we employ Supervisor, a tool that runs as a daemon on a Linux server and manages other processes. Typically, these processes consume tasks. The tasks are command line operations that are executed on demand, which could mean anything from a simple Unix program to a bash script or a complicated command-line application. A lot of the time they’re PHP scripts which interact with our Data API.
Some of these tools are substantial enough that they could be considered applications in their own right. There’s one to import emails from a given mailbox and convert them to notes in our system, one to import a user’s entire Evernote account in to Bipsync, and one to export a fund’s entire research portfolio to Bloomberg so it can be integrated into their Bloomberg Terminal.
Generally, anything that might take the system more than a second to process gets implemented as a task. This keeps our web processes nice and quick, our web servers idle, and our users happy – because the app is incredibly responsive as a result. Our web applications place tasks in the form of JSON documents into a queue (which is actually a collection in MongoDB), and they’re popped off and processed by the task daemons in a timely fashion.
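The enqueue-then-consume flow can be sketched in a few lines. This is a simplified Python illustration, not our actual worker code: a `deque` stands in for the MongoDB collection, and the task document's `command` and `args` fields are assumed names.

```python
import json
import subprocess
import sys
from collections import deque

# A deque stands in for the MongoDB collection used as a queue.
task_queue = deque()


def enqueue(command, args):
    """Web-app side: store the task as a JSON document and return immediately."""
    task_queue.append(json.dumps({"command": command, "args": args}))


def work_once():
    """Worker side: pop one task and run it as a command-line process."""
    if not task_queue:
        return None
    task = json.loads(task_queue.popleft())
    result = subprocess.run(
        [task["command"], *task["args"]],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout.strip()


# The "task" here is just a tiny Python one-liner run as a separate process.
enqueue(sys.executable, ["-c", "print('imported 3 emails')"])
print(work_once())
```

With several workers reading from a real collection, claiming a task would need to be atomic so that two workers never pick up the same document; MongoDB's `findAndModify` command is one way to do that.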
Supervisor is responsible for managing many of our more complex integrations, such as our integration with Microsoft Exchange. We can tailor the workers to run at appropriate intervals, ensuring that data is synchronised as often as our users expect. It also runs our reporting system, which extracts data, formats it according to custom templates, and publishes files in a format like CSV or PDF either by email, SFTP upload, or any other method of the client’s choosing.
There’s nothing terribly exciting about our web server arrangement – we use NGINX with PHP-FPM because that combination leads to the fastest, most efficient way to run PHP apps. Our Node.js apps aren’t demanding in the same way, so we tend to use Express there.
For the remainder of this post let’s assume the phrase “web app” is interchangeable with “any one of the suite of Bipsync applications that are accessible over HTTPS” – the RMS app, the mobile app API, and so on. I’m about to introduce some more apps so hopefully that’ll keep things simple!
Beyond the web
Through our web apps our users can manage their research, contacts, tasks, and much more. However, due to their nature there’s a limit to what a web app can do – they are restricted from accessing the filesystem, for example.
Our users are often travelling, sometimes on planes with no connectivity, and therefore require a way to access their research that isn’t dependent on an internet connection. They also have a lot of content in other applications such as models in Microsoft Excel, and want that content to be managed by and accessible from Bipsync too. They also want to take advantage of all the features a modern smartphone, tablet or computer can offer.
For this reason we have several other native applications and plugins which bridge the gap between the web and the computer. Our first native iOS app, Bipsync Notes, debuted in January 2015 and allows our users to access and author research on both iPhone and iPad. We’re developing additional iOS applications to bring more of Bipsync’s functionality to mobile devices.
Bipsync Notes Desktop does the same as its iOS counterpart for both the Microsoft Windows and Apple OS X (Mac) platforms. These applications work on and offline, and take advantage of the operating system to offer a unique experience. They each have a unique set of challenges, and both are great fun to work on.
Bipsync Notes iOS
This iOS app is written mainly in Objective-C, though we recently switched to Swift for all future iOS app development. Crafting apps for iOS is a notoriously more intensive process than that of making apps for the web. Mobile devices have fewer resources to rely on, the severe nature of crashes means errors have to be avoided like the plague, and there are some genuinely tricky concepts like threading to understand if the app is to be as responsive as our users expect.
We’ve found working on the app to be a challenging, but very rewarding, experience. It’s tailored toward doing one thing really well, which is simply to allow users to access their research all the time, wherever they are.
The app uses Core Data to model and store data and communicates with its web API through a set of sync operations that run in background threads to keep the app’s interface smooth. Our focus has been on ensuring that research content is always up to date and easily accessible; implementing background fetch and a proprietary full-text search solution were key milestones.
Other cool features in the app include a feature-rich PDF viewer and editor; support for biometric authentication; a Safari extension which allows web pages to be clipped to Bipsync; an OCR scanner to extract text from images; and even a couple of Easter eggs such as a Messages sticker pack, which we released as a Christmas gift a couple of years ago.
The iOS app is also fully compatible with enterprise MDM solutions such as Intune, AirWatch and MobileIron.
Bipsync Notes Desktop
Our desktop app brings offline capabilities to our users’ computers. As with the iOS app one of the main advantages of a native program is the ability to work offline with confidence and no loss in functionality. The app also boasts a fast full-text search.
Like the iOS app the desktop app has been designed to quickly synchronise with the RMS’ data store through its own custom web API. Where the mobile API has to tailor its responses to accommodate cellular data speeds and restrictions, the desktop API can rely on faster, more stable connections so we’re able to get data up and down much more quickly. We encrypt all data, both in transit and at rest; the app uses a MongoDB-compatible database to store data on the machine.
The desktop app uses the same rich-text editor as the web RMS app: CKEditor. It’s a great library that’s easy to customise through plugins, which we can re-use across both the web and native apps. Backbone.js is another library which is used on both web and desktop platforms – we love it for the way it empowers event-driven applications to easily respond to changing data without being too prescriptive in how those applications should be structured.
As with all our apps the desktop app is backed by a suite of automated tests. We use Selenium to ensure the app runs successfully – and almost identically – on both Windows and OS X platforms. It’s a genuinely impressive thing to see in action.
Bipsync is able to consume content from almost any source. In some cases this happens through a user interface, such as the RMS’ note editor, or the web clipper. In these circumstances the user is able to classify their content with any number of tags; these are often a company such as Apple, or the type of the note, like “Meeting” or “Call”.
What happens though, if the way the user submits the note to Bipsync doesn’t offer a way for them to add these tags? A classic example relates to notes that are emailed in to the system: your average email client doesn’t have a Bipsync tagging interface.
For these situations, we have the AutoTagger. This is a component that employs heuristics to arrive at the tags that, given a set of rules, are most likely to be related to a given item of content. Some of these rules have been designed by us and are proprietary in nature; others can be defined by our users based on their knowledge.
Say a user emails a note into Bipsync with the subject “Conference call with Tim Cook RE: new iPads”. We’d expect the resulting note to be tagged with “Apple”, “Tim Cook”, and “Conference call” tags, with no further action by the user necessary. That’s a powerful feature, because it means users who are time constrained, or aren’t even aware they’re using an RMS, are able to submit research that is automatically contextualised for their coworkers. The content is much more discoverable as a result.
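To make the idea of rule-based tagging concrete, here is a toy Python sketch that reproduces the example above. The rules are invented for illustration and are far simpler than the real (proprietary) ones; each one is just a regular expression paired with the tags it implies.

```python
import re

# Hypothetical rules: a regular expression and the tags it implies.
RULES = [
    (re.compile(r"\btim cook\b", re.I), ["Tim Cook", "Apple"]),
    (re.compile(r"\bipads?\b|\biphones?\b", re.I), ["Apple"]),
    (re.compile(r"\bconference call\b", re.I), ["Conference call"]),
]


def suggest_tags(text):
    """Return the tags implied by every matching rule, without duplicates."""
    tags = []
    for pattern, implied in RULES:
        if pattern.search(text):
            for tag in implied:
                if tag not in tags:
                    tags.append(tag)
    return tags


print(suggest_tags("Conference call with Tim Cook RE: new iPads"))
# ['Tim Cook', 'Apple', 'Conference call']
```

Note that "Apple" is never mentioned in the subject line: it is inferred from the person and the product, which is the kind of contextual knowledge a user would otherwise have to supply by hand.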
Technology-wise, the AutoTagger is ostensibly a Node.js app, but this just provides the central service that is responsible for communicating tagging suggestions. As we build out the feature we expect to incorporate additional signals, such as models trained via machine learning techniques.
The AutoTagger relies on another of our proprietary services: the Entity service. Used by all the Bipsync applications, the Entity service maintains a universe made up of all the entities our users might want to associate with their research: companies (including their public stock tickers), industries, geographical places, etc. Already it stores hundreds of thousands of records; we synchronise the data to our Bipsync installations on a daily basis so it’s always current, and to make search and retrieval quick and painless.
Plugins, plugins, plugins
I mentioned earlier that our users often want to integrate Bipsync into existing workflows which involve third-party applications, such as those in the Microsoft Office suite (Excel, Outlook, etc.). Perhaps they’d like to be able to forward research-related emails in to Bipsync and have them stored as notes, or work on a model in Excel and have that file automatically upload itself to a related note in Bipsync.
To achieve this we have a few plugins, written in C# and built on Microsoft’s .NET framework, that appear as buttons in their respective apps. The behaviour varies depending on the app, but in Outlook for example (there’s a nice image of this below) tapping the button will send the email in to Bipsync, and even allows tags to be added from within Outlook itself.
Our Excel plugin works in a similar fashion, but here we go one step further and track the file that was sent to us. Email is immutable, but Excel files are not – so as further changes are made to the file and it is saved, we automatically sync the updated version of the file to Bipsync. We’re even able to extract values from cells in a spreadsheet and map them to various data properties within Bipsync.
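One common way to decide whether a saved file needs syncing again is to compare a hash of its contents with the hash recorded at the last sync. The Python sketch below shows that general technique; it is an assumption about how such tracking could work rather than a description of the plugin's internals (which are written in C#), and `TrackedFile` and `upload` are hypothetical names.

```python
import hashlib


class TrackedFile:
    """Remembers the last synced content of a file and detects changes."""

    def __init__(self, path):
        self.path = path
        self.last_synced_digest = None  # nothing synced yet

    def _digest(self):
        # Hash the file's bytes so any change to the content is detected.
        with open(self.path, "rb") as handle:
            return hashlib.sha256(handle.read()).hexdigest()

    def sync_if_changed(self, upload):
        """Call upload(path) only when the content differs from the last sync."""
        digest = self._digest()
        if digest == self.last_synced_digest:
            return False
        upload(self.path)
        self.last_synced_digest = digest
        return True
```

Calling `sync_if_changed` after each save uploads the file the first time and again whenever its contents change, but does nothing when a save leaves the file byte-for-byte identical.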
The way these plugins are architected is pretty neat. They’re bundled with the desktop app installer, and then for all non-specific functionality – e.g. uploading a file to Bipsync – they pass the work off to the desktop app. This reduces the complexity of each plugin, instead locating that logic in a single place.
We also have a Chrome browser extension which clips web pages of interest in to Bipsync, an Excel extension which can import data from a spreadsheet, and numerous other little tools and helpers written in several different languages.
That’s it! For now…
It’s nice to stop and take stock of things every once in a while. I can scarcely believe how much our platform has grown, and looking at our roadmap I imagine the same will be true a year from now. Hopefully this has given you some insight into what we software developers do at Bipsync, and if you’re thinking “wow, that’s a lot of work for a small team”: well, we’re hiring!