The Markets in Financial Instruments Directive II (MiFID II) came into effect in January 2018, aiming to improve the competitiveness and transparency of European financial markets. As part of this, financial institutions are obligated to report details of trades and transactions (both equity and non-equity) to regulators within certain time limits.
Regulatory Technical Standard (RTS) 1 stipulates that equity trades must be reported to an Approved Publication Arrangement (APA) within one minute of being executed, while RTS 2 stipulates that non-equity trades must be reported to an APA within five minutes.
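To make the time limits concrete, here is a minimal Python sketch of a deadline check. It uses only the one-minute and five-minute windows quoted above; the function names and the `asset_class` labels are illustrative assumptions, not part of any regulatory or vendor API.

```python
from datetime import datetime, timedelta, timezone

# Reporting windows as quoted in the text: RTS 1 (equity), RTS 2 (non-equity).
REPORTING_WINDOWS = {
    "equity": timedelta(minutes=1),
    "non_equity": timedelta(minutes=5),
}


def reporting_deadline(executed_at: datetime, asset_class: str) -> datetime:
    """Return the latest time at which the trade may be reported."""
    return executed_at + REPORTING_WINDOWS[asset_class]


def is_on_time(executed_at: datetime, reported_at: datetime, asset_class: str) -> bool:
    """True if the report was made within the window for its asset class."""
    return reported_at <= reporting_deadline(executed_at, asset_class)


if __name__ == "__main__":
    executed = datetime(2024, 1, 2, 9, 0, 0, tzinfo=timezone.utc)
    reported = executed + timedelta(seconds=90)
    print(is_on_time(executed, reported, "equity"))      # 90 s is past the 1-minute window
    print(is_on_time(executed, reported, "non_equity"))  # 90 s is within the 5-minute window
```

A report made 90 seconds after execution would miss the equity window but fall inside the non-equity one, which is why the tighter limit tends to drive pipeline design.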
This presents a number of challenges for the data engineering teams of financial institutions. In this blog, we’ll delve into these challenges before explaining how they can be addressed with data streaming.
MiFID II broadens the range of financial instruments that require post-trade reporting to APAs and mandates the submission of additional data: it requires 65 fields of transaction data to be submitted, compared with 25 under MiFID I. Combined with the expectation that non-equity transaction volumes will increase over time (owing to new financial instruments), this means more data has to be processed and transferred from the trading systems of financial institutions to APAs. As data volumes increase, post-trade transparency (PTT) pipelines need to scale elastically in order to meet demand. Systems based on legacy messaging queues may not handle this demand easily; they often fail to scale efficiently as more data producers are added, leading to throttling and possible outages.
Post-trade reporting under MiFID II relies on information from a wide range of sources and systems. This means that “post-trade reporting data” is heterogeneous, containing different information and structured in varying formats. Before being submitted to an APA, however, this data needs to be transformed into a standardized format (e.g., MMT under FIX) and validated for data quality. In order to submit information to an APA within the required time frame, a PTT pipeline must integrate, transform, and govern post-trade data in near real time, while scaling up and down horizontally to meet variable throughputs.
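As a rough illustration of the "transform and validate" step, the sketch below maps records from two differently shaped sources onto one common structure and then applies a few data quality checks. The source formats, field names, and validation rules are all hypothetical; this is not the MMT or FIX schema, only a minimal stand-in for the idea of normalizing heterogeneous inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeReport:
    """Hypothetical common format that every source is mapped onto."""
    trade_id: str
    isin: str
    price: float
    quantity: float
    currency: str


def from_system_a(raw: dict) -> TradeReport:
    """Hypothetical source A: a dict with abbreviated keys and string values."""
    return TradeReport(
        trade_id=raw["id"],
        isin=raw["isin"],
        price=float(raw["px"]),
        quantity=float(raw["qty"]),
        currency=raw["ccy"],
    )


def from_system_b(raw: str) -> TradeReport:
    """Hypothetical source B: trade_id|isin|price|quantity|currency."""
    trade_id, isin, price, quantity, currency = raw.split("|")
    return TradeReport(trade_id, isin, float(price), float(quantity), currency)


def validate(report: TradeReport) -> list:
    """Return a list of data quality problems (empty if the report looks valid)."""
    errors = []
    if len(report.isin) != 12 or not report.isin.isalnum():
        errors.append("isin must be 12 alphanumeric characters")
    if report.price <= 0:
        errors.append("price must be positive")
    if report.quantity <= 0:
        errors.append("quantity must be positive")
    if len(report.currency) != 3:
        errors.append("currency must be a 3-letter code")
    return errors


if __name__ == "__main__":
    a = from_system_a({"id": "T1", "isin": "US0378331005", "px": "189.5", "qty": "100", "ccy": "USD"})
    b = from_system_b("T1|US0378331005|189.5|100|USD")
    print(a == b)        # both sources normalize to the same record
    print(validate(a))   # no problems found
    print(validate(from_system_b("T2|BAD|0|10|USD")))
```

Keeping one adapter per source and a single shared validator means a new trading system only needs a new adapter, while the quality rules stay in one place.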
Financial organizations are increasingly turning to data streaming technologies in order to meet their PTT reporting requirements.
Data streaming refers to the continuous and real-time transfer of data from one system to another. In contrast to batch processing, where significant amounts of data are processed in groups at periodic intervals, data streaming processes data as soon as it is produced or received, ensuring it is ready for immediate analysis or application.
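The difference can be put in numbers with a toy model. The sketch below assumes a batch job that runs at fixed intervals and a streaming job with a constant per-event processing time; both are simplifying assumptions, and the arrival times are invented for illustration.

```python
def batch_latencies(arrival_times, interval):
    """Seconds each event waits until the next scheduled batch run.

    Assumes runs happen at interval, 2*interval, ...; an event arriving
    exactly on a boundary waits for the following run.
    """
    latencies = []
    for t in arrival_times:
        next_run = ((t // interval) + 1) * interval
        latencies.append(next_run - t)
    return latencies


def streaming_latencies(arrival_times, processing_seconds):
    """In the streaming model each event is handled as it arrives,
    so its latency is just the (assumed constant) processing time."""
    return [processing_seconds for _ in arrival_times]


if __name__ == "__main__":
    arrivals = [5, 130, 899]   # seconds since start, illustrative values
    print(batch_latencies(arrivals, interval=900))           # 15-minute batch window
    print(streaming_latencies(arrivals, processing_seconds=2))
```

In this model, an event's batch latency depends on where in the window it happens to arrive, whereas the streaming latency is bounded by processing time alone.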
Apache Kafka® is the de facto standard for data streaming, used by over 70% of Fortune 500 companies. It's valued for its ability to reliably stream high volumes of data at low latencies, decoupling data producers from consumers and providing organizations with robust fault tolerance mechanisms. It's commonly used alongside another open source technology, Apache Flink®, which provides an advanced compute layer for streams of data.
Confluent brings these technologies together as part of its complete, enterprise-ready data streaming platform, available on-premises or in the cloud. Many large financial service organizations use Confluent as the backbone of their event-driven architectures, powering everything from real-time payments to trading platforms.
Prior to adopting data streaming with Confluent, a large multinational bank, headquartered in Europe, was reliant on batch-based processing for its post-trade reporting needs. They were operating around 10 separate trading infrastructures, each of which produced a different set of trade output data in varying formats. While the bank’s data team managed to integrate and process this data ready for reporting within 15 minutes of a trade taking place, they weren’t able to meet MiFID II’s five-minute threshold with their existing technology.
The bank first evaluated a traditional message queue as a means to integrate their disparate trading systems, but discounted it in favor of Apache Kafka, which they had just begun to use in a limited way. Kafka's ability to persist data as an immutable log of events, as well as its ability to decouple data producers from consumers (thereby enabling microservices to scale independently), was fundamental to their decision to deploy an event-driven architecture.
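The two properties mentioned here, an immutable log and decoupled consumers, can be sketched with a small in-memory model. This is a conceptual toy, not Kafka's client API: the `AppendOnlyLog` and `Consumer` classes are hypothetical names introduced only to show that records are appended rather than modified, and that each consumer tracks its own read position.

```python
class AppendOnlyLog:
    """Toy stand-in for a single log partition: records are only ever appended."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 100) -> list:
        """Return up to max_records starting at offset; reading removes nothing."""
        return self._records[offset:offset + max_records]

    def __len__(self) -> int:
        return len(self._records)


class Consumer:
    """Each consumer keeps its own offset, so consumers do not affect each other."""

    def __init__(self, log: AppendOnlyLog):
        self._log = log
        self.offset = 0

    def poll(self, max_records: int = 100) -> list:
        records = self._log.read(self.offset, max_records)
        self.offset += len(records)
        return records


if __name__ == "__main__":
    log = AppendOnlyLog()
    for trade_id in ("T1", "T2", "T3"):
        log.append({"trade_id": trade_id})

    reporting = Consumer(log)   # e.g. a reporting service
    analytics = Consumer(log)   # e.g. an unrelated analytics service

    print(reporting.poll(max_records=2))  # reads T1, T2
    print(analytics.poll())               # independently reads T1, T2, T3
    print(reporting.poll())               # resumes at T3
    print(len(log))                       # the log still holds all three records
```

Because reading does not consume records, a slow or newly added consumer can catch up from its own offset without the producer or other consumers needing to know about it. A traditional queue, by contrast, typically removes a message once it has been delivered.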
Given the criticality of PTT pipelines to the business, the bank wanted to ensure they had the support of an enterprise-ready data streaming platform. They chose Confluent in order to de-risk their investment and future-proof their data infrastructure for possible changes to post-trade reporting requirements. Here’s a high-level overview of their solution:
In this solution, post-trade data from multiple trading applications (each relating to a different asset class) is ingested into Confluent Platform via a Debezium SQL Server source connector and an IBM MQ source connector. Raw streams of trade data are synced to Kafka and processed (i.e., enriched and formatted to comply with reporting standards) in real time via Flink (hosted on Kubernetes). Additional standardised reference data sources, such as Legal Entity Identifier (LEI) and International Securities Identification Number (ISIN) databases, are synchronised into Kafka and used to enrich the trades after submission. Data quality and business rules are also applied to filter out trades that do not meet the regulatory requirement for reporting. For example, certain low-value trades might not need to be submitted to the regulator. Data structure and quality are maintained by Schema Registry, which validates data contracts between data producers and consumers.
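The enrich-then-filter logic described above can be sketched in plain Python. This is not Flink code and not the bank's implementation: the reference data, field names, and the `MIN_REPORTABLE_NOTIONAL` threshold are made-up values chosen only to show the shape of the processing step.

```python
# Hypothetical reference data, standing in for the LEI and ISIN sources.
LEI_NAMES = {"EXAMPLELEI0000000001": "Example Bank AG"}
ISIN_INFO = {"US0378331005": {"asset_class": "equity"}}

# Hypothetical business rule: trades below this notional are not reported.
MIN_REPORTABLE_NOTIONAL = 1_000.0


def enrich(trade: dict) -> dict:
    """Return a copy of the trade with reference data and notional attached."""
    enriched = dict(trade)
    enriched["counterparty_name"] = LEI_NAMES.get(trade["lei"])
    enriched["asset_class"] = ISIN_INFO.get(trade["isin"], {}).get("asset_class")
    enriched["notional"] = trade["price"] * trade["quantity"]
    return enriched


def is_reportable(trade: dict) -> bool:
    """Apply the (assumed) low-value filter to an enriched trade."""
    return trade["notional"] >= MIN_REPORTABLE_NOTIONAL


def trades_to_report(raw_trades):
    """Process trades one at a time, yielding only those that must be reported."""
    for trade in raw_trades:
        enriched = enrich(trade)
        if is_reportable(enriched):
            yield enriched


if __name__ == "__main__":
    raw = [
        {"trade_id": "T1", "lei": "EXAMPLELEI0000000001", "isin": "US0378331005",
         "price": 10.0, "quantity": 50},     # notional 500.0, filtered out
        {"trade_id": "T2", "lei": "EXAMPLELEI0000000001", "isin": "US0378331005",
         "price": 189.5, "quantity": 100},   # notional 18950.0, reported
    ]
    for report in trades_to_report(raw):
        print(report["trade_id"], report["counterparty_name"], report["notional"])
```

Writing the step as a generator mirrors the streaming model: each trade is enriched and evaluated as it arrives rather than being collected into a batch first. In a real deployment the lookups would typically be joins against continuously updated reference streams rather than static dictionaries.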
Regulatory reporting applications, external to Confluent, subscribe to an enriched “trades to report” topic. All post-trade events are shared with the relevant regulatory reporting body within seconds of the trade occurring, with all required information in a compliant format.
Confluent’s data streaming platform has enabled this organisation to comply with MiFID II. By switching from batch-based messaging queues to an event-driven architecture, they’re able to stream, process, and govern post-trade data from across divergent source systems, and deliver it to regulatory bodies within the stipulated time frame.
At a wider level, Confluent has also enabled this financial institution to think more broadly about the applications of an event-driven architecture; alongside meeting evolving post-trade regulations, this organisation is increasingly thinking of its data in terms of “data products”—that is, governed, discoverable streams of real-time data reusable across the business.