Applying the Micro-Batching Pattern to Data Transfer
If you have worked on data-rich software systems, chances are you have worked with a distributed architecture where one part of your system needs access to data owned by another part of the system. Whether that architecture is a modern distributed microservices architecture or a set of stand-alone applications looking to exchange data, you will have one party owning that data (let’s call it the producer) and another party needing that data (let’s call it the consumer).
How should your consumers obtain the data of your producers, while striking the right balance between latency and consistency?
One option, of course, would be to run real-time queries against the source system for maximum consistency. Blocking patterns like this mostly work well for small datasets but incur latency for larger result sets, especially when your data comes in the wrong representation or requires transformation.
Another option is to proactively stream data from your producers into a central hub (or data lake) for consumers to consume from. Here the challenges are in synchronization of read/write rates between producers and consumers, delivery guarantees by the data hub, and the sheer volume of data.
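To make the rate-synchronization challenge concrete, here is a minimal sketch in Python. It assumes a single producer and a single consumer, and it uses an in-memory bounded queue as a stand-in for the hub. The names `run_pipeline` and `hub_capacity` are hypothetical and chosen only for illustration; a real data hub would also have to deal with persistence and delivery guarantees, which this sketch ignores.

```python
import queue
import threading


def run_pipeline(records, hub_capacity=4):
    """Move records from a producer to a consumer through a bounded 'hub'.

    Returns the records in the order the consumer read them.
    """
    # Bounded queue: put() blocks when the hub is full, so a fast producer
    # is slowed down to the pace of the consumer (backpressure).
    hub = queue.Queue(maxsize=hub_capacity)
    done = object()  # sentinel that tells the consumer to stop
    consumed = []

    def producer():
        for record in records:
            hub.put(record)  # blocks while the hub is at capacity
        hub.put(done)

    def consumer():
        while True:
            item = hub.get()  # blocks while the hub is empty
            if item is done:
                break
            consumed.append(item)

    workers = [
        threading.Thread(target=producer),
        threading.Thread(target=consumer),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return consumed
```

With a small `hub_capacity`, the producer spends most of its time waiting on the consumer. With an unbounded hub it would never wait, but the backlog could grow without limit, which is the volume problem mentioned above.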
While an event streaming system like Kafka is certainly the ideal option for real-time integration, it introduces complexity and comes at a cost that the business has to absorb. Not every use case requires low-latency, event-driven integration, and those that don't need it gain nothing from the additional investment. An eventually consistent replica is sufficient for many use cases in CRM or ecommerce, especially for back-office-facing domains like catalog management, inventory management...