Middleware and Databases: Flink Resources

Detecting Image Similarity in (Near) Real-time Using Apache Flink

Pinterest is a visual platform at its core, so the need to understand and act on images is paramount. A couple of years ago, the Content Quality team designed and implemented our own batch pipeline to detect similar images. The similarity signal is widely used at Pinterest for use cases varying from improving recommendations based on similar images to taking down spam and abusive content. However, it was taking several hours for the signal to be computed for newly created images, which was a long window for spammers and abusers to harm the platform. So recently, the team implemented a streaming pipeline to detect similar images in near-real-time.

Pinterest Tech

Pinterest Flink Deployment Framework

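Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Its features include exactly-once guarantees, low latency, high throughput, and a powerful computation model. At Pinterest, we have adopted Flink as our unified stream processing engine.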

Pinterest Tech

Youzan: Exploring and Practicing Resource Optimization for Flink Real-time Jobs

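With Flink moved onto K8s and the real-time cluster migration complete, more and more of Youzan's Flink real-time jobs now run on K8s clusters. Running Flink on K8s improves the real-time cluster's ability to scale elastically during major sales promotions and lowers the cost of adding and removing machines during those periods. In addition, because a dedicated team inside the company maintains K8s, running Flink on K8s also reduces the company's operations cost.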

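However, resources for Flink K8s jobs are currently configured by users on the real-time platform, and users have little experience judging how many resources a real-time job actually needs. As a result, users often allocate more resources than the job ever uses. For example, a Flink job whose business workload could be handled with a parallelism of 4 may be configured with a parallelism of 16. This wastes real-time computing resources and affects both the resource utilization level of the real-time cluster and the cost of the underlying machines. Against this background, this article explores and puts into practice Flink job resource optimization from two angles: job memory and message processing capacity.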

Youzan Tech

Middleware and Databases: Flink

Detecting Image Similarity in (Near) Real-time Using Apache Flink

Pinterest Flink Deployment Framework

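Youzan: Exploring and Practicing Resource Optimization for Flink Real-time Jobs

Building a Real-time Data Warehouse on Flink in Practice

[Flink] Computing Product Order Churn in Real Time with Flink

Dynamic Stream Splitting in Flink

ByteDance's Flink-based MQ-Hive Real-time Data Integration

Alibaba's Hands-on Experience Running Flink at Scale: Approaches to Diagnosing Common Problems

Dada Group's Practice of Moving Real-time Computing Jobs to SQL

Real-time Error Log Alerting with Apache Flink

A Real-time Computing Case Study for an E-commerce Dashboard Using Kafka + Flink + Redis

How Flink SQL Implements Joins on Data Streams

Processing Tens of Billions of Logs per Day: Building Weibo's Flink-based Real-time Computing Platform

Flink: The Hello World You Can't Avoid

Flink Practice and Applications at Meituan (Big Data Tech Stack 15)

Processing Trillions of Records per Day! Flink at Kuaishou: Applications, Practice, and Technical Evolution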