Zhiya Daily (知鸦日报) 2024-08-29

2024-08-28 16:30:00 ~ 2024-08-29 16:30:00

Technology

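Check-digit algorithm for 18-digit mainland China ID card numbers

Summary

The check-digit algorithm for 18-digit mainland China ID card numbers, approached as solving a "degree-17 congruence equation modulo 11".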
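As a minimal sketch of the commonly documented form of this check (the ISO 7064 MOD 11-2 scheme specified in GB 11643-1999): the 18 characters a_1..a_18 must satisfy sum(a_i * 2^(18-i)) ≡ 1 (mod 11), with "X" standing for the value 10 in the last position. The function names below are illustrative assumptions and are not taken from the article, whose own derivation may differ.

```python
# Weight of position i (1-based) is 2^(18-i) mod 11, for the first 17 digits.
WEIGHTS = [pow(2, 17 - i, 11) for i in range(17)]


def check_char(body: str) -> str:
    """Return the check character for the first 17 digits of an ID number."""
    if len(body) != 17 or not body.isdigit():
        raise ValueError("body must be exactly 17 digits")
    total = sum(int(d) * w for d, w in zip(body, WEIGHTS))
    # Solve total + c ≡ 1 (mod 11) for the check value c in 0..10.
    c = (12 - total % 11) % 11
    return "X" if c == 10 else str(c)


def is_valid(id_number: str) -> bool:
    """Check only the checksum of an 18-character ID number (not date or region)."""
    if len(id_number) != 18 or not id_number[:17].isdigit():
        return False
    return check_char(id_number[:17]) == id_number[17].upper()


if __name__ == "__main__":
    # A widely used textbook sample number, not a real person's ID.
    print(check_char("11010519491231002"))  # X
```

The weights come out as the familiar table 7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2, because 2^10 ≡ 1 (mod 11) makes the powers of two repeat with period 10.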

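Bilibili incident response and the business 1-5-10 review: how to achieve a fault self-detection rate above 95%?

Summary

In the second half of 2023, Bilibili achieved a fault self-detection rate of 95%+ for its recommendation and search businesses, and 80%+ for community-related faults.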

Grab Tech: Chimera Sandbox: A scalable experimentation and development platform for Notebook services

Summary

Key to innovation and improvement in machine learning (ML) models is the ability for rapid iteration. Our team, Chimera, part of the Artificial Intelligence (AI) Platform team, provides the essential compute infrastructure, ML pipeline components, and backend services. This support enables our ML engineers, data scientists, and data analysts to efficiently experiment and develop ML solutions at scale.

With a commitment to leveraging the latest Generative AI (GenAI) technologies, Grab is enhancing productivity tools for all Grabbers. Our Chimera Sandbox, a scalable Notebook platform, facilitates swift experimentation and development of ML solutions, offering deep integration with our AI Gateway. This enables easy access to various Large Language Models (LLMs) (both proprietary and open source), ensuring scalability, compliance, and access control are managed seamlessly.

DoorDash Tech: How DoorDash is pushing experimentation boundaries with interleaving designs

Summary

DoorDash leverages interleaving designs to boost experimentation sensitivity, enabling faster, more precise insights compared to traditional A/B testing.

Airbnb Tech: Building Postcards for “Airbnb” Scale

Summary

How the Airbnb Media team built group travel Postcards for the 2024 Summer Release by leveraging a novel destination matching algorithm while advancing the platform’s image & localized text processing capabilities.

Meta Tech: How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

Summary

Purpose limitation, a core data protection principle, is about ensuring data is only processed for explicitly stated purposes. A crucial aspect of purpose limitation is managing data as it flows across systems and services. Commonly, purpose limitation can rely on “point checking” controls at the point of data processing. This approach involves using simple if statements in code (“code assets”) or access control mechanisms for datasets (“data assets”) in data systems. However, this approach can be fragile as it requires frequent and exhaustive code audits to ensure the continuous validity of these controls, especially as the codebase evolves. Additionally, access control mechanisms manage permissions for different datasets to reflect various purposes using mechanisms like access control lists (ACLs), which requires the physical separation of data into distinct assets to ensure each maintains a single purpose. When Meta started to address more and larger-scope purpose limitation requirements that crossed dozens of our systems, these point checking controls did not scale.

Netflix Tech: Recommending for Long-Term Member Satisfaction at Netflix

Summary

Our mission at Netflix is to entertain the world. Our personalization algorithms play a crucial role in delivering on this mission for all members by recommending the right shows, movies, and games at the right time. This goal extends beyond immediate engagement; we aim to create an experience that brings lasting enjoyment to our members. Traditional recommender systems often optimize for short-term metrics like clicks or engagement, which may not fully capture long-term satisfaction. We strive to recommend content that not only engages members in the moment but also enhances their long-term satisfaction, which increases the value they get from Netflix, and thus they’ll be more likely to continue to be a member.

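An investigation: is an Elasticsearch document's _id the same as Lucene's docid?

Summary

While I was working with the development team on optimizing our ES usage, one of the developers asked me this question with obvious interest after a meeting: are the _id field we write into ES and the docid used in Lucene the same thing?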

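Tencent Tech: Born to go beyond the JVM? A deep dive into the ambitions and possibilities of Kotlin Native

Summary

Kotlin Native is a key part of the Kotlin Multiplatform ecosystem and an important direction for Kotlin developers looking to break through their own growth bottlenecks. Drawing on the Kotlin Native source code and the author's hands-on experience building multiplatform applications with Kotlin Native, this article explains the compile-time and runtime implementation details of Kotlin Native in depth, along with practical tips.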

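Alibaba Tech: The evolution of Java string concatenation and Alibaba's contributions

Summary

This article covers the evolution of Java string concatenation and the latest implementation contributed by Alibaba, PR 20273.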

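vivo Tech: An introduction to the TimeWheel algorithm and explorations of its application

Summary

Introduces the idea behind the time wheel algorithm and its data structures, describes in detail the data structures and trade-offs of three time wheel models, and covers how the algorithm is used in the Dubbo framework, including its main implementation and practice there.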
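To make the idea concrete, here is a minimal sketch of one common time wheel model: a single circular array of slots where each task also stores how many full rotations remain before it fires. This is an illustrative assumption about one of the models the article covers; it is not Dubbo's implementation, and the class and method names are hypothetical.

```python
class SimpleTimeWheel:
    """Single-level time wheel; each entry keeps a remaining-rounds counter."""

    def __init__(self, slots: int = 8):
        self.slots = slots
        self.current = 0  # index of the slot processed by the latest tick
        self.buckets = [[] for _ in range(slots)]

    def schedule(self, delay_ticks: int, task) -> None:
        """Register task to run on the delay_ticks-th future call to tick()."""
        if delay_ticks < 1:
            raise ValueError("delay_ticks must be at least 1")
        index = (self.current + delay_ticks) % self.slots
        rounds = (delay_ticks - 1) // self.slots  # full rotations to skip
        self.buckets[index].append([rounds, task])

    def tick(self) -> int:
        """Advance one slot, run the tasks that are due, return how many ran."""
        self.current = (self.current + 1) % self.slots
        due, waiting = [], []
        for entry in self.buckets[self.current]:
            if entry[0] == 0:
                due.append(entry[1])
            else:
                entry[0] -= 1  # not yet: one rotation fewer to wait
                waiting.append(entry)
        self.buckets[self.current] = waiting
        for task in due:
            task()
        return len(due)
```

Scheduling and each tick touch only one bucket, so the cost does not depend on the total number of pending timers; the trade-off is that timing precision is limited to the tick length.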

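Dewu Tech: Risk-control infrastructure in practice: how to build an efficient and secure AIGC system

Summary

How can users be given easy access to ChatGPT so that they can improve their everyday work efficiency and uncover the potential of AIGC across business scenarios?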

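58.com (58同城) Tech: Zhuanzhuan's exploration of digital event tracking for quality inspection

Summary

Every device sold under Zhuanzhuan's "Official Inspection" (官方验) label goes through comprehensive testing at a quality inspection station. On the inspection line, quality inspection engineers test each device. Besides the inspection results, this process also produces data such as inspection actions, hardware parameters, and time spent, none of which is currently used in a systematic way, even though it reflects well how an engineer carried out the whole inspection. For this reason, we want to capture this process data through technical means.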

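Ctrip Tech: Operating PgVector at Qunar & Tujia

Summary

As AI technologies advance, and large language models (LLMs) in particular see wide adoption, massive volumes of unstructured data have followed, and how to store and efficiently retrieve that data has become a hot topic. Against this backdrop the vector database, the database foundation of the AI era, has emerged.

A vector database stores the vector data that AI algorithms produce through embedding, and supports efficient retrieval of that data through indexing techniques and vector similarity or distance queries, addressing the AI field's need for vector storage and efficient retrieval.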
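As a rough illustration of what a vector similarity query computes, the sketch below ranks stored embeddings by cosine distance to a query vector using an exhaustive scan. It is a pure-Python assumption for explanation only: it does not use PgVector, a real vector database would answer the same question through an index rather than a full scan, and the function names and sample data are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, rows, k=2):
    """Return the k (id, vector) rows closest to query, by exhaustive scan."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real embeddings have many more dimensions.
    rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    print([row_id for row_id, _ in nearest([1.0, 0.1], rows)])
```

The scan costs time proportional to the number of stored vectors per query, which is the cost that the indexing techniques mentioned above are meant to avoid.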

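Alibaba Tech: Exploring Java virtual threads and analyzing their performance

Summary

Virtual threads are Java threads implemented by the Java runtime rather than by the operating system. Their strength lies in the ability to run them in very large numbers, which enables higher throughput and less wasted hardware. While working on a personal project recently, I tried developing with JDK 21 in order to study the principles and implementation of virtual threads.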
