知鸦日报 (Zhiya Daily) 2024-09-20

2024-09-19 16:30:00 ~ 2024-09-20 16:30:00

Technology

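Xinye Tech (信也技术): Virtual Threads: The Low-Level Secret Behind the Performance Leap!

Summary

A CPU that never sits idle drives code to run at full speed, cutting response times and raising application throughput.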

Netflix Tech: Introducing Netflix’s Key-Value Data Abstraction Layer

Summary

In this post, we dive deep into how Netflix’s KV abstraction works, the architectural principles guiding its design, the challenges we faced in scaling diverse use cases, and the technical innovations that have allowed us to achieve the performance and reliability required by Netflix’s global operations.

Dropbox Tech: How we use Lakera Guard to secure our LLMs

Summary

From search to organization, rapid advancements in artificial intelligence (AI) have made it easier for Dropbox users to discover and interact with their files. However, these advancements can also introduce new security challenges. Large Language Models (LLMs), integral to some of our most recent intelligent features, are also susceptible to various threats—from data breaches and adversarial attacks to exploitation by malicious actors. While hundreds of millions of users already trust Dropbox to protect their content, ensuring the security and integrity of these models is essential for maintaining that trust.

Pinterest Tech: Feature Caching for Recommender Systems w/ Cachelib

Summary

At Pinterest, we operate a large-scale online machine learning inference system, where feature caching plays a critical role in achieving optimal efficiency. In this blog post, we will discuss our decision to adopt the Cachelib project by Meta Open Source (“Cachelib”) and how we have built a high-throughput, flexible feature cache by leveraging and expanding upon the capabilities of Cachelib.

Uber Tech: QueryGPT – Natural Language to SQL Using Generative AI

Summary

SQL is a vital tool used daily by engineers, operations managers, and data scientists at Uber to access and manipulate terabytes of data. Crafting these queries not only requires a solid understanding of SQL syntax, but also deep knowledge of how our internal data models represent business concepts. QueryGPT aims to bridge this gap, enabling users to generate SQL queries through natural language prompts, thereby significantly enhancing productivity.

QueryGPT uses large language models (LLMs), vector databases, and similarity search to generate complex queries from English questions provided by the user as input.

This article chronicles our development journey over the past year and where we are today with this vision.
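The retrieval step named in the summary (similarity search over stored schema information, followed by a prompt to an LLM) can be illustrated with a minimal sketch. This is not Uber's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory `SCHEMAS` dictionary stands in for a vector database, and every table name and helper here is hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical schema descriptions; a real system would keep these in a vector database.
SCHEMAS = {
    "trips": "trips table: trip id, rider id, driver id, city, fare, trip completed time",
    "drivers": "drivers table: driver id, name, vehicle, signup date",
    "payments": "payments table: payment id, trip id, amount, currency, payment status",
}

def top_k_tables(question: str, k: int = 2) -> list[str]:
    """Rank the stored schemas by similarity to the question and keep the best k."""
    q = embed(question)
    ranked = sorted(SCHEMAS, key=lambda name: cosine(q, embed(SCHEMAS[name])), reverse=True)
    return ranked[:k]

def build_prompt(question: str, k: int = 2) -> str:
    """Assemble the text that would be sent to an LLM (the LLM call itself is omitted)."""
    context = "\n".join(SCHEMAS[name] for name in top_k_tables(question, k))
    return f"Schemas:\n{context}\n\nWrite a SQL query for: {question}"

print(top_k_tables("How many trips were completed in each city?", k=1))
```

The point of the sketch is only the shape of the pipeline: narrow the schema context by similarity first, then ask the model to write SQL against that smaller context.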

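Bilibili Tech: Frontend Error Monitoring in Practice at Bilibili

Summary

In 2023 the team began building frontend error monitoring. After a series of iterations covering the monitoring SDK, reporting, data governance, dashboard integration, and in-house APM visualization, it has put together an initial end-to-end frontend monitoring pipeline suited to Bilibili.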

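Exploring Adaptive Fault-Scenario Monitoring Based on Large Models

Summary

This article focuses on network fault monitoring and, drawing on practice, shares experience applying LLMs to adaptive fault-scenario monitoring, in the hope of offering a more precise and efficient reference for companies and researchers interested in the topic.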

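The “1-3-5-10” Principle: How Does Bilibili Put a Production Safety System into Practice?

Summary

After several incidents, Bilibili concluded that it needed a production safety mechanism designed around anomalies and failures.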

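58.com Tech: Demystifying Poster Generation Technology

Summary

This article introduces poster generation and the problems commonly encountered when using it. We hope it serves as a starting point and a reference for anyone facing similar requirements or problems.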

Ned's Declassified Git Survival Guide

Summary

In this guide, you'll learn essential Git commands like git checkout and git restore to undo changes, git stash to save uncommitted work temporarily, git cherry-pick to pull specific commits from one branch to another, and git reflog to recover "lost commits". Whether you're fixing a bug, changing priorities because your manager decided that, or restoring deleted code, these commands will help you handle common real-case scenarios with confidence.

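Elasticsearch: Is there a good way to tokenize full-format URLs?

Summary

I want to tokenize full-format URLs. Is there a good solution?

For example: https://www.abc.com/any/path?param_1=so+me&param-2=other#title

I looked at the official tokenizers and none of them seems suitable.

If I preprocess instead, I am not sure what the most appropriate approach would be.

Our data volume is fairly large, so I would rather not use pattern; I suspect it would increase the load on the cluster.

As far as I can tell, this tokenizer treats a recognized URL as a single token, but what I want is to split the URL as precisely as possible.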
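One possible preprocessing route, sketched here under assumptions rather than offered as a recommended answer, is to split the URL on the client side before indexing, so that no pattern-based analysis runs in the cluster. The sketch uses only Python's standard `urllib.parse`; the function name `tokenize_url` and the choice of which parts become tokens are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

def tokenize_url(url: str) -> list[str]:
    """Split a URL into scheme, host labels, path segments, query keys/values, and fragment."""
    parts = urlsplit(url)
    tokens: list[str] = []
    if parts.scheme:
        tokens.append(parts.scheme)
    if parts.hostname:
        # Host labels: "www.abc.com" -> "www", "abc", "com"
        tokens.extend(parts.hostname.split("."))
    # Path segments, skipping the empty strings produced by leading or doubled slashes
    tokens.extend(segment for segment in parts.path.split("/") if segment)
    # parse_qsl decodes "+" and percent-escapes, so "so+me" becomes "so me"
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        tokens.append(key)
        tokens.extend(value.split())
    if parts.fragment:
        tokens.append(parts.fragment)
    return tokens

print(tokenize_url("https://www.abc.com/any/path?param_1=so+me&param-2=other#title"))
# ['https', 'www', 'abc', 'com', 'any', 'path', 'param_1', 'so', 'me', 'param-2', 'other', 'title']
```

The resulting list could then be indexed as an array of values in a keyword field alongside the raw URL. Whether keys such as `param_1` should be split further on `_` or `-` depends on how the data is queried, which the question does not specify.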

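DiDi Tech: A Deep Dive into DiDi's Tool for Faster International Site Setup: Environment-Specific Configuration Management

Summary

DiDi's international expansion faces the challenge of flexible deployment across multiple data centers, and early deployments were inefficient and costly. As cloud native adoption grew and the number of microservices increased, deployment needed to be optimized to reduce the involvement of business RD engineers and improve efficiency. The key is identifying the root causes of inefficient deployment.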

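Using the Speculation Rules API for Efficient Navigation Optimization

Summary

By using the Speculation Rules API in a web app, we can greatly improve the experience of navigating within a site or across sites, which in turn can have a positive effect on business metrics such as conversion rate and retention rate.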

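Xiaohongshu Tech: Xiaohongshu Releases TDD, a New Algorithm for Accelerating AIGC

Summary

Target-driven distillation for accurate acceleration of text-to-image generation.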

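58.com Tech: Multi-Task Learning for Search Intent Understanding at Zhuanzhuan

Summary

Search is Zhuanzhuan's main entry point for traffic distribution, covering App home page search, App channel page search, mini-program search, and other service entrances. Intent understanding aims to accurately interpret the real need behind the keywords a user types, and it is critical to the quality of the search experience. With intent understanding, the search engine can adjust its search strategy and return results that match the user's intent, improving relevance and user experience. It can also help the search engine offer more personalized services, such as related content recommendations and smart suggestions, further improving search effectiveness and user satisfaction.

Put simply, intent understanding is the structured parsing of a query at three levels: lexical, syntactic, and semantic. In e-commerce the first problem is predicting the query's category. For example, the structured category of "iphone 15 pro 128 白色" is phone (category) - Apple (brand) - 15 pro (model). Zhuanzhuan's category system is large, its category levels are interrelated, and a query may belong to more than one category. Category prediction at Zhuanzhuan can therefore be understood as three related tasks.

This article mainly describes how multi-task learning is applied to category prediction in Zhuanzhuan's search intent understanding. It first introduces the basic concepts of multi-task learning, then surveys industry approaches to category prediction, and finally presents the exploration of multi-task learning in Zhuanzhuan's category prediction scenario.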

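Airbnb Tech: Airbnb's User-Experience-Driven Android Performance Measurement

Summary

Airbnb's entire user journey is divided into pages, and each page measures its own PPS value. To support this page-based performance tracking system, we built a standardized infrastructure that lets engineers configure the pages that represent their features.

On Android, each page is associated with a Fragment. Every Fragment must provide a LoggingConfig object that specifies a page name, so that the name can be retrieved whenever the page needs to be referenced. We collect performance data throughout the Fragment's lifecycle and emit the log event only when the Fragment is paused.

We identify each page with a common PageName enum type that is referenced on all platforms, so every page in the user's journey is represented consistently.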

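Popular Science

Is the Cow Inside or Outside the Pen? Why Does Common Sense That Any Schoolchild Understands Take 6,500 Lines of Mathematical Analysis to Prove?

Summary

In mathematics, what geometric methods generally cannot do can be attempted with algebraic methods. Likewise, what algebra cannot analyze can in turn be approached geometrically.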
