2024-09-19 16:30:00 ~ 2024-09-20 16:30:00
In this post, we dive deep into how Netflix’s KV abstraction works, the architectural principles guiding its design, the challenges we faced in scaling diverse use cases, and the technical innovations that have allowed us to achieve the performance and reliability required by Netflix’s global operations.
From search to organization, rapid advancements in artificial intelligence (AI) have made it easier for Dropbox users to discover and interact with their files. However, these advancements can also introduce new security challenges. Large Language Models (LLMs), integral to some of our most recent intelligent features, are also susceptible to various threats—from data breaches and adversarial attacks to exploitation by malicious actors. While hundreds of millions of users already trust Dropbox to protect their content, ensuring the security and integrity of these models is essential for maintaining that trust.
At Pinterest, we operate a large-scale online machine learning inference system, where feature caching plays a critical role in achieving optimal efficiency. In this blog post, we will discuss our decision to adopt the Cachelib project by Meta Open Source (“Cachelib”) and how we have built a high-throughput, flexible feature cache by leveraging and expanding upon the capabilities of Cachelib.
SQL is a vital tool used daily by engineers, operations managers, and data scientists at Uber to access and manipulate terabytes of data. Crafting these queries not only requires a solid understanding of SQL syntax, but also deep knowledge of how our internal data models represent business concepts. QueryGPT aims to bridge this gap, enabling users to generate SQL queries through natural language prompts, thereby significantly enhancing productivity.
QueryGPT uses large language models (LLMs), vector databases, and similarity search to generate complex queries from English questions provided by the user.
This article chronicles our development journey over the past year and where we are today with this vision.
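To make the similarity-search step more concrete, here is a minimal Python sketch of retrieving the stored example closest to a user's question. It is not Uber's implementation: the `EXAMPLES` list, the table names in the SQL strings, and the bag-of-words `embed` function are all assumptions made for illustration, standing in for a real embedding model and a vector database.

```python
import math
from collections import Counter

# Hypothetical question/SQL pairs; a real system would keep these in a vector database.
EXAMPLES = [
    {"question": "how many trips were completed yesterday",
     "sql": "SELECT COUNT(*) FROM trips WHERE status = 'completed'"},
    {"question": "list the top drivers by rating",
     "sql": "SELECT driver_id FROM drivers ORDER BY rating DESC LIMIT 10"},
]

def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(question, examples, k=1):
    """Return the k stored examples most similar to the question."""
    q = embed(question)
    ranked = sorted(examples,
                    key=lambda ex: cosine(q, embed(ex["question"])),
                    reverse=True)
    return ranked[:k]

best = top_k("how many trips were completed in Seattle", EXAMPLES)[0]
print(best["sql"])
```

In a setup like this, the retrieved examples would typically be placed in the LLM prompt alongside the user's question so the model can imitate their structure.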
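Since 2023, the team has been building out front-end error monitoring. After a series of iterations covering the monitoring SDK, reporting, data governance, dashboard integration, and in-house APM visualization, we have a first complete front-end monitoring pipeline suited to Bilibili.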
In this guide, you'll learn essential Git commands like git checkout and git restore to undo changes, git stash to save uncommitted work temporarily, git cherry-pick to pull specific commits from one branch to another, and git reflog to recover "lost commits". Whether you're fixing a bug, changing priorities because your manager decided that, or restoring deleted code, these commands will help you handle common real-case scenarios with confidence.
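I want to tokenize full URLs. Is there a good solution for this?
For example: https://www.abc.com/any/path?param_1=so+me&param-2=other#title
I looked at the official tokenizers and none of them seems suitable.
If I preprocess the data instead, I'm not sure what the best approach would be.
Our data volume is fairly large, so I'd rather not use pattern; I'm worried it would increase the load on the cluster.
As far as I can tell, this tokenizer treats a recognized URL as a single token, but what I want is to split the URL as precisely as possible.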
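One possible way to handle the preprocessing route is to split each URL into its components on the client side before indexing, so the cluster only receives ready-made tokens. The sketch below uses only Python's standard `urllib.parse` module; the `tokenize_url` function name and the choice of which parts become tokens are assumptions for illustration, not an official recommendation.

```python
from urllib.parse import urlsplit, parse_qsl

def tokenize_url(url):
    """Split a URL into scheme, host labels, path segments, query keys/values, and fragment."""
    parts = urlsplit(url)
    tokens = []
    if parts.scheme:
        tokens.append(parts.scheme)
    if parts.hostname:
        # Host labels: "www.abc.com" -> "www", "abc", "com"
        tokens.extend(parts.hostname.split("."))
    # Path segments, skipping the empty strings produced by leading or doubled slashes
    tokens.extend(seg for seg in parts.path.split("/") if seg)
    # parse_qsl decodes percent-escapes and turns "+" into a space
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        tokens.append(key)
        tokens.extend(value.split())
    if parts.fragment:
        tokens.append(parts.fragment)
    return tokens

print(tokenize_url("https://www.abc.com/any/path?param_1=so+me&param-2=other#title"))
# ['https', 'www', 'abc', 'com', 'any', 'path', 'param_1', 'so', 'me', 'param-2', 'other', 'title']
```

The resulting list could then be indexed as an array of keyword values, which avoids running a regular expression inside the analysis chain. Whether parameter names such as `param_1` should be split further depends on how the URLs will be searched.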
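As DiDi's international business expands, it faces the challenge of deploying flexibly across multiple data centers; early deployments were inefficient and costly. With the growth of cloud native and the rising number of microservices, deployment needs to be optimized to reduce the involvement of business R&D engineers and improve efficiency. The key is identifying the root causes of inefficient deployment.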
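By using the Speculation Rules API in a web app, we can greatly improve the experience of navigating within a site or across sites, which in turn can have a positive effect on business metrics such as conversion and retention rates.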
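Search is Zhuanzhuan's main traffic distribution entry point, covering service entrances such as search on the app home page, search on app channel pages, and mini-program search. Intent understanding aims to accurately interpret the real need behind the keywords a user types, and it is critical to the quality of the search experience. With intent understanding, the search engine can adjust its search strategy and return results that match the user's intent, improving result relevance and user experience. Intent understanding can also help the search engine offer more personalized services, such as related content recommendations and smart suggestions, further improving search effectiveness and user satisfaction.
Put simply, intent understanding is the structured parsing of a query at three levels: lexical, syntactic, and semantic. In e-commerce, the first problem is predicting the query's category. For example, the structured category for "iphone 15 pro 128 white" is Phone (category) - Apple (brand) - 15 pro (model). Zhuanzhuan's category system is large, its category levels are interrelated, and a query may belong to multiple categories. Category prediction at Zhuanzhuan can therefore be framed as three related tasks.
This article describes how multi-task learning is applied to category prediction in Zhuanzhuan's search intent understanding. It first introduces the basic concepts of multi-task learning, then reviews industry approaches to category prediction, and finally presents our exploration of multi-task learning for category prediction in Zhuanzhuan's intent understanding.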
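Airbnb's entire user journey is divided into pages, and each page has its own PPS value measured. To support this page-based performance tracking system, we built a standardized infrastructure that lets engineers configure the pages that represent their features.
On Android, each page is associated with a Fragment. Every Fragment must provide a LoggingConfig object that specifies a page name, so the name can be retrieved wherever it needs to be referenced. We collect performance data throughout the Fragment's lifecycle and emit the log event only when the Fragment is paused.
We identify each page with a common PageName enum type that is referenced on all platforms, so every page in the user journey is represented consistently.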