2022-11-11 16:30:00 ~ 2022-11-12 16:30:00
Metadata is crucial for serving user requests. It also takes up a lot of space—and as we’ve grown, so has the amount of metadata we’ve had to store. This isn’t a bad problem to have, but we knew it was only a matter of time before our metadata stack would need an overhaul.
Dropbox operates two large-scale metadata storage systems powered by sharded MySQL. One is the Filesystem which contains metadata related to files and folders. The other is Edgestore, which powers all other internal and external Dropbox services. Both operate at a massive scale. They run on thousands of servers, store petabytes of data on SSDs, and serve tens of millions of queries per second with single-digit millisecond latency.
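An important difference between a database and an ordinary file system is that a database keeps running correctly under many kinds of failure, preserving the correctness of both the system and its data. These failures include, but are not limited to, faults in the database system itself, operating system faults, and storage media faults. So how does a database stay correct under these failures? It relies mainly on the Redo Log and Undo Log working together with the WAL (Write Ahead Log) mechanism. The underlying theory is described in detail in the paper "ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging".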
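As a rough illustration of how redo and undo records cooperate with write-ahead logging, here is a minimal toy sketch. It is not ARIES and not any real database's implementation: the names `ToyDB` and `recover` are hypothetical, the "durable log" is just a Python list, and it assumes transactions never write the same key concurrently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDB:
    """Toy in-memory store with a write-ahead log (all names are hypothetical)."""
    log: list = field(default_factory=list)    # stands in for the durable log
    pages: dict = field(default_factory=dict)  # stands in for on-disk data pages

    def write(self, txn, key, new):
        # WAL rule: append the log record before touching the data page.
        old = self.pages.get(key)
        self.log.append(("update", txn, key, old, new))
        self.pages[key] = new

    def commit(self, txn):
        # A transaction counts as committed once its commit record is logged.
        self.log.append(("commit", txn))


def recover(log):
    """Rebuild the pages from the log alone: redo everything, then undo losers."""
    pages = {}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Redo pass: replay every update in log order using its new value.
    for rec in log:
        if rec[0] == "update":
            _, _txn, key, _old, new = rec
            pages[key] = new
    # Undo pass: walk backwards, restoring old values of uncommitted updates.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            if old is None:
                pages.pop(key, None)  # the key did not exist before this write
            else:
                pages[key] = old
    return pages
```

For example, if transaction `t1` writes a key and commits, and `t2` writes but the system crashes before `t2` commits, calling `recover(db.log)` keeps `t1`'s value and rolls back everything `t2` wrote.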
The key driver for this redesigned API is the range of differences across the 800+ device types that we support. Most APIs (including the REST API that Netflix has been using since 2008) treat these devices the same, in a generic way, to make the server-side implementations more efficient. And there is good reason for this approach. Providing a one-size-fits-all (OSFA) API allows the API team to maintain a solid contract with a wide range of API consumers because the API team is setting the rules for everyone to follow.
While the OSFA approach is effective, its emphasis is on convenience for the API provider, not the API consumer. Accordingly, OSFA ignores the differences between these devices, the very differences that would let us take fuller advantage of the rich features each one offers.
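Just days after GitLab's accidental data deletion incident, the "unsinkable aircraft carrier" AWS S3 (Simple Storage Service) also "sank" for 4 hours a few days ago, taking half of the internet outside the Great Firewall down with it. As promised, and in keeping with AWS practice, AWS today published a brief incident report, "Summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region". Put simply, this outage, like GitLab's, was caused by operator error.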
Learn how we scaled contributions to Zalando Tech Radar.
How Zalando helps its engineering teams navigate the tech landscape.
We have revisited the process of technology selection at Zalando, adjusted the Tech Radar ring semantics, and moved towards principle-based decision making. In this post, we would like to share the process and its outcomes so far.
Caching is critical to how Rails applications work. At every layer, whether it be in page rendering, database querying, or external data retrieval, the cache is what ensures that no single bottleneck brings down an entire application.
But caching has a dirty secret, and that secret’s name is Marshal.
Marshal is Ruby’s ultimate sharp knife, able to transform almost any object into a binary blob and back. This makes it a natural match for the diverse needs of a cache, particularly the cache of a complex web framework like Rails. From actions, to pages, to partials, to queries—you name it, if Rails is touching it, Marshal is probably caching it.
Marshal’s magic, however, comes with risks.
A closer look at how the Linux kernel influences Redis memory management.
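Technical debt is unavoidable. Much of it causes no obvious problems in the short term, but over a product's long-term development, accumulating technical debt makes the software's maintainability plummet, its quality decline, and production incidents frequent. Product R&D teams need a correct understanding of technical debt: building technology is not a one-time effort, it requires continuous maintenance.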
Zalando developed a new type of SLO, based on Operations, to monitor the critical aspects of its business. This blog post describes how that framework works, and how it contributes to healthier on-call rotations.
This article describes a systematic approach to reducing technical debt from the perspective of engineering management. It thoroughly describes the process that was set up in one of our core engineering teams and also addresses how such work can be effectively capitalized.
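Technical debt is a metaphor coined by Ward Cunningham in 1992. It is defined as the debt that accumulates when we, intentionally or unintentionally, make wrong or suboptimal technical decisions.
There are two key terms here, "metaphor" and "technical decisions", and two more that are less central but cannot be ignored, "intentionally" and "unintentionally". I like this metaphor very much because, from a technical management perspective, technical debt is likewise a matter of costs and risks, which makes it very direct. Since it is a metaphor, it does not match in every respect, and that is something to watch for: it cannot be treated as simply owing money, a point we will return to later.
The reason this metaphor is so good, so apt, so spot-on is that technology has become a core asset for many enterprises. For internet companies, telecom companies, financial enterprises, and tax authorities in particular (the usual suspects), business operations depend heavily on IT systems. For organizations that are this highly digitized, or whose IT systems are themselves the production and service systems of the business, the live systems, the source code in the repositories, the data in the databases, and even the technical staff are the most important assets.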
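A prefix tree, also called a dictionary tree or Trie, is a data structure typically used to store strings, which it stores as paths of character nodes. Strings that share a common prefix share the same parent-node path. By exploiting these common prefixes, a prefix tree reduces lookup time and improves efficiency.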
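The idea can be sketched in a few lines. This is a minimal illustration under simple assumptions (one dictionary of children per node, no deletion); the class and method names are hypothetical and chosen for this example.

```python
class TrieNode:
    """One character position in the tree."""

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a stored string ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        # Walk down from the root, creating nodes only where the path is new,
        # so strings with a common prefix reuse the same nodes.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, prefix):
        # Follow the path for `prefix`; return None if it leaves the tree.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None
```

After inserting "tea" and "ten", both words share the nodes for "t" and "e", and a lookup costs time proportional to the length of the query string rather than the number of stored strings.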
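Monitoring has always been an important way for the server side to understand how applications are running. After several years of growth, the Alibaba Xiami server side now has more than 100 Java applications, nearly 50 of which carry core business, and how monitoring is configured varies from person to person. Some people configure fine-grained monitoring. Some applications, after passing through several developers, have had their monitoring gradually neglected. For some applications the monitoring items were last modified two years ago and no longer fit the business.
Like most teams, Xiami has an alert-handling chat group, where a bot delivers messages from internal monitoring and alerting platforms (such as Sunfire). Because monitoring items are poorly configured and monitoring granularity is coarse, the group is bombarded with dozens or even hundreds of alert notifications every day. Over time everyone has become numb to the alerts, and most of them go unhandled.
Given this situation, the Xiami SRE team (SRE stands for Site Reliability Engineering, a discipline first proposed by Google that aims at highly available, highly scalable sites) focused its work on monitoring governance and, after 2 months of development, built a brand-new monitoring system for Xiami.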
ARMS, short for Application Real-Time Monitoring Service, is an APM-type monitoring product on Alibaba Cloud.
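In the Java world, it seems we do not need to pay much attention to garbage collection. Many beginners who do not understand GC can still write a program or system that works, and even works fairly well. But this does not mean Java's GC is unimportant. On the contrary, it is so important and so complex that when something goes wrong, those beginners can do nothing beyond opening the GC log and staring at what looks like an indecipherable wall of 0s and 1s.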
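Availability usually means backend service availability: we talk about how many nines our services have reached, and few people apply the idea to the frontend. Yet any business with a UI interaction flow also has frontend availability. However high backend availability is, a single faulty button that does nothing when clicked will still stop users from completing the flow.
With the growth of the mobile internet came mini programs and light apps, which are available at any time and need no installation or uninstallation. The uni-app framework lets developers write one codebase and publish the application to multiple platforms. Building on uni-app, AIPHD科技文教 has developed a matrix of standalone products such as AIPHD英语 (AIPHD English) and AI智能古诗 (AI Smart Classical Poetry).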
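Can't spot tooling needs? Manufacturing needs where none exist? Building tools that nobody uses? Then read this article. Starting from small details, it looks at how to find new tool requirements or optimization points in a project team's daily work, along with several general approaches to solving them.
To make money, you first have to learn how to share it.
In large companies the charter goes through a CDP project, and the charter material is extensive. The reasoning behind project initiation at a large company comes down to explaining four things step by step: first, look at the trends in market evolution and technology development; second, analyze what problems customers in the chosen market segment need solved; third, define the product specification once the competitive landscape is understood; fourth, lay out an executable delivery plan. The charter material therefore has four key parts: market trend assessment, customer needs analysis, product specification definition, and development execution strategy.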
Trying to make a decision, especially in a dysfunctional group, can be extremely unpleasant - unending meetings, circular conversations, bad ideas, and passive aggressive or even openly aggressive conflict.
One of the reasons why decision-making can go so terribly wrong is that many of us were taught unhelpful myths about how to go about it, or were taught nothing at all. In my consulting work, I’ve witnessed one core practice that consistently improves group decision-making, especially when the group is experiencing conflict.
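A window seat on a plane gives you the best view of the sky. Now and then you also scroll past a friend's check-in photo taken in the air, the deep blue sky framed in a round cabin window.
If at that moment you suddenly notice the square windows around you, a question might occur to you.
Why are airplane windows round rather than square?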