
Virtual Machines and Containers: Kubernetes

Related topic: k8s

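Kubernetes Architecture Study Notes (Part 1)

This article summarizes the design ideas and lessons learned from eBay's cloud computing system architecture. It is split into two parts: this first part centers on the API, and a follow-up post will cover the second part, including controller logic and architecture.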

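Zhihu's Online/Offline Colocation on Kubernetes: The Offline Side

To relieve resource shortages in its offline clusters, Zhihu deployed Hadoop YARN services in its online environment and used the YARN Federation architecture to manage cluster relocation and job migration. This raises resource utilization in the online clusters and eases the overload on the offline clusters. Colocating offline workloads meant solving several key problems: technology selection, data integrity, online-cluster stability, smooth job transition, and configuration management. In 2022 Zhihu chose YARN Federation because it met three requirements: simple integration for business teams, cluster changes that are transparent to workloads, and a reusable architecture. For the detailed architecture diagram, see the official documentation: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/Federation.html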

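Zhihu's Online/Offline Colocation on Kubernetes: The Online Side

Zhihu has deployed applications at large scale through colocation, but ran into problems while colocating online and offline workloads. To cut costs, it used approaches such as systematic resource-utilization improvement and tidal scheduling of static resources. Systematic utilization improvement builds data and application metrics and uses them to prompt application owners to lower their resource configurations. Tidal scheduling of static resources addresses uneven Kubernetes scheduling and resource fragmentation, so that scheduling reflects actual resource usage. Zhihu applies these methods at different stages to improve utilization and reduce costs.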

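Rancher and Zhihu: Joint Practice in Managing Multi-Cluster Environments at Very Large Scale

Zhihu is a high-quality Chinese-language Q&A community where tens of millions of users share knowledge, experience, and insights and find their own answers every day. The article describes performance problems Rancher hit when managing large-scale clusters: most notably, when a super-administrator logged in, so much data was loaded that the UI became unusable and downstream clusters disconnected frequently. Working with the Rancher team, Zhihu traced the root cause to the large total number of nodes across its clusters.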

Failing to Auto Scale Elasticsearch in Kubernetes

A story of operational failure in a large-scale Elasticsearch installation, including the root-cause analysis and the mitigations that followed.

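Dewu SRE on Kubernetes Troubleshooting: From High CPU Load to Uncovering a Mount Leak

Container SRE engineers must not only keep systems highly available but also optimize how efficiently they run, so that systems stay resilient under all kinds of load and sudden incidents. In this article we look at the challenges container SREs face in their daily work and how professional skills and new technical approaches help locate and resolve problems.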

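The Art of Resource Management in Kubernetes

Kubernetes has become the benchmark for container orchestration in the cloud-native world. It relies heavily on the Linux kernel's cgroups feature for fine-grained control of system resources, which improves availability and performance. This article examines Kubernetes resource-management mechanisms and offers configuration tuning and best-practice recommendations.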
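To make the cgroups link concrete: under cgroup v1, a container's CPU request becomes a relative weight (`cpu.shares`) and its CPU limit becomes a CFS quota (`cpu.cfs_quota_us`) over a 100 ms period. The sketch below reproduces that arithmetic in Go, modeled on the conversion helpers in kubelet; the function names and constants are this sketch's own, and cgroup v2 (`cpu.weight`, `cpu.max`) involves an extra mapping that is not shown.

```go
package main

import "fmt"

// Constants modeled on kubelet's cgroup v1 CPU conversion (names are this sketch's own).
const (
	minShares      = 2
	maxShares      = 262144
	sharesPerCPU   = 1024
	milliCPUToCPU  = 1000
	quotaPeriodUs  = 100000 // default CFS period: 100 ms
	minQuotaPeriod = 1000   // smallest quota written, in µs
)

// MilliCPUToShares converts a CPU request in millicores to cpu.shares.
func MilliCPUToShares(milliCPU int64) uint64 {
	if milliCPU == 0 {
		// No request: fall back to the minimum weight.
		return minShares
	}
	shares := milliCPU * sharesPerCPU / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return uint64(shares)
}

// MilliCPUToQuota converts a CPU limit in millicores to cpu.cfs_quota_us for the given period.
// In this sketch, 0 means "no limit": the caller should leave the quota unset.
func MilliCPUToQuota(milliCPU, periodUs int64) int64 {
	if milliCPU == 0 {
		return 0
	}
	quota := milliCPU * periodUs / milliCPUToCPU
	if quota < minQuotaPeriod {
		quota = minQuotaPeriod
	}
	return quota
}

func main() {
	// A container with requests.cpu: 250m and limits.cpu: 500m.
	fmt.Println("cpu.shares       =", MilliCPUToShares(250))
	fmt.Println("cpu.cfs_quota_us =", MilliCPUToQuota(500, quotaPeriodUs))
}
```

For `requests.cpu: 250m` and `limits.cpu: 500m`, this yields `cpu.shares = 256` and a quota of 50000 µs per 100000 µs period, i.e. at most half a CPU. Because shares are only relative weights, a request buys a proportional share under contention, while the quota caps usage even when the rest of the node is idle.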

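How to Debug Containers Now That Newer Kubernetes Versions Have Removed Docker

Kubernetes removed dockershim in v1.24. This article looks at what changes once dockershim is gone and how to debug containers afterwards.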

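Finding Problems in a Cluster from Kubernetes Events

How can you detect, collect, and analyze problems inside a Kubernetes cluster in a timely way? This article walks through it.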
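The article's own pipeline isn't described here. A common first step is to keep only `Warning` events and rank their reasons by how often they recur, so that repeated `BackOff` or `FailedScheduling` events stand out. The sketch below does that in plain Go; `Event` is a hypothetical, simplified stand-in for the core/v1 Event fields (type, reason, involved object, count), not the client-go type. In a live cluster the input would come from `kubectl get events --field-selector type=Warning` or a client-go watch.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a hypothetical, simplified stand-in for a few core/v1 Event fields.
type Event struct {
	Type   string // "Normal" or "Warning"
	Reason string // e.g. "BackOff", "FailedScheduling", "Evicted"
	Object string // involved object, e.g. "Pod/default/web-1"
	Count  int32  // how many times the event was observed
}

// ReasonCount pairs a reason with its total occurrence count.
type ReasonCount struct {
	Reason string
	Count  int32
}

// TopWarnings sums Count per Reason over Warning events only and returns the
// reasons sorted by descending count, breaking ties alphabetically.
func TopWarnings(events []Event) []ReasonCount {
	totals := map[string]int32{}
	for _, e := range events {
		if e.Type != "Warning" {
			continue
		}
		totals[e.Reason] += e.Count
	}
	out := make([]ReasonCount, 0, len(totals))
	for r, c := range totals {
		out = append(out, ReasonCount{r, c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Reason < out[j].Reason
	})
	return out
}

func main() {
	events := []Event{
		{"Warning", "BackOff", "Pod/default/web-1", 12},
		{"Normal", "Pulled", "Pod/default/web-1", 3},
		{"Warning", "FailedScheduling", "Pod/default/job-7", 4},
		{"Warning", "BackOff", "Pod/default/web-2", 5},
	}
	for _, rc := range TopWarnings(events) {
		fmt.Printf("%-18s %d\n", rc.Reason, rc.Count)
	}
}
```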

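Kubernetes Troubleshooting: Insufficient Host Resources (Disk)

This article shares an incident in which Pods were Evicted because the host ran out of disk space.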
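Evictions like this come from kubelet's node-pressure eviction: when a filesystem signal such as `nodefs.available` drops below its hard threshold, kubelet first tries to reclaim node-level resources and, if that is not enough, evicts Pods. The sketch below checks two of the documented Linux defaults (`nodefs.available<10%`, `imagefs.available<15%`) against sample filesystem stats; `Threshold`, `FSStats`, and `Breached` are hypothetical names, and a real cluster may override these thresholds in its kubelet configuration.

```go
package main

import "fmt"

// Threshold is a hypothetical representation of one hard eviction signal,
// expressed as the minimum fraction of capacity that must stay available.
type Threshold struct {
	Signal       string  // e.g. "nodefs.available"
	MinAvailable float64 // fraction of capacity, e.g. 0.10 for "<10%"
}

// DefaultDiskThresholds mirrors two of kubelet's documented default hard thresholds on Linux.
var DefaultDiskThresholds = []Threshold{
	{"nodefs.available", 0.10},
	{"imagefs.available", 0.15},
}

// FSStats holds capacity and available bytes for the filesystem behind a signal.
type FSStats struct {
	CapacityBytes  uint64
	AvailableBytes uint64
}

// Breached returns the signals whose available fraction is below the threshold.
// Signals with no stats or zero capacity are skipped.
func Breached(stats map[string]FSStats, thresholds []Threshold) []string {
	var out []string
	for _, t := range thresholds {
		s, ok := stats[t.Signal]
		if !ok || s.CapacityBytes == 0 {
			continue
		}
		frac := float64(s.AvailableBytes) / float64(s.CapacityBytes)
		if frac < t.MinAvailable {
			out = append(out, t.Signal)
		}
	}
	return out
}

func main() {
	const gib = 1 << 30
	stats := map[string]FSStats{
		"nodefs.available":  {CapacityBytes: 100 * gib, AvailableBytes: 8 * gib},  // 8% free
		"imagefs.available": {CapacityBytes: 200 * gib, AvailableBytes: 60 * gib}, // 30% free
	}
	fmt.Println("breached:", Breached(stats, DefaultDiskThresholds))
}
```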

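Should Databases Stay Out of Containers? Bilibili's Practice with Stateful Services on Kubernetes (Elasticsearch/ClickHouse)

Drawing on Bilibili's production rollout of containerized Elasticsearch and ClickHouse orchestrated by Kubernetes, this article explains why we needed to containerize and move onto Kubernetes, the challenges we met and how we solved them, the technical details of the rollout, and the benefits.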

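Hands-On K8s for QA (Part 2): Rolling-Update Pitfalls from a Service High-Availability Perspective, with 6 Case Studies

Kubernetes (k8s for short) is a popular container orchestration tool, and its Deployment controller provides rolling updates (Ro…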
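Availability during a rolling update hinges on how a Deployment's `maxSurge` and `maxUnavailable` (both 25% by default) are turned into Pod counts: percentages for `maxSurge` round up, those for `maxUnavailable` round down, and if both resolve to 0, `maxUnavailable` is treated as 1 so the rollout can progress. The sketch below re-implements that arithmetic in plain Go for illustration; `Bounds` and `scaled` are hypothetical names, not the controller's code.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// scaled resolves an int-or-percent value ("3" or "25%") against replicas.
// Percentages round up when roundUp is true and down otherwise.
func scaled(v string, replicas int, roundUp bool) (int, error) {
	if strings.HasSuffix(v, "%") {
		p, err := strconv.Atoi(strings.TrimSuffix(v, "%"))
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %w", v, err)
		}
		x := float64(p) * float64(replicas) / 100
		if roundUp {
			return int(math.Ceil(x)), nil
		}
		return int(math.Floor(x)), nil
	}
	return strconv.Atoi(v)
}

// Bounds resolves maxSurge (rounding up) and maxUnavailable (rounding down).
// If both end up 0, unavailable is forced to 1 so the rollout can make progress.
func Bounds(replicas int, maxSurge, maxUnavailable string) (surge, unavailable int, err error) {
	if surge, err = scaled(maxSurge, replicas, true); err != nil {
		return 0, 0, err
	}
	if unavailable, err = scaled(maxUnavailable, replicas, false); err != nil {
		return 0, 0, err
	}
	if surge == 0 && unavailable == 0 {
		unavailable = 1
	}
	return surge, unavailable, nil
}

func main() {
	replicas := 10
	surge, unavail, err := Bounds(replicas, "25%", "25%") // the defaults
	if err != nil {
		panic(err)
	}
	fmt.Printf("at most %d pods total, at least %d available during the rollout\n",
		replicas+surge, replicas-unavail)
}
```

With 10 replicas and the defaults, the rollout may run up to 13 Pods while keeping at least 8 available. With a single replica, `maxUnavailable` rounds down to 0, so the old Pod is removed only after the new one becomes ready, which is why readiness probes matter in these cases.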

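How Kubernetes Informers Work

How do you listen for events efficiently and reliably? client-go, the Kubernetes client library, provides a general-purpose informer package that makes controller development convenient and efficient.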
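The core loop of an informer is: list once to seed a local cache, then watch for ADDED/MODIFIED/DELETED events, apply each one to the cache, and invoke the registered add/update/delete handlers; controllers then read from the cache instead of querying the API server. The sketch below models only that loop in plain Go with hypothetical types (`ToyInformer`, `Handlers`, `Object`). It is not the client-go API and omits the DeltaFIFO, resync, re-listing after watch errors, and indexers.

```go
package main

import "fmt"

// EventType mirrors the watch event types ADDED / MODIFIED / DELETED.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// Object is a hypothetical, minimal stand-in for a Kubernetes object.
type Object struct{ Namespace, Name, Data string }

// Key uses the same "namespace/name" format as client-go's MetaNamespaceKeyFunc.
func Key(o Object) string { return o.Namespace + "/" + o.Name }

// WatchEvent is one change notification from the simulated watch stream.
type WatchEvent struct {
	Type   EventType
	Object Object
}

// Handlers plays the role of client-go's ResourceEventHandlerFuncs (hypothetical names).
type Handlers struct {
	OnAdd    func(obj Object)
	OnUpdate func(oldObj, newObj Object)
	OnDelete func(obj Object)
}

// ToyInformer keeps a local cache filled by an initial list plus watch events.
// Not safe for concurrent use; client-go's store is thread-safe.
type ToyInformer struct {
	store    map[string]Object
	handlers Handlers
}

func NewToyInformer(h Handlers) *ToyInformer {
	return &ToyInformer{store: map[string]Object{}, handlers: h}
}

// Run seeds the cache from list, then applies watch events until the channel closes.
func (in *ToyInformer) Run(list []Object, watch <-chan WatchEvent) {
	for _, o := range list {
		in.upsert(o)
	}
	for ev := range watch {
		switch ev.Type {
		case Added, Modified:
			in.upsert(ev.Object)
		case Deleted:
			in.remove(ev.Object)
		}
	}
}

// upsert stores obj, firing OnAdd for a new key and OnUpdate for a known one.
func (in *ToyInformer) upsert(obj Object) {
	old, existed := in.store[Key(obj)]
	in.store[Key(obj)] = obj
	if existed && in.handlers.OnUpdate != nil {
		in.handlers.OnUpdate(old, obj)
	} else if !existed && in.handlers.OnAdd != nil {
		in.handlers.OnAdd(obj)
	}
}

// remove drops obj from the cache and fires OnDelete with the last cached state.
func (in *ToyInformer) remove(obj Object) {
	if old, ok := in.store[Key(obj)]; ok {
		obj = old
	}
	delete(in.store, Key(obj))
	if in.handlers.OnDelete != nil {
		in.handlers.OnDelete(obj)
	}
}

// Get reads from the local cache instead of calling the API server.
func (in *ToyInformer) Get(key string) (Object, bool) {
	o, ok := in.store[key]
	return o, ok
}

func main() {
	watch := make(chan WatchEvent, 2)
	watch <- WatchEvent{Modified, Object{"default", "web", "v2"}}
	watch <- WatchEvent{Deleted, Object{"default", "db", ""}}
	close(watch)
	inf := NewToyInformer(Handlers{
		OnAdd:    func(o Object) { fmt.Println("add", Key(o)) },
		OnUpdate: func(o, n Object) { fmt.Println("update", Key(n), o.Data, "->", n.Data) },
		OnDelete: func(o Object) { fmt.Println("delete", Key(o)) },
	})
	inf.Run([]Object{{"default", "web", "v1"}, {"default", "db", "v1"}}, watch)
}
```

Controllers built on real informers follow the same split: handlers typically just enqueue the object's key into a work queue, and a worker reads the current state back from the cache.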

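Kubernetes Security Hardening Handbook

As Kubernetes is adopted more widely, its security risks are becoming more visible. Working bottom-up through the Cloud, Cluster, and Container layers, this article lists Kubernetes security risks and gives corresponding hardening recommendations.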

Kafka on Kubernetes: Reloaded for fault tolerance

Coban - Grab’s real-time data streaming platform - has been operating Kafka on Kubernetes with Strimzi in production for about two years. In a previous article (Zero trust with Kafka), we explained how we leveraged Strimzi to enhance the security of our data streaming offering.

In this article, we are going to describe how we improved the fault tolerance of our initial design, to the point where we no longer need to intervene if a Kafka broker is unexpectedly terminated.

pincompute: A Kubernetes Backed General Purpose Compute Platform for Pinterest

Modern compute platforms are foundational to accelerating innovation and running applications more efficiently. At Pinterest, we are evolving our compute platform to provide an application-centric and fully managed compute API for the 90th percentile of use cases. This will accelerate innovation through platform agility, scalability, and a reduced cost of keeping systems up to date, and will improve efficiency by running our users’ applications on Kubernetes-based compute. We refer to this next-generation compute platform as PinCompute, and our multi-year vision is for PinCompute to run the most mission-critical applications and services at Pinterest.

PinCompute aligns with the Platform as a Service (PaaS) cloud computing model, in that it abstracts away the undifferentiated heavy lifting of managing infrastructure and Kubernetes and enables users to focus on the unique aspects of their applications. PinCompute evolves Pinterest's architecture with cloud-native principles, including containers, microservices, and service mesh; reduces the cost of keeping systems up to date by providing and managing immutable infrastructure, operating system upgrades, and Graviton instances; and delivers cost savings by applying enhanced scheduling capabilities to large multi-tenant Kubernetes clusters, including oversubscription, bin packing, resource tiering, and trough usage.

In this article, we discuss the PinCompute primitives, architecture, control plane and data plane capabilities, and showcase the value that PinCompute has delivered for innovation and efficiency at Pinterest.

Copyright © 2011-2025 iteam. Current version is 2.144.0. UTC+08:00, 2025-07-03 22:35