A new database designed for hyper-scaling large-scale applications
1. The state of databases in the current landscape
July, 2023
Michael Widenius
Creator of MySQL and MariaDB
CTO @ MariaDB
2. Overview
● The History of RDBMS and MariaDB
● Big Data is driven by Open Source
● AI and databases
● Databases and the cloud
● What I have been working on lately
3. The history of RDBMS
4. [Figure: timeline of RDBMS history, annotated "We start here" and "Here now"]
5. What made MySQL successful?
● When MySQL was started (1994), there were big players such as Microsoft, IBM and
Oracle, each holding a significant portion of the market.
○ The Internet was new and everyone needed a web-optimized DB
● However, the big players did not see the Internet as a viable business platform!
● MySQL had already proven stable before its release (it was used for data warehousing
and the web)
● We created a “Virtual company”, which made it easy to find good people
● New “free” license scheme (this was before Open Source)
○ Free for most, a few have to pay
○ The second program (Ghostscript was the first) to use dual licensing.
■ MySQL was the first to do it with the GPL.
● Very easy to install and use (15 minute rule)
● Released source and tested binaries for most platforms
● MySQL was a needed, stable and easy-to-use product at the right price
6. What made MySQL successful?
● MySQL started from the bottom-up, providing a cheap (or free) solution for
the web industry.
○ We were friendly and helpful towards the community
● I personally wrote 30,000+ emails in the first 5 years to help people using
MySQL
○ MySQL then started growing in other industries (the enterprise sector).
○ MySQL followed an open source development model.
● "1 customer in 1000 actually pays" - Still enough to grow very fast.
● The community provided testing, marketing and simple support needs.
● MySQL AB provided full support for those who wanted/needed it.
● We (the MySQL founders) held off on investments until the product was “good
enough”
7. Why MariaDB was created
“Save the People, Save the Product”
● To keep the MySQL talent together
● To ensure that a free version of MySQL always exists
● To get one community-developed and community-maintained branch
● To work with other MySQL forks/branches to share know-how and code
● After Oracle announced that it wanted to buy Sun & MySQL, this became even
more important.
8. How was MariaDB disruptive?
MariaDB follows in the same original footsteps of MySQL (which
Oracle has not done):
● It is a true Open Source project, following proper Free
Software / Open Source practices.
● Development happens in the open, working together with the
community.
● MariaDB Foundation was created to ensure that MariaDB
would always be Open Source.
● This process made MariaDB stand out.
○ MariaDB has been integrated into most major Linux and
other free OS distributions as the default “MySQL” variant.
9. MariaDB Ecosystem
MariaDB Foundation
● Works with the community
● Builds and tests binaries
○ Develops the MariaDB buildbot
● Drives adoption
○ Works with OS distributions to ensure MariaDB is included everywhere
● Works with community developers
○ Reviews architecture and patches
○ Approves and pushes changes
● Ensures that MariaDB is always free
● Funded through sponsorships
MariaDB Corporation
● Works with customers
● Provides paid support and subscriptions for MariaDB (Enterprise and Community)
● Employs most of the MariaDB developers
○ Main driver of MariaDB development
● MariaDB Enterprise
○ Longer end-of-life
○ Stable features are backported to earlier versions to minimize the need for upgrades
● Provides NRE (paid development of new MariaDB features)
10. MariaDB future plans
Monty’s view
● Work with customers to help them migrate from commercial closed-source
databases to MariaDB
○ The MariaDB Oracle compatibility layer makes this easy
● Improve the optimizer to handle very complex queries
○ This is important when moving complex applications to MariaDB
○ MariaDB 11.0 has a new cost model that greatly improves
handling of complex queries.
● Work more closely with SaaS database providers to make MariaDB
work better in their environments
○ The MariaDB multi-tenancy feature is part of this collaboration.
11. Big data is driven by Open Source
12. Big data driven by Open Source
● MySQL / MariaDB
○ SQL database with flexible replication, ColumnStore and Spider for scale out
● Apache Cassandra
○ Tunable consistency
○ Keys map to multiple values, which are grouped into column families
○ CQL language
● MongoDB
○ Stores structured data as JSON-like documents
● CouchDB
○ MVCC, ACID, eventually consistent
● HBase (Hadoop + distributed file system)
○ Written in Java. Compression, in-memory operation and Bloom filters (a fast test of
whether data may be part of a set).
● HAWQ, Pivotal HDB
○ Hadoop + SQL
● Redis
○ In memory database (+ snapshots to disk)
○ Optional durability
● ClickHouse
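A Bloom filter answers "is this key possibly in the set?" with either "definitely not" or "maybe", using only a small bit array. HBase uses this to skip store files that cannot contain a requested row. The sketch below is a generic Python Bloom filter, not HBase's implementation; the bit-array size, the number of hash functions, the SHA-256 double-hashing scheme and the key names are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Answers "possibly in the set" or "definitely not in the set"."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, key: str):
        # Derive num_hashes bit positions from one SHA-256 digest
        # using double hashing: h1 + i * h2.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Usage: build a filter over the row keys stored in one data file and
# consult it before reading that file from disk.
row_keys = BloomFilter(num_bits=8192, num_hashes=4)
for key in ("user:1001", "user:1002", "user:1003"):
    row_keys.add(key)

print(row_keys.might_contain("user:1002"))  # True: the key was added
```

The trade-off is that a "maybe" can be wrong (a false positive costs one unnecessary read), while a "no" is always right, so the filter can only save work, never return wrong data.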
13. Why use NoSQL
● Faster replication
● Fast and easy key/value access
● Data is often stored in memory
○ Note that with similar memory resources you can usually keep SQL data in memory too.
● Can handle unstructured data
● In traditional SQL, one can’t easily implement a web store
● Most NoSQL vendors are considering adding SQL support.
○ For example, Atlas SQL for MongoDB (not open source)
14. Why NOT to use NoSQL
● Lots of data duplication
○ Relational databases are designed to keep only one copy of each piece of data (such as
a customer's details)
● Can't (easily) join/combine data
● No standardized language; hard to move data and applications between systems
● Complete lock-in to one system
● Usually few connectors to programming languages or to existing applications that need a
database.
● Allows one to postpone solving the database layout problem, as one can store data
'as is'
● Initially easier to use, MUCH harder later
● No sustainable, scalable business models for most open source NoSQL solutions (as most
of them rely on the BSD or Apache licenses)
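The duplication and join points above can be made concrete. The sketch below uses Python's built-in `sqlite3` as a stand-in for a relational database such as MariaDB (chosen only because it ships with Python), and a plain list of dicts as a stand-in for a document store; all table, field and customer names are hypothetical.

```python
import sqlite3

# Relational layout: the customer is stored once and referenced by id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount INTEGER
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice', 'Helsinki')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 30), (11, 1, 45)])

# The customer moves: exactly one row is updated...
conn.execute("UPDATE customers SET city = 'Espoo' WHERE id = 1")

# ...and a join combines every order with the current customer data.
joined = conn.execute("""
    SELECT o.id, c.name, c.city, o.amount
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# Document layout: every order embeds its own copy of the customer.
order_docs = [
    {"id": 10, "amount": 30, "customer": {"name": "Alice", "city": "Helsinki"}},
    {"id": 11, "amount": 45, "customer": {"name": "Alice", "city": "Helsinki"}},
]
# The same change must be written to every copy, or the copies disagree.
for doc in order_docs:
    if doc["customer"]["name"] == "Alice":
        doc["customer"]["city"] = "Espoo"

print(joined)  # [(10, 'Alice', 'Espoo', 30), (11, 'Alice', 'Espoo', 45)]
```

With two orders the loop is harmless; with millions of embedded copies spread over many collections, keeping them consistent becomes the application's problem, which is what "initially easier to use, MUCH harder later" refers to.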
15. Problems to overcome with BIG data
● Storage (not normally the big problem anymore)
○ Memory (very expensive)
○ SSD (expensive, but slower than memory)
○ Hard disk (inexpensive; slow for random reads, fast for sequential reads;
the "tapes of tomorrow")
● Access patterns
○ Some access patterns are good for big data (scanning in parallel) while others are
hard
■ Even simple joins can take days on petabyte tables
● Getting back up when things fail
○ Recovery can take days or weeks
○ Getting the caches warm (you need a >90% hit rate in the buffers!)
● Replication and hot standby
○ Must have, but makes big data more expensive
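Why a warm cache matters so much can be seen from the average cost of a page access, which is a weighted mix of fast hits and slow misses. The latencies in this sketch (about 0.1 µs for a buffer hit, about 10 ms for a random hard-disk read) are assumptions chosen only to show the shape of the effect; real values depend on the hardware.

```python
def avg_access_time_us(hit_rate: float, hit_us: float, miss_us: float) -> float:
    """Average time per page access when a fraction hit_rate is served from memory."""
    return hit_rate * hit_us + (1.0 - hit_rate) * miss_us


# Assumed latencies: ~0.1 us for a buffer-pool hit, ~10,000 us (10 ms) for a
# random read from a hard disk.
HIT_US, MISS_US = 0.1, 10_000.0

for rate in (0.50, 0.90, 0.99):
    avg = avg_access_time_us(rate, HIT_US, MISS_US)
    print(f"hit rate {rate:.0%}: {avg:,.1f} us per access")
```

With these assumed numbers the average is roughly 5,000 µs at a 50% hit rate, 1,000 µs at 90% and 100 µs at 99%: the misses dominate completely, so a freshly restarted server with cold caches can be an order of magnitude or more slower until the hit rate recovers.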
16. Definition of big data is changing
● When I worked with data warehousing (1986-1993), big data was all credit card
transactions for one of Sweden’s oil companies:
● 1M users, 4M transactions/month (30 bytes/transaction)
10 years of data = 1.3G
● Handled by a Sun SPARCstation, 25 MHz, 32 MB memory, 2 GB hard disk
● CPUs are now 700 times faster and machines have 10,000 times more memory.
SSD/hard disk seek speed is 6,000 times faster than in 1986
● While machines have gotten faster, most data is still small (except for social and behavioral
data) and can easily be handled by one machine.
17. Why big data solutions are hype
● Most companies will not have as much real data as Facebook, Twitter, Bilibili, TikTok,
etc. that needs to be accessed "at once"
● The solutions they have to use are not applicable to others.
○ In the near future, one can run what most companies think of as big data on a few
machines:
○ Memory now costs about $2,100/TB; most can afford 200 GB of RAM.
○ SSDs now cost about $50/TB (read: 560-3100 MB/s, 98K IOPS; write: 520-2250 MB/s).
○ You can get a 100 TB SSD from Nimbus and 960 TB from Dell; 16 TB SSDs are easy
to buy.
● MariaDB today can easily handle 0.5 TB of active data. Only beyond 1 TB does one need
to consider analytical databases.
● Very few users need more than 0.5 TB of data.
● For big enterprises, the most common size of their largest production database is
100 GB.
● 90% of queries process less than 100 MB of data.
○ See https://motherduck.com/blog/big-data-is-dead/ for details
18. AI and Databases
19. AI and databases
● Personally, I don't see AI helping to optimize queries inside the database (data is changing
constantly and the optimizer already uses statistics and histograms to make
decisions).
● The optimizer also has to be very fast and handle a lot of concurrency with a limited amount
of memory.
● AI can be used to optimize SQL queries for applications.
● AI can be used to translate natural language to SQL, which is useful for interactive
queries but too slow to be used in production for applications.
● When it comes to programming, I would only trust AI to:
○ Initially set things up for a common problem (write a script that does 'easy
explanation')
○ Find 'simple mistakes/bugs' in code, like possible buffer overflows, missing
arguments, things that can easily be simplified, etc.
○ Translate things from one language to another.
○ Make a simple change that can be repeated in many parts of the code.
■ Add a parameter of this type to this function and ensure that all callers are fixed.
■ The resulting code has to be carefully reviewed!
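The point that the optimizer already decides from statistics and histograms can be illustrated with the textbook technique: an equi-height histogram splits a column's values into buckets holding about the same number of rows, and the optimizer estimates how many rows match `col < constant` from the bucket boundaries. This is a generic sketch, not MariaDB's code; the function names, the linear interpolation inside a bucket and the sample data are assumptions.

```python
from bisect import bisect_left


def build_equi_height_histogram(values, num_buckets):
    """Return (min_value, bucket_upper_bounds); each bucket holds about len(values)/num_buckets rows."""
    ordered = sorted(values)
    n = len(ordered)
    if n < num_buckets:
        raise ValueError("need at least one value per bucket")
    bounds = [ordered[(i + 1) * n // num_buckets - 1] for i in range(num_buckets)]
    return ordered[0], bounds


def estimate_fraction_below(histogram, constant):
    """Estimate the fraction of rows with value < constant (the selectivity of col < constant)."""
    low, bounds = histogram
    if constant <= low:
        return 0.0
    k = len(bounds)
    i = bisect_left(bounds, constant)  # buckets 0..i-1 end below the constant
    if i == k:
        return 1.0
    bucket_low = low if i == 0 else bounds[i - 1]
    bucket_high = bounds[i]  # bucket_low < constant <= bucket_high, so the width is positive
    # Assume values are spread evenly inside the bucket and interpolate linearly.
    within = (constant - bucket_low) / (bucket_high - bucket_low)
    return (i + within) / k


hist = build_equi_height_histogram(range(1, 101), num_buckets=4)
print(hist)  # (1, [25, 50, 75, 100])
print(estimate_fraction_below(hist, 51))  # about 0.51; the true fraction is 0.50
```

Computing such an estimate takes a binary search over a few bucket boundaries, which is one reason the optimizer can stay fast under high concurrency and limited memory.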
20. AI and databases
● I would never even try to use AI for doing changes in a complex project, like adding
catalogs to MariaDB.
○ There are no similar patterns in existing software that it could be trained on.
○ Too many things in MariaDB are interconnected in non-obvious ways (if one does not
know the code)
● If the AI produced 'hard to understand code', I could not use AI to explain why it did
things that way.
○ All MariaDB developers I am working with are experts and can be put on solving very
complex problems in their domain. I don't see anyone being replaced by AI during their
or their children's lifetimes.
21. Databases and the Cloud
22. Database scaling for SaaS providers
● SaaS providers want to optimize the number of customers they can host on
their hardware.
● When offering managed services on MySQL/MariaDB, SaaS providers have
the following options:
○ Create separate VMs for each customer.
■ Large overhead - costly
■ A typical “idle” MariaDB Server requires ~1GB of memory
■ Best user experience
○ Use a shared instance
■ Force user restrictions, only grant limited database access.
● cPanel uses this model, as do many other database management
systems
■ Very limited overhead
■ Poor user experience, artificial limitations enforced on users
■ Affected by “noisy neighbour”, hard to track
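In the shared-instance model, each customer typically gets one database plus an account whose rights are limited to it. The sketch below generates such statements in standard MariaDB syntax; the helper name, the `tenant_<name>` database naming scheme and the `'%'` host are hypothetical choices for illustration, and the input checks exist only so the sketch needs no SQL escaping.

```python
def tenant_provisioning_sql(tenant: str, password: str) -> list[str]:
    """Return SQL that gives one tenant a private database on a shared server.

    The tenant gets full rights on its own database only: no global privileges,
    no server configuration, no access to other tenants' databases.
    """
    # Keep the sketch simple: only plain identifiers and quote-free passwords,
    # so nothing needs escaping. A real tool should use proper escaping.
    if not tenant.isidentifier():
        raise ValueError(f"invalid tenant name: {tenant!r}")
    if "'" in password or "\\" in password:
        raise ValueError("password must not contain quotes or backslashes")
    db = f"tenant_{tenant}"
    account = f"'{tenant}'@'%'"
    return [
        f"CREATE DATABASE IF NOT EXISTS `{db}`",
        f"CREATE USER IF NOT EXISTS {account} IDENTIFIED BY '{password}'",
        f"GRANT ALL PRIVILEGES ON `{db}`.* TO {account}",
    ]


for stmt in tenant_provisioning_sql("acme", "s3cret"):
    print(stmt + ";")
```

Everything outside that one `GRANT` (creating users, changing server variables, seeing other schemas) stays with the provider, which is exactly the "artificial limitations" the slide describes.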
23.
24. New catalogs feature for MariaDB
● Catalogs provide the best of both worlds.
○ Shared instance
○ A catalog looks like a normal MariaDB Server
○ No user limitations
■ Each user has full control over their catalog
■ Root access on the catalog
● Catalogs still have the noisy-neighbour problem
○ However, statistics are now collected for each catalog
○ So it’s trivial to detect the problematic catalog
○ The problematic user can then be moved to a bigger machine, at a higher cost
● Catalogs can also enforce quotas
○ Each catalog has its own configuration file with its own limits
○ Force users to stick to certain performance limitations.
● Sysadmins have control over all catalogs.
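Once statistics are collected per catalog, spotting a noisy neighbour and checking quotas becomes simple bookkeeping. The catalogs feature is still under development, so the sketch below is entirely hypothetical: `CatalogStats`, the single `cpu_seconds` counter, the 50% share threshold and the per-catalog limits dictionary are illustrative stand-ins, not MariaDB's interface.

```python
from dataclasses import dataclass


@dataclass
class CatalogStats:
    # Hypothetical per-catalog counter for one measurement window.
    name: str
    cpu_seconds: float


def noisy_neighbours(stats: list[CatalogStats], max_share: float = 0.5) -> list[str]:
    """Catalogs that used more than max_share of the server's total CPU time."""
    total = sum(s.cpu_seconds for s in stats)
    if total == 0:
        return []
    return [s.name for s in stats if s.cpu_seconds / total > max_share]


def over_quota(stats: list[CatalogStats], cpu_limits: dict[str, float]) -> list[str]:
    """Catalogs whose CPU use exceeds the limit set in their own configuration."""
    return [s.name for s in stats if s.cpu_seconds > cpu_limits.get(s.name, float("inf"))]


window = [CatalogStats("shop_a", 12.0), CatalogStats("shop_b", 95.0), CatalogStats("blog_c", 3.0)]
print(noisy_neighbours(window))             # ['shop_b']
print(over_quota(window, {"blog_c": 2.0}))  # ['blog_c']
```

The two checks answer different questions: the first finds who is hurting everyone else on the machine, the second finds who is exceeding what they pay for, even if the machine is otherwise idle.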
25. Catalogs feature
● Catalogs enable hyper-scaling for SaaS providers.
○ For basic users, one can now host up to 100x more users on the
same machines.
● In MariaDB, this feature is still under development
○ If you would like to steer the roadmap for this, now is the time!
■ Tooling
■ Performance optimizations
■ Specific functionality
26. Career path for programmers
27. What I have been working on lately
I am still actively coding!
● Improving the MariaDB optimizer to handle complex queries for big
databases
○ This is based on input from MariaDB customers and users
○ I spent almost one year on redoing and improving the cost model of
the MariaDB optimizer for MariaDB 11.0.
◦ This was what I was working on during my 10 days of Covid
quarantine during my last trip to China!
○ Initial tests show that most of the recent queries that caused
problems with the MariaDB 10.6 optimizer work very well in MariaDB
11.0!
● For the last 6 months I have been working on multi-tenancy catalogs
for MariaDB.
○ This was based on input from several SaaS providers at the last 2
CloudFest conferences.
28. Career path for programmers
● I created MySQL and MariaDB. After the initial setups (where I handled everything), I have always
hired people to handle management, customers and running the company so that I can continue to
focus on architecture, development, and leading the MySQL/MariaDB developer teams.
● A very common question I get in China is what is the best career path for a developer? Should I
become a manager or start doing something else?
● Good coders are hard to find!
○ It takes a long time for a coder to reach their peak (8-10 years of experience).
○ Good programmers are valuable and can produce quality code for a LONG time (>> 70 years)
● Don’t waste a good coder by “promoting” them to a manager.
○ A good coder is not necessarily a good manager.
○ Promote them by giving them more responsibility:
■ Architecture - design bigger and more complex systems
■ Help others to work on their code (but not manage!)
● My advice to managers:
○ Do not follow the "Peter Principle" to promote people until they reach their level of
incompetence!
○ The expected yearly salary raise in China is not suitable for long-term programmers (35 year
29. Thank you