Materialized view makes login process 25k times faster
It may sound like a dramatic number, and that’s because it is. One thing that is enjoyable about working on beta applications is finding new solutions and better methods to improve the user experience. The original method for displaying the recent addition of overview analytics data in the beta version of Kontrollbase was to run the following query after the login process completed:
SELECT (max(`server_statistics`.`os_mem_used`) / pow(1024,3)) AS `max_os_mem_used`,(min(`server_statistics`.`os_mem_used`) / pow(1024,3)) AS `min_os_mem_used`,(avg(`server_statistics`.`os_mem_used`) / pow(1024,3)) AS `avg_os_mem_used`,(std(`server_statistics`.`os_mem_used`) / pow(1024,3)) AS `stdev_os_mem_used`,(max((`server_statistics`.`length_data` + `server_statistics`.`length_index`)) / pow(1024,3)) AS `max_size`,(min((`server_statistics`.`length_data` + `server_statistics`.`length_index`)) / pow(1024,3)) AS `min_size`,(avg((`server_statistics`.`length_data` + `server_statistics`.`length_index`)) / pow(1024,3)) AS `avg_size`,(std((`server_statistics`.`length_data` + `server_statistics`.`length_index`)) / pow(1024,3)) AS `stdev_size`,max(`server_statistics`.`num_connections`) AS `max_connections`,min(`server_statistics`.`num_connections`) AS `min_connections`,avg(`server_statistics`.`num_connections`) AS `avg_connections`,std(`server_statistics`.`num_connections`) AS `stdev_connections`,max(`server_statistics`.`queries_per_second`) AS `max_qps`,min(`server_statistics`.`queries_per_second`) AS `min_qps`,avg(`server_statistics`.`queries_per_second`) AS `avg_qps`,std(`server_statistics`.`queries_per_second`) AS `stdev_qps` from `server_statistics` where (curdate() - interval 7 day);
This produced the data results necessary but at the expense of long wait times for the login process if the Kontrollbase server in question was analyzing a large number of servers. In a standard method to improve this query time I referenced the index on the Creation_time column and the end of the query changed from “where (curdate() – interval 7 day);” to “where Creation_time < (curdate() – interval 7 day);“. Running an explain on the two queries shows the index being utilized. The first explain is without using an index, the second with an index. You can see the rows being analyzed is dramatically smaller and that is always good.
id: 1
select_type: SIMPLE
table: server_statistics
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 679079
Extra:
—-
id: 1
select_type: SIMPLE
table: server_statistics
type: range
possible_keys: Creation_time_ix
key: Creation_time_ix
key_len: 8
ref: NULL
rows: 339539
Extra: Using where
Now, this was a good step but still we get long execution times. The indexed query was running at 4 min 10.62 sec, the non-indexed query was running at 4 min 12.91 sec. Usually saving two seconds on a query would be great, but the when you’re already waiting four minutes, what’s another two seconds. So a completely different solution would need to be developed. The requirements were looked at and it was immediately obvious that this data did not need to be run on every login, it was overview data and could run once per day and still satisfy the requirements.
Now that I was just going to run it once a day, I needed a way to have that data available from the last time it was run without pulling it from the main server_statistics table. A view table would also not work because MySQL does not support materialized views by default so I would make a table that functioned similarly to a materialized view table. If you’re wondering what a MV is, these are good references that I found just for you, while writing this. Reference 0, Reference 1, and Reference 2.
To simplify writing queries in the future and maintaining clean code in my model files I decided to make a regular view table that would output the necessary data from server_statistics. This view table would be queried on a daily basis at night, when users are unlikely to be using the system, and the output would be inserted into a new table. There are several benefits to this approach.
- The query time is faster (obvious)
- Code is cleaner since we can run “select * from view_analytics_overview;” instead of the very long version listed previously.
- Unexpected benefit is that we get an overview table that can be queried later for easier reporting since it contains overview data that is generated each day – we won’t have to run loops or group-by queries to generate daily overview data when we add monthly reports with data organized by day. Effectively this makes monthly reporting easier down the road.
So the new standard view table is created as “view_analytics_overview” and then I go ahead and make the materialized view table “analytics_overview_data” with the same column names that are outputted from a “select * from view_analytics_overview;”. After that I copy the kontroll-query_cache_preload.pl script (that usually runs every hour to keep the query cache warm), rename it to kontroll-overview_daily.pl and modify it to contain a new query – the one that will be scheduled to run every day at midnight and keep our materialized view up to date. The query for this is as follows:
#grab most recent data from the view table, inset it into the historical overview table for analytics
my $sql9 = "INSERT INTO `analytics_overview_data` (
`max_os_mem_used` ,
`min_os_mem_used` ,
`avg_os_mem_used` ,
`stdev_os_mem_used` ,
`max_size` ,
`min_size` ,
`avg_size` ,
`stdev_size` ,
`max_connections` ,
`min_connections` ,
`avg_connections` ,
`stdev_connections` ,
`max_qps` ,
`min_qps` ,
`avg_qps` ,
`stdev_qps` ,
`Creation_time`
)
SELECT *, NOW() FROM view_analytics_overview";
Now when we run a “select * from analytics_overview_data order by id desc limit 1;” (so we only get the most recent data) we get our data in .01 seconds. That is approximately 25,000 times faster than the original method.
mysql> select * from analytics_overview_data order by id desc limit 1\G
*************************** 1. row ***************************
id: 1
max_os_mem_used: 50.7434883117676
min_os_mem_used: 0.0230369567871094
avg_os_mem_used: 1.54342417729331
stdev_os_mem_used: 2.36530863328196
max_size: 307.816745581105
min_size: 0
avg_size: 12.1319161207811
stdev_size: 43.6928249097012
max_connections: 435
min_connections: 0
avg_connections: 17.9539
stdev_connections: 40.1423
max_qps: 9243.65
min_qps: 0.000114098
avg_qps: 193.852714302991
stdev_qps: 975.674986259363
Creation_time: 2010-01-07 11:19:37
1 row in set (0.01 sec)
So, the lesson is to make use of materialized views when appropriate and not to blame MySQL (or your database of choice) for long query times – sometimes you just need to change your application logic. I do wonder why other RDBMSs support materialized views but MySQL does not offer them. Maybe one day…





5 Responses to “Materialized view makes login process 25k times faster”
January 7th, 2010 saat: 9:07 pm
Nice article showing possible uses of materialized views.
This is a little hack I wrote awhile ago that can do (mostly) atomic materialization of views in MySQL:
CREATE TABLE new_materialized_view SELECT * FROM regular_view;
RENAME TABLE materialized_view = old_materialized_view, new_materialized_view = materialized_view;
DROP TABLE IF EXISTS old_materialized_view;
Unfortunately, MySQL doesn’t support true ACID DDL statements, but RENAME TABLE has some code to rollback the operation if it fails, which is better than nothing.
January 8th, 2010 saat: 3:35 am
Have you looked at FlexViews already?
http://flexviews.sourceforge.net/
Current version: 1.5 Beta
http://sourceforge.net/projects/flexviews/files/
January 8th, 2010 saat: 8:44 am
I did check that out, it looks interesting and quite flexible – One of the reference links I posted in the article is for their project. I only found out about it after I finished coding the MV for Kontrollbase.
January 8th, 2010 saat: 5:59 pm
Hi Darius,
First off MySQL support atomic rename:
rename table a to b, b to c, c to d, d to a;
When you build a mview with flexviews, you can choose incremental maintenance or “complete” maintenance. The latter uses atomic rename to ensure the view is queryable while it refreshes, but then to atomically replace it once the rebuild is completed. The former uses table logs and DML to asynchronously maintain the view.
January 10th, 2010 saat: 3:27 pm
Hi Justin,
The RENAME TABLE code isn’t truly ACID. MySQL doesn’t have anything like the write-ahead log. It is only a hack in the code. Anything further would require a major restructuring of all DDL statements in MySQL.