Channel: Oracle Bloggers

Truly Parallel backup (MySQL Enterprise Backup 3.8 and later)


How do you implement parallelism for software whose output needs to be streamed to tape?

How do you make that parallelism tunable for varying input and output devices and varying levels of computation?

These were some of the questions we needed to answer when implementing multi-threading for MySQL Enterprise Backup (or MEB, as it is called).

The trivial way to parallelize is to have multiple threads pick up different files (in a file-per-table scenario). But this did not seem adequate because:

a) The sizes of these files (corresponding to the tables) could differ, and one large file would limit the parallelism since it would be processed by a single thread.

b) If you have to stream the backup, how do you reconcile multiple files being streamed by separate threads? Large backups are streamed directly to tape, so it is better to output a single file rather than multiple files.

c) If you buffer each file, wait for it to be completely processed, and only then push it to tape, it is not true streaming, because you are using intermediate disk space to hold the incomplete portions of all the files.

The answer that we found was to parallelize horizontally instead of vertically.


In a vertical parallel architecture, each thread acts on a separate file, which limits parallelism when file sizes vary and makes streaming difficult.

In a horizontal parallel architecture, each file is broken into a set of subsections (denoted by multiple colors). The threads are able to act on each subsection independently.
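To make the difference concrete, here is a rough, idealized comparison of the two approaches. It is only a sketch under stated assumptions: every unit of file size costs the same amount of work, I/O limits are ignored, and the function names `vertical_time` and `horizontal_time` are hypothetical and not part of MEB.

```python
import math


def vertical_time(file_sizes, threads):
    """File-per-thread: each file is handled start to finish by one thread."""
    loads = [0] * threads
    # Assign the largest files first, each to the least-loaded thread.
    for size in sorted(file_sizes, reverse=True):
        loads[loads.index(min(loads))] += size
    return max(loads)


def horizontal_time(file_sizes, threads):
    """Subsection-per-thread: files are split into unit-size subsections
    that any thread may process."""
    return math.ceil(sum(file_sizes) / threads)


# One large table and three small ones, four threads (made-up sizes).
sizes = [100, 10, 10, 10]
print(vertical_time(sizes, 4), horizontal_time(sizes, 4))
```

In the vertical case the single large file dominates the total time no matter how many threads are configured, while in the horizontal case the work is spread evenly.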

Parallel operations are then possible for reading, processing, and writing of these file subsections.

This setup is especially useful when using compression since there can be multiple threads performing compression while the read and write continues in parallel.

There may be some additional overhead in ensuring that the buffers are written out in the correct order, but since most of the buffers are the same size and undergo similar operations, this overhead is not too large.

You get truly serialized output which can be streamed to tape as it gets processed. If you are streaming to a remote host or to tape, there is almost no additional space required on your main server.
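The read, process, and write stages described above, together with the reordering step, can be sketched in a few lines of Python. This is a minimal illustration and not MEB's actual implementation: the function `parallel_stream`, its parameters, the use of zlib for the processing step, and the semaphore standing in for a fixed pool of buffers are all assumptions made for the example.

```python
import heapq
import queue
import threading
import zlib


def parallel_stream(chunks, sink, process_threads=3, num_buffers=8):
    """Compress chunks on several threads; hand results to sink in input order."""
    free_buffers = threading.Semaphore(num_buffers)  # stands in for a fixed buffer pool
    to_process = queue.Queue()
    to_write = queue.Queue()
    stop = object()  # sentinel marking the end of a queue

    def reader():
        for seq, data in enumerate(chunks):
            free_buffers.acquire()          # block until a buffer is free
            to_process.put((seq, data))
        for _ in range(process_threads):    # one sentinel per processing thread
            to_process.put(stop)

    def processor():
        while True:
            item = to_process.get()
            if item is stop:
                to_write.put(stop)
                return
            seq, data = item
            to_write.put((seq, zlib.compress(data)))

    def writer():
        pending = []    # min-heap of results that arrived ahead of their turn
        next_seq = 0
        stops_seen = 0
        while stops_seen < process_threads:
            item = to_write.get()
            if item is stop:
                stops_seen += 1
                continue
            heapq.heappush(pending, item)
            while pending and pending[0][0] == next_seq:
                _, payload = heapq.heappop(pending)
                sink(payload)               # output leaves in original order
                free_buffers.release()      # buffer can be reused by the reader
                next_seq += 1

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    threads += [threading.Thread(target=processor) for _ in range(process_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Usage: split some data into subsections, stream it, and check the order.
data = b"example page data " * 1000
chunks = [data[i:i + 1024] for i in range(0, len(data), 1024)]
out = []
parallel_stream(chunks, out.append, process_threads=4, num_buffers=6)
restored = b"".join(zlib.decompress(p) for p in out)
```

The writer holds back only the results that finish ahead of their turn, so the output is a single ordered stream and the extra memory is bounded by the number of buffers.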

In certain scenarios, parallel backup is almost 10 times faster than a normal backup.

The graph below shows the backup time for MEB 3.7.1 versus MEB 3.8 with different numbers of threads configured.



Note: This is a 16 GB, 2 x 2000 MHz machine with 2 RAID disks (1027 GB, 733.9 GB), running Oracle Linux.

As you can see above, MEB 3.8 provides options to configure the number of threads used for reading, processing, and writing. Let's denote the number of read, process, and write threads as RT, PT, and WT respectively. The default values for MEB 3.8 are RT=3, PT=3, WT=3, which change in MEB 3.8.1 to RT=1, PT=6, WT=1.

This is close to the fastest backup in the graph above. We did not choose RT=1, PT=12, WT=1 (which is the fastest) as the default because CPU utilization becomes very high in the 1,12,1 configuration.

Remember, read and write throughput depends on your input and output devices. It is possible that multiple threads do not give you better read or write performance than a single thread.

There are also options available to have a configurable number of buffers used by these threads.

Each buffer is 16MB in size. For optimal parallelism, you should have at least [RT + PT + WT + MAX(RT, PT, WT)] buffers.

For example, if RT=1, PT=6, WT=1, then you should configure 1+6+1+6 = 14 buffers (the default in MEB 3.8.1).
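The rule of thumb above is easy to check with a small helper. The function name `recommended_buffers` is hypothetical and used only for illustration; the formula itself is the one given in the text.

```python
def recommended_buffers(rt, pt, wt):
    """Minimum buffer count suggested in the text: RT + PT + WT + MAX(RT, PT, WT)."""
    return rt + pt + wt + max(rt, pt, wt)


print(recommended_buffers(1, 6, 1))  # MEB 3.8.1 default thread counts
print(recommended_buffers(3, 3, 3))  # MEB 3.8 default thread counts
```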

If, for example, you configure multiple threads but only 1 buffer, your backup does not take advantage of parallelism at all. The read thread reads into the single buffer, which is then processed, written, and freed. Meanwhile the read thread waits for the buffer to become free before it can read into it again, so the backup behaves like a serial process.

One more thing to note is that the number of buffers is limited by the memory limit configured for backup (default 300MB). Please ensure that you configure enough memory to cover the buffers you have configured. If the configured memory limit is less than what is required for the configured number of buffers, MEB will automatically decrease the number of buffers to fit within the memory limit. Based on the default values, if you configure more than 18 buffers you will need to increase the memory limit.
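The arithmetic behind the figure of 18 buffers can be sketched as follows. This assumes, as the numbers in the text suggest, that the whole memory limit is available to the 16MB buffers; MEB's real accounting may reserve memory for other purposes. The helper names are hypothetical.

```python
BUFFER_MB = 16  # buffer size stated in the text


def max_buffers(memory_limit_mb):
    """Largest number of whole 16MB buffers that fit within the memory limit."""
    return memory_limit_mb // BUFFER_MB


def effective_buffers(requested, memory_limit_mb):
    """Buffer count after reducing the request to fit the memory limit."""
    return min(requested, max_buffers(memory_limit_mb))


def required_memory_mb(buffers):
    """Memory needed to hold the given number of buffers."""
    return buffers * BUFFER_MB


print(max_buffers(300))            # buffers that fit in the default 300MB limit
print(effective_buffers(24, 300))  # a request for 24 buffers is reduced
print(required_memory_mb(24))      # memory limit needed to keep all 24
```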

Please see the previous 3.8 blog post for detailed configuration examples:

https://blogs.oracle.com/mysqlenterprisebackup/entry/parallel_backup_in_mysql_enterprise

or our documentation of this feature at

http://dev.mysql.com/doc/mysql-enterprise-backup/3.8/en/backup-capacity-options.html

Cheers 

and remember the wise DBA advice:

If you don't verify your backups periodically, it is like not having backups at all.


