CDMP 2006


Contact Me

Sanaz Motahari-Asl

University of Toronto Engineering Science Department

sanaz.motahari.asl@utoronto.ca


Journal!

Week One | Week Two | Week Three | Week Four | Week Five | Week Six | Week Seven | Week Eight | Week Nine | Week Ten | Week Eleven | Week Twelve | Week Thirteen | Week Fourteen | Week Fifteen | Week Sixteen

Week One May 8 - May 14

This week I met with Prof. Amza to discuss the project I will be working on during the summer. You can read more about the details of my project here. I also met the Master's and PhD students in the lab and discussed parts of the project with them. I read a paper closely related to what I will work on during the summer.


Week Two May 15 - May 21

This week I learned about network programming, mainly by reading about the function calls that set up a client-server connection between two machines. I also started working on a boss/worker threading scheme in which a server uses a thread pool to handle connections from multiple clients. Each of these threads will use an event-driven technique to handle the tasks requested by the clients.
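As an illustration only (this is not the project's code), the boss/worker idea can be sketched in Python with a shared queue. The function name `run_boss_worker` and the use of strings in place of real client connections are assumptions made for the example; sockets and the per-thread event loop are left out.

```python
import queue
import threading


def run_boss_worker(tasks, num_workers=4):
    """Boss puts tasks on a shared queue; workers pull and handle them."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = work.get()
            if task is None:              # sentinel: no more work for this worker
                break
            handled = task.upper()        # stand-in for serving a client request
            with results_lock:
                results.append(handled)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()
    for task in tasks:                    # boss: hand off each incoming "connection"
        work.put(task)
    for _ in workers:                     # one sentinel per worker so all of them exit
        work.put(None)
    for t in workers:
        t.join()
    return results
```

The order of the results depends on thread scheduling, so callers should not rely on it.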


Week Three May 22 - May 28

This week I finished implementing the thread pool, which is meant to handle a large number of client connections. I used a signaling scheme to wake a specific thread when there are connections to handle. I also added a simple load-balancing scheme to spread the load evenly across the threads in the pool.
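One way such a scheme could look, sketched in Python under stated assumptions: each worker owns a condition variable, so the boss can wake exactly one thread, and the boss picks the worker with the fewest pending connections. The names `Worker`, `assign`, `shutdown`, and `least_loaded` are hypothetical, and "least pending work" is only one possible balancing policy; the journal does not say which one was used.

```python
import threading
from collections import deque


class Worker(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)
        self.pending = deque()            # connections assigned to this worker
        self.cond = threading.Condition() # private signal for this worker only
        self.handled = []
        self.stopping = False

    def assign(self, conn):
        with self.cond:
            self.pending.append(conn)
            self.cond.notify()            # wake this specific worker

    def shutdown(self):
        with self.cond:
            self.stopping = True
            self.cond.notify()

    def run(self):
        while True:
            with self.cond:
                while not self.pending and not self.stopping:
                    self.cond.wait()      # sleep until signaled
                if not self.pending:
                    return                # asked to stop and nothing left to do
                conn = self.pending.popleft()
            self.handled.append(conn)     # stand-in for serving the connection


def least_loaded(workers):
    """Simple balancing policy: pick the worker with the fewest pending items."""
    return min(workers, key=lambda w: len(w.pending))
```

A boss thread would call `least_loaded(workers).assign(conn)` for every accepted connection.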


Week Four May 29 - June 4

This week I worked on making the thread pool more dynamic. When demand is high, threads are added to the pool to handle the extra connections. Once they are no longer needed, these threads are removed.
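A minimal Python sketch of a pool that grows and shrinks, assuming two simple rules that are not taken from the project: a new thread is started when more than one task is waiting, and a surplus thread retires after sitting idle for `idle_timeout` seconds. The class name `DynamicPool` and all of its parameters are hypothetical.

```python
import queue
import threading


class DynamicPool:
    def __init__(self, handler, min_workers=1, max_workers=4, idle_timeout=0.2):
        self.handler = handler
        self.min_workers = min_workers
        self.max_workers = max_workers
        self.idle_timeout = idle_timeout
        self.tasks = queue.Queue()
        self.lock = threading.Lock()
        self.size = 0                     # current number of worker threads
        self.peak = 0                     # largest size reached so far
        for _ in range(min_workers):
            self._spawn()

    def _spawn(self):
        with self.lock:
            if self.size >= self.max_workers:
                return                    # already at the upper bound
            self.size += 1
            self.peak = max(self.peak, self.size)
        threading.Thread(target=self._work, daemon=True).start()

    def submit(self, task):
        self.tasks.put(task)
        if self.tasks.qsize() > 1:        # backlog is building up: try to grow
            self._spawn()

    def _work(self):
        while True:
            try:
                task = self.tasks.get(timeout=self.idle_timeout)
            except queue.Empty:
                with self.lock:
                    if self.size > self.min_workers:
                        self.size -= 1    # surplus worker retires
                        return
                continue                  # core worker keeps waiting
            self.handler(task)
            self.tasks.task_done()
```

The sketch does not handle exceptions raised by `handler`; a real server would need to catch them so that a failing request does not silently shrink the pool.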


Week Five June 5 - June 11

This week I researched two different approaches for the rest of the project. The first was to use ODBC: install an ODBC manager on the scheduler and ODBC drivers on the database machines. Through ODBC the scheduler would be able to connect to any database, regardless of the DBMS, which would make the system more portable. The second approach was to use writesets: instead of the read-one-write-all method currently employed in the cluster, updates are done on a master database, a writeset listing the rows changed in a given table and their updated values is extracted, and this writeset is then applied to the slave databases. This approach allows for increased performance. In the end, after discussing the two methods with Prof. Amza, I decided to go forward with the writeset method.
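To make the writeset idea concrete, here is a small Python sketch. It models a table as a dictionary keyed by primary key and derives the writeset by comparing two snapshots; that comparison is purely an assumption for illustration, since a real system would obtain the changed rows from the database itself. The function names are hypothetical.

```python
def extract_writeset(before, after):
    """Return the rows that changed between two snapshots of one table.

    Tables are modelled as {primary_key: row_dict}. A value of None in the
    writeset marks a deleted row.
    """
    writeset = {}
    for key, row in after.items():
        if before.get(key) != row:
            writeset[key] = dict(row)     # inserted or updated row
    for key in before:
        if key not in after:
            writeset[key] = None          # deleted row
    return writeset


def apply_writeset(table, writeset):
    """Bring a slave copy up to date by applying only the changed rows."""
    for key, row in writeset.items():
        if row is None:
            table.pop(key, None)
        else:
            table[key] = dict(row)
```

The slave never re-executes the original query; it only installs the rows listed in the writeset, which is where the performance gain over read-one-write-all would come from.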


Week Six June 12 - June 18

This week I set up a MySQL database as well as an application server backend using Apache and PHP. I also used an e-commerce benchmark, namely TPCW. This benchmark has been developed and used in Prof. Amza's group before to analyze the performance of various programs.


Week Seven June 19 - June 25

This week I merged my code for the threadpool with that of the scheduler. Later this week and next week I will be running tests to make sure no complications arise.


Week Eight June 26 - July 2

This week I became familiar with the TPCW benchmark and the three-tier PHP/Apache - Scheduler - database architecture. I ran my scheduler code and checked for bugs.


Week Nine July 3 - July 9

This week I did more research on the writeset method. After finding out about the new "row-based replication" in the latest version of MySQL, as well as the added ability to use triggers, I finalized my design for extracting a writeset. Since the current scheduler code depends strictly on the version of MySQL used in the system, I will use both writeset extraction and ODBC, to increase performance and to make the system independent of the database used.


Week Ten July 10 - July 16

This week I started to work on extracting the writeset. I will start with a "single master - multiple slave" scenario. The master will accept write queries from the clients and log them to disk. Every log will contain a transaction which will be replicated to the slaves. The slaves will receive these logs and execute them on their local databases. In order to maintain the order of these logs, the master will request version numbers from a sequencer. The slaves will check these version numbers to make sure they execute the logs sequentially. Since these version numbers are global, adding a second master to the cluster will not affect the total order of the logs received by the slaves. After this stage, the version number should be used to ensure that the read queries received from the clients on the slaves are executed on the latest version of the database.
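The role of the sequencer can be sketched as follows. This is a single-process Python illustration with hypothetical names (`Sequencer`, `Master`, `commit`); in the cluster the sequencer would be a separate service reached over the network, which is not shown here.

```python
import itertools
import threading


class Sequencer:
    """Hands out globally increasing version numbers, starting at 1."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_version(self):
        with self._lock:                  # one number at a time, never reused
            return next(self._counter)


class Master:
    def __init__(self, name, sequencer):
        self.name = name
        self.sequencer = sequencer

    def commit(self, statement):
        """Tag a write with a global version and return the log to ship."""
        return {
            "version": self.sequencer.next_version(),
            "master": self.name,
            "statement": statement,
        }
```

Because both masters draw from the same counter, the versions of their logs never collide, so the slaves see one total order regardless of which master produced each log.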


Week Eleven July 17 - July 23

This week I dug into the MySQL library code, searching for the files and the methods I need to change in order to:
1) close every log after a query is executed on the master
2) insert a version number in every log written on the master
3) include code on the slave so that it understands what these version numbers mean and maintains the total order


Week Twelve July 24 - July 30

This week I modified the MySQL library code so that the binary log on the Master will close after executing each query from the client. Each log includes all the rows that were changed in a specific SQL write query (inserts, deletes, updates).


Week Thirteen July 31 - August 6

This past week I looked at the MySQL code to find the best way to implement a tagging scheme for the logs produced by the master. Some of the different methods I explored included changing the header of the log being sent to the slave or including the sequence number as an "event" at the beginning of every log.
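The second option, a sequence number carried as the first record of every log, might look like the Python sketch below. The byte layout (a 4-byte tag followed by a 64-bit little-endian number) and the names `SEQ_EVENT`, `SEQ_TAG`, `tag_log`, and `read_tag` are invented for this example and do not correspond to MySQL's binary log format.

```python
import struct

# Hypothetical layout: 4-byte tag + unsigned 64-bit sequence number.
SEQ_EVENT = struct.Struct("<4sQ")
SEQ_TAG = b"SEQ0"


def tag_log(sequence_number, payload):
    """Master side: prepend the sequence event to the log contents."""
    return SEQ_EVENT.pack(SEQ_TAG, sequence_number) + payload


def read_tag(log):
    """Slave side: split a log into its sequence number and the remaining bytes."""
    tag, sequence_number = SEQ_EVENT.unpack_from(log)
    if tag != SEQ_TAG:
        raise ValueError("log does not start with a sequence event")
    return sequence_number, log[SEQ_EVENT.size:]
```

Putting the number in a leading record leaves the rest of the log untouched, whereas changing the log header would mean every reader of that header has to know about the new field.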


Week Fourteen August 7 - August 13

This week I started implementing the sequence numbering scheme as an event in the log file.


Week Fifteen August 14 - August 20

This week I finished working on the sequence numbering scheme. The remaining problem is to modify the MySQL code on the slave side so that it identifies the sequence-event and keeps a local counter to make sure that the logs aren't processed out of order. If a log is out of order, it is not executed until the missing logs are received.
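The slave-side rule described above (keep a local counter, hold back anything that arrives early) can be sketched in a few lines of Python. The class name `SlaveApplier` and the assumption that sequence numbers start at 1 are mine; "applying" a log is reduced to appending it to a list.

```python
class SlaveApplier:
    def __init__(self):
        self.next_expected = 1            # local counter of the next log to run
        self.held = {}                    # logs that arrived too early
        self.applied = []                 # stand-in for executing on the database

    def receive(self, sequence_number, log):
        if sequence_number < self.next_expected:
            return                        # duplicate of a log already applied
        self.held[sequence_number] = log
        # Apply every log that is now contiguous with what has been executed.
        while self.next_expected in self.held:
            self.applied.append(self.held.pop(self.next_expected))
            self.next_expected += 1
```

For example, if logs 2 and 3 arrive before log 1, nothing is applied until log 1 shows up, at which point all three run in order.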


Week Sixteen August 21 - August 27

This week I started implementing the slave side code to make sense of the sequence-event inserted by the master.

*This is an ongoing project. For a more comprehensive summary refer to the final report.