Revision as of 05:59, 24 September 2013
User Made Content Daemon (UMCD)
This page gives information on the UMCD project. It started as a GSoC project; this proposal should be regarded only as an archive.
Installation
Dependencies on Linux
We must first install some required packages:
sudo apt-get install mysql-server unixodbc-dev libmyodbc libboost-iostreams-dev libboost-program-options-dev libboost-regex-dev libboost-system-dev libboost-thread-dev libboost-date-time-dev
- mysql-server is the MySQL database.
- unixodbc is a middleware API for accessing databases (see http://en.wikipedia.org/wiki/ODBC).
- libmyodbc is the MySQL ODBC driver used to access the MySQL database.
- Boost libraries are used to ease the development.
Database setup
We'll explain this step for the MySQL database, but you can use any ODBC-compliant database.
Initialize the MySQL database
Enter the MySQL prompt with:
mysql -u root -p
Then, we create the database:
mysql> CREATE DATABASE IF NOT EXISTS umcd;
mysql> USE umcd;
We create the tables:
 mysql> source {wesnoth-directory}/data/umcd/database/create_database.sql
We populate the database:
 mysql> source {wesnoth-directory}/data/umcd/database/populate_database.sql
You can check that the tables are added with:
mysql> show tables;
Now we quit the mysql prompt:
mysql> exit;
Install the ODBC driver
We must link the database to ODBC, so that we can retrieve the database connection via a Data Source Name (DSN).
First we must find the odbc.ini file:
odbcinst -j
On my computer it's in /etc/odbc.ini, so I'll refer to this location. We must edit this file:
sudo vim /etc/odbc.ini
The file may be empty. Add the following content:
;
; odbc.ini configuration for Connector/ODBC
;
[ODBC Data Sources]
dbumcd = MyODBC Driver DSN for the UMCD database

[dbumcd]
Driver = /usr/lib/libmyodbc.so
Description = Connector/ODBC Driver DSN for the UMCD database
SERVER = localhost
PORT =
USER =
Password =
Database = umcd
OPTION = 3
SOCKET =
The DSN of the umcd database will be "dbumcd". You can change the SERVER and PORT entries if it's an external database. Otherwise, leave the fields blank as in the example.
Next we must install the driver with:
odbcinst -f /etc/odbc.ini -d -i
You can list all the installed ODBC data sources with:
odbcinst -s -q
That's all!
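Once the DSN exists, client code only needs its name to reach the database. As a minimal sketch (the helper name is hypothetical and not part of UMCD), an ODBC connection string for the dbumcd DSN could be assembled with the standard DSN, UID and PWD keywords:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of UMCD): builds an ODBC connection string
// for a configured DSN. DSN, UID and PWD are the standard ODBC
// connection-string keywords; empty credentials are simply left out.
inline std::string make_odbc_connection_string(const std::string& dsn,
                                               const std::string& user = "",
                                               const std::string& password = "")
{
    std::string result = "DSN=" + dsn + ";";
    if (!user.empty())     result += "UID=" + user + ";";
    if (!password.empty()) result += "PWD=" + password + ";";
    return result;
}

// Usage with the DSN defined in odbc.ini above; the credentials are made up.
inline void connection_string_example()
{
    assert(make_odbc_connection_string("dbumcd") == "DSN=dbumcd;");
    assert(make_odbc_connection_string("dbumcd", "umcd_user", "secret")
           == "DSN=dbumcd;UID=umcd_user;PWD=secret;");
}
```

Check the documentation of the database library you use for the exact connection string form it accepts.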
Configuration
Want to join the development?
Everyone can join the development of the UMCD, but the hardest part is getting started. To simplify this process, we have written some tutorials and articles that will help you understand the spirit of the code. Of course, the Doxygen documentation remains the reference, and it is where you should look for class details.
Directories
We explain here the meaning of the directories composing the UMCD project.
Data directory
All the WML, text or binary data files related to UMCD are stored in data/umcd/:
| Path (data/umcd/) | Description | 
|---|---|
| . | All the data related to UMCD. | 
| ./database/ | SQL scripts to generate and populate the database, and everything related to the database in general. | 
| ./protocol_schema/ | The WML schemas of the protocol. The upload, download, list, ... requests each have a specific format; this is the place to describe it formally. | 
| ./schema/ | All the WML schemas not related to the protocol; for example, they can describe configuration files. | 
| ./schema/types/ | Schemas for general types, for example string or integer. | 
| ./tests/ | WML files related to tests; you can find examples of good and bad client requests. | 
Documentation directory
Only one document, which describes the protocol, is in this directory. However, a lot of Doxygen documentation is in the source files, and this wiki has some high-level documentation.
| Path (doc) | Description | 
|---|---|
| ./design/umcd/ | LaTeX files explaining the protocol design of UMCD. This is an important document because it contains the description of the protocol. The LaTeX root file (the one you need to compile) is doc/design/umcd.tex. | 
Source directory
The source of all the cpp files composing the UMCD project is in src/.
umcd
The main files composing the Wesnoth UMCD project are in src/umcd/
| Path (src/umcd/) | Description | 
|---|---|
| . | Files not sorted yet or that have no specific place to go. The main file is umcd.cpp. | 
| ./actions/ | The request-specific handlers are stored here; you can find details about what happens when a specific request is received. | 
| ./boost/ | Workarounds for the Boost library, mainly for old versions (for example, features that were missing from old versions can be added here). | 
| ./client/ | Files specific to the client; they are meant to be used in tests and in the Wesnoth client code. | 
| ./database/ | Database-related files; it contains a connection pool and a query catalog. | 
| ./env/ | The environment of the server. You should always try to keep dependencies on these files low, because any class that directly uses the environment cannot be re-used if the environment is not loaded. It is also a bridge between the configuration file and the classes. | 
| ./logger/ | A thread-safe logger with specific formatting. You should use the asio_logger, which is designed to work with Boost.Asio, but only via the predefined macros. | 
| ./otl/ | The header-only OTL library. Don't touch otlv4.h except to update it from the official source. You can configure the OTL library in the file otl.hpp. | 
| ./pod/ | Auto-generated directory. You will only see it after the first build. It contains all the database tables as POD structures. | 
| ./protocol/ | It contains files highly specialized for our protocol. | 
| ./protocol/client/ | Specialization of the core files for the client side. (Nothing in there yet). | 
| ./protocol/core/ | Generic protocol files for both the client and server sides. | 
| ./protocol/server/ | Specialization of the core files for the server side. It also contains the action dispatcher and entry_point (request acceptor). You may want to look at these files to see how a request is dispatched. | 
| ./server/ | Nothing in here depends on Wesnoth code outside this directory. It is our generic server system. It also handles the transfer of data. | 
| ./server/detail/ | Implementation details of some of the classes. | 
| ./server/multi_threaded/ | The multi-threaded version of the server; configure it with the desired number of threads. | 
| ./server/traits/ | Traits to add information on some types. For example, event_slot adds a slot function type to the event type. They must be specialized. | 
Test
We implemented functional tests that use the test files in data/umcd/tests.
| Path | Description | 
|---|---|
| src/tests/umcd/ | Test framework designed to send test files. Received data is validated against a schema. | 
sql2cpp
This tool, written with Boost.Spirit, keeps the database schema (in data/umcd/database/) in sync with the classes that model the database tables.
| Path (src/tools/code_generator/sql2cpp/) | Description | 
|---|---|
| . | Main directory of the project. The main file is sql2cpp.cpp. | 
| ./cpp/ | C++ generator; it generates the POD files and classes from the Abstract Syntax Tree (AST) created by the SQL parser. | 
| ./sql/ | SQL lexer and parser that create the SQL AST. | 
Server design
The server design has remained largely fixed since the proposal. Of course, many practical details have changed, but it has stayed conceptually the same. The main difference from the typical server architectures you may have encountered in other languages, such as Spring with Java, is that the whole thing is asynchronous. This really changes the way you have to think. You must constantly ask yourself: "Does this object, which I have just created, need to be on the heap?" Asynchronous calls make the stack less useful, because the function that starts an operation returns before its completion handler runs, so more things must be allocated on the heap. Keep that in mind. It also opens the door to memory (allocator) optimizations.
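To illustrate the heap question, here is a minimal sketch that uses only the standard library. The toy_event_loop class is a made-up stand-in for a real event loop such as the one provided by Boost.Asio, and session and start_read are hypothetical names, not UMCD classes:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Toy stand-in for an event loop: handlers are queued now and only
// executed later, when run() is called.
class toy_event_loop
{
public:
    void post(std::function<void()> handler) { handlers_.push(std::move(handler)); }

    void run()
    {
        while (!handlers_.empty())
        {
            std::function<void()> handler = std::move(handlers_.front());
            handlers_.pop();
            handler();
        }
    }

private:
    std::queue<std::function<void()> > handlers_;
};

// Hypothetical per-connection state that must outlive the function that
// starts the asynchronous operation.
struct session
{
    std::string received;
};

// Starts a fake asynchronous read. The session lives on the heap and the
// handler keeps a shared_ptr to it, so it is still alive when the handler
// finally runs, after start_read() has returned.
inline std::shared_ptr<session> start_read(toy_event_loop& loop)
{
    std::shared_ptr<session> s = std::make_shared<session>();
    loop.post([s]() { s->received = "request"; });
    return s;
}

inline void lifetime_example()
{
    toy_event_loop loop;
    std::shared_ptr<session> s = start_read(loop);
    assert(s->received.empty());      // the handler has not run yet
    loop.run();
    assert(s->received == "request"); // it ran after start_read() returned
}
```

Had the session been a local variable of start_read captured by reference, the handler would have touched a destroyed object.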
There are just two basic diagrams to help you understand the architecture. The first is a flow diagram:
 
And the second is a class diagram:
http://www.hyc.io/wesnoth/doc/class-diagram.png
 
This class diagram is not a view of the *whole* system, only the subset of classes that best represents the architecture. All the classes related to event-driven programming are omitted.
Event-driven programming with Boost.Asio
There is a series of articles on this paradigm, and they are directly related to the classes you will find inside this project.
Part 1: Event-driven programming in C++
Part 2: Event-driven Boost.Asio server
Part 3: Event-driven Boost.Asio client
Part 4: Event-driven Boost.Asio transfer
TODO: Need to archive these articles on the Wesnoth wiki.
Tasks list
Building and installation script
As you noticed, the installation is not as simple as it could be. We would like a single simple command, such as make install, to launch the whole process. A good build script would be really nice; it could be split into several steps:
- Install dependencies (such as unixODBC, Boost, ...).
- Configure and install the database.
- Configure and install the ODBC driver.
- Create and populate the database tables.
- Build the code of the UMCD.
- Build the code of the UMCD tests.
- Test the code.
- Use this script with Travis to automate building and testing on each commit.
All these little steps should be implemented in separate script files if possible, and one command/script should tie all of them together.
There is room for improvement:
- Support multiple platforms (especially Windows).
- Support multiple database configurations (you'll need to modify the OTL header too); this can be a bit tricky to automate.
- Whatever you find useful.
Always try to make these scripts generic enough to be re-usable with other servers and/or applications.
Testing
Testing is not optional and can always be improved: if you spot a bug, add a test. Some improvements can be made to the current "test framework".
- (easy) We currently don't check for specific errors: a test passes even if the server returns an unexpected error, because we only check that the response validates against the error schema. Don't add anything to the protocol to enable this check; use the error catalog and compare the resulting message string.
- (easy) The tests are always run against the same database, which is not destroyed at the end of the tests. This should be automated in a script file (create db, launch server, launch tests, destroy db).
- (medium) We add each test explicitly, but a test is nearly always a file paired with a schema (a request and a reply). We should be able to automate everything. Maybe you can describe a WML test format such as:
 test_name="upload UMC update with bad ID"
 [request]
   filename="data/umcd/tests/request_umc_upload/request_umc_upload_bad_id.cfg"
 [/request]
 [reply]
   filename="data/umcd/protocol_schema/error_reply.cfg"
   [error]
     value="bad_umc_id"
   [/error]
 [/reply]
The requests and replies should be read in the order in which we need them to appear.
Feel free to modify this format, and do not forget to test the tests (use a schema file to ensure your test file is correct).
- (easy) Add tests. We need a lot of tests, for each functionality and for each error.
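The specific-error check described in the first item could look roughly like the sketch below. All the names (test_case, error_catalog, reply_has_expected_error) and the catalog message are hypothetical; the real error catalog in UMCD may have a different interface:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory form of one WML test description.
struct test_case
{
    std::string name;
    std::string request_file;
    std::string reply_schema_file;
    std::string expected_error; // key into the error catalog, e.g. "bad_umc_id"
};

// Hypothetical error catalog: error key -> message sent by the server.
typedef std::map<std::string, std::string> error_catalog;

// Returns true only if the server answered with the message that the catalog
// associates with the expected error key. An unknown key never matches.
inline bool reply_has_expected_error(const test_case& test,
                                     const error_catalog& catalog,
                                     const std::string& actual_message)
{
    error_catalog::const_iterator it = catalog.find(test.expected_error);
    return it != catalog.end() && it->second == actual_message;
}

inline void test_case_example()
{
    error_catalog catalog;
    catalog["bad_umc_id"] = "The UMC identifier is unknown."; // made-up message
    test_case t = { "upload UMC update with bad ID",
                    "data/umcd/tests/request_umc_upload/request_umc_upload_bad_id.cfg",
                    "data/umcd/protocol_schema/error_reply.cfg",
                    "bad_umc_id" };
    assert(reply_has_expected_error(t, catalog, "The UMC identifier is unknown."));
    assert(!reply_has_expected_error(t, catalog, "Some other error."));
}
```

Comparing against the catalog keeps the protocol unchanged, as requested above: only the test side needs to know which message belongs to which error.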
Documentation
Documentation is not only a clerical task; some of these tasks involve code.
- (medium-hard) Generate a visual representation of the database from its schema (or find a tool to do so). You can use the sql2cpp tool and add an sql2xml part if the schema-image generator expects XML. I am thinking of sqldesigner, which outputs great, colorful schemas.
- (medium) Transform the WML schema into a prettier and displayable form.
- (easy) Incorporate the prettier WML schema directly inside the documentation (latex file).
- (easy) Add the documentation build to the build chain. The documentation should be rebuilt if the WML schemas are modified.
- (easy) Do something similar for the database schema image.
WML schema
We need WML schemas that validate as much of a request as possible.
- (easy) Write WML schema for all the requests and replies.
- (easy-medium) Add language type (this should be an enumeration), possibly auto-generated from the data/languages/ folder.
- (medium-hard) Add regexes to check the length of a field, possibly auto-generated (the database needs this too; we must have only one representation of the data).
- (medium) Generate the populate schema file from the language and UMC type enumerations (also used in the WML schema file).
Database
You can access the database with the OTL library, but it is not really well designed for generic programming. We would like to:
- (easy) Treat every table as a tuple; you can use BOOST_FUSION_ADAPT_STRUCT to adapt the generated PODs.
- (medium-hard) Implement database algorithms such as select, insert and delete that operate on these tuples, so we won't need to re-implement them for each class. The difficulty is making these algorithms generic enough to accept where, order by, ... and other SQL clauses. You can look at what SOCI offers; it could simplify your life. The hardest part is to convince mordante.
There is a lot of design work here, and it should be really interesting.
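To give an idea of what "write the algorithm once" means, here is a small sketch that deliberately avoids Boost.Fusion and uses a plain traits struct instead. The addon_pod struct, the table_traits template and make_insert_statement are all hypothetical, and the "?" parameter marker is an assumption (OTL uses its own placeholder syntax):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical POD, similar in spirit to what sql2cpp could generate.
struct addon_pod
{
    int id;
    std::string name;
};

// Table metadata, specialized once per POD. This plays the role that
// BOOST_FUSION_ADAPT_STRUCT would play in a Fusion-based design.
template <class Pod>
struct table_traits;

template <>
struct table_traits<addon_pod>
{
    static std::string table_name() { return "addon"; }
    static std::vector<std::string> columns()
    {
        std::vector<std::string> c;
        c.push_back("id");
        c.push_back("name");
        return c;
    }
};

// Builds "INSERT INTO <table> (<c1>, <c2>) VALUES (?, ?)" for any POD that
// has a table_traits specialization, so the statement is written only once.
template <class Pod>
std::string make_insert_statement()
{
    const std::vector<std::string> cols = table_traits<Pod>::columns();
    std::string names, marks;
    for (std::size_t i = 0; i < cols.size(); ++i)
    {
        if (i != 0) { names += ", "; marks += ", "; }
        names += cols[i];
        marks += "?";
    }
    return "INSERT INTO " + table_traits<Pod>::table_name()
         + " (" + names + ") VALUES (" + marks + ")";
}

inline void insert_statement_example()
{
    assert(make_insert_statement<addon_pod>()
           == "INSERT INTO addon (id, name) VALUES (?, ?)");
}
```

A Fusion-based version could go further and also bind the field values generically, which a list of column names alone cannot do.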
Functionalities
This is the most important part of the project: the functionalities, what the server can actually do.
- (easy to hard): Almost none of the requests are actually implemented (only the license request and a bit of the upload request), but this doesn't mean that nothing is done. The design is in a good state and a lot of tools are available for re-use. In the section Event-driven programming, you can find a series of articles explaining the facilities to send and receive data. It's easy if you stick to the simple side of a request (and you should do that first), but it can become harder if you implement new generic facilities (such as the ones discussed in the Database section).
- (easy): Create an error for each database error that can occur, with an explicit message. Add a test for each database error that can be triggered.
- (probably hard): Add an allocator system to well-chosen classes; a first goal could be to allocate a chunk of memory for each client that connects. The size of the minimal memory block needed can be calculated, but that wouldn't be generic. There are some resources, such as chapter 4 (Small-Object Allocation) of the book Modern C++ Design.
- (medium): It could be useful to administer the server with specific administration packets; what is nice is that every data input comes from the same entry point (the network here). You can also bind signals to specific events; it's up to you. The goal is to control the server from the server prompt or even from outside.
- (medium): Secure the transmission, particularly when passwords are transferred. This is one of the reasons we put the pbl file inside the protocol itself and not in the files package. Try to use secure transmission only for headers and to retrieve/send binary data in "normal mode".
Environment
The environment is really important and actually quite hard to implement properly. Typically, the environment can be accessed from a lot of classes, which makes one think of a global variable. Of course you can put it in a singleton, but it is still global and that does not solve the problem. But what is the problem?
The problem with a global environment is that your classes inevitably become tightly coupled to it and can no longer be re-used. For example, I first used the environment in the header_mutable_buffer class to retrieve the maximum size of a header. That made this class impossible (or at least harder) to re-use in the client code.
The first approach (and the one currently implemented) is to group the environment variables by category and aggregate each category in a common structure. For example, you have the database variables, the protocol variables, the server variables, ... Each category is in fact a mono-state class. A mono-state is a class with static fields that you access as if they were non-static. So it feels like you are not manipulating a global variable, but it still is one. We need to reduce the number of classes that instantiate these mono-state categories.
One way to do that is shown with the mono-state protocol_info and the class header_mutable_buffer. We use an event to bind the static method set_header_max_size to the value stored in protocol_info. Actually, this is just a temporary hack to reduce coupling, because I needed to use this class in the client (without loading an environment). By the way, the environment is loaded by the environment_loader class, and a friendship system allows it to access the setters (but not the private variables). So only this class can modify the state of the mono-states.
Finally, what would be great is to design a system such that none of our classes directly uses the environment classes. They should be automatically updated whenever the environment changes, probably with the event-driven programming introduced earlier.
- (easy) Add an event system to each class for each field and bind the events to the environment (as with the set_header_max_size() method). As a result, none of the classes should use the environment directly anymore.
- (medium - hard) Design a generic environment system that allows any user to add new categories without repeating all the code (as suggested in the first solution). A possible way could be to use tuples and Boost.Fusion with some smart macros that generate the necessary code. Also have a look at BOOST_FUSION_ADAPT_STRUCT.
- (hard) Allow binding a method/function to an arbitrary number of category fields, for example:
struct protocol_info{ int a; std::string b; float c; };
void function_callback(int, std::string);
// boilerplate code...
protocol_info info;
info.add_event<fields<1, 2> >(function_callback);
Or adapt a subset of the fields to another struct, such as:
struct my_protocol_info{ float a; std::string b; };
void function_callback(my_protocol_info);
// boilerplate code...
protocol_info info;
info.add_event<as<my_protocol_info, 3, 2> >(function_callback);
Of course, all of this is added "on-the-fly". Feel free to modify the interfaces, reduce the problem or extend it with your own ideas.
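The mono-state and event binding discussed in this section can be sketched as follows. The class and method names protocol_info, header_mutable_buffer and set_header_max_size come from the text above, but the bodies and the on_header_max_size_change method are invented for illustration and do not reproduce the real UMCD code:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Mono-state: the data is static, but it is accessed through instances.
class protocol_info
{
public:
    typedef std::function<void(std::size_t)> slot_type;

    std::size_t header_max_size() const { return header_max_size_; }

    // Hypothetical: registers a callback invoked whenever the value changes.
    void on_header_max_size_change(slot_type slot) { slots().push_back(slot); }

    // In UMCD only the environment_loader may call the setters (friendship);
    // the setter is public here to keep the sketch short.
    void set_header_max_size(std::size_t size)
    {
        header_max_size_ = size;
        for (std::size_t i = 0; i < slots().size(); ++i) slots()[i](size);
    }

private:
    static std::vector<slot_type>& slots()
    {
        static std::vector<slot_type> s;
        return s;
    }
    static std::size_t header_max_size_;
};

std::size_t protocol_info::header_max_size_ = 0;

// The buffer class knows nothing about protocol_info: it only exposes a
// static setter that can be bound to any event source.
class header_mutable_buffer
{
public:
    static void set_header_max_size(std::size_t size) { max_size_ = size; }
    static std::size_t header_max_size() { return max_size_; }

private:
    static std::size_t max_size_;
};

std::size_t header_mutable_buffer::max_size_ = 0;

inline void monostate_example()
{
    protocol_info info;
    info.on_header_max_size_change(&header_mutable_buffer::set_header_max_size);

    protocol_info other;              // another instance, same shared state
    other.set_header_max_size(1024);

    assert(info.header_max_size() == 1024);
    assert(header_mutable_buffer::header_max_size() == 1024);
}
```

Only the code that performs the binding depends on both classes, so header_mutable_buffer stays usable in a client that never loads an environment.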
External features
By external, I mean features that are not UMCD code but Wesnoth code that the UMCD project uses.
- (medium) Find out why the requested fields in the schema are ignored. Is it a bug or a misuse?
- (medium to hard) Fix the memory errors reported by Valgrind when we validate a config file with the validate method of the schema_validator class. There are a lot of them.
- (medium) The schema_validator returns message-based exceptions; it could be useful to have structures and enumerations describing the errors and to let the user of the class format them.
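For the last item, a structured error could look like the following sketch. The error kinds, the validation_error struct and format_error are all hypothetical; the real schema_validator may need to distinguish other cases:

```cpp
#include <cassert>
#include <string>

// Hypothetical error kinds.
enum validation_error_kind
{
    missing_key,
    extra_key,
    wrong_value
};

// Structured description of one validation problem.
struct validation_error
{
    validation_error_kind kind;
    std::string tag;
    std::string key;
};

// One possible formatter; users of the class are free to provide their own.
inline std::string format_error(const validation_error& e)
{
    std::string what;
    switch (e.kind)
    {
    case missing_key: what = "missing key"; break;
    case extra_key:   what = "unexpected key"; break;
    case wrong_value: what = "invalid value for key"; break;
    }
    return what + " '" + e.key + "' in tag [" + e.tag + "]";
}

inline void format_error_example()
{
    validation_error e = { missing_key, "request", "id" };
    assert(format_error(e) == "missing key 'id' in tag [request]");
}
```

With such a structure, a caller can branch on the error kind instead of parsing a message string.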
C++ code generator from SQL schema
- Where? {wesnoth-source-tree}/src/tools/code_generator/sql2cpp/
- Resources?
The classes that contain the database tables are automatically generated from the SQL schema. So we have only one representation of our database, and the code stays synchronized with it automatically.
This is a side project, not tied to Wesnoth code, and I wish to keep this independence. There is plenty of room for improvement:
- (easy-medium) The code doesn't support the whole SQL grammar. Of course it shouldn't, but a lot of keywords (not related to the generation) are missing, such as the index keywords. Improve the grammar!
- (medium) The SQL parser is specific to the MySQL dialect. We should have a base parser for the standard SQL dialect and inherit from it to handle other dialects such as MySQL, Oracle, ... For example, the keyword AUTO_INCREMENT isn't standard.
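The base/derived split suggested in the last item can be illustrated without Boost.Spirit. The sketch below only models keyword sets, not a grammar; sql_dialect, mysql_dialect and the tiny keyword list are hypothetical:

```cpp
#include <cassert>
#include <set>
#include <string>

// Base class: keywords shared by every dialect (tiny illustrative subset).
class sql_dialect
{
public:
    sql_dialect()
    {
        keywords_.insert("CREATE");
        keywords_.insert("TABLE");
        keywords_.insert("PRIMARY");
        keywords_.insert("KEY");
    }
    virtual ~sql_dialect() {}

    bool is_keyword(const std::string& word) const
    {
        return keywords_.count(word) != 0;
    }

protected:
    void add_keyword(const std::string& word) { keywords_.insert(word); }

private:
    std::set<std::string> keywords_;
};

// MySQL adds its own, non-standard keywords on top of the base set.
class mysql_dialect : public sql_dialect
{
public:
    mysql_dialect() { add_keyword("AUTO_INCREMENT"); }
};

inline void dialect_example()
{
    sql_dialect standard;
    mysql_dialect mysql;
    assert(!standard.is_keyword("AUTO_INCREMENT"));
    assert(mysql.is_keyword("AUTO_INCREMENT"));
    assert(mysql.is_keyword("TABLE")); // inherited from the base dialect
}
```

In the actual tool the same idea would apply to grammar rules rather than to a set of strings.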
Improve the code generation:
- (medium) Add CRUD operations inside the POD classes.
- (hard) Maybe use Boost.Fusion to make SQL statements generic (see generic features section).
- (medium) Add an option for the underlying database access library, here it's OTL, but it could be others such as SOCI.
Some simple improvements:
- (easy) Add a namespace option so that we can generate the code in a custom namespace; try to allow nested namespaces (such as umcd::pod).
- (easy) Add a message when parsing fails. Currently we need to enable the debug macro, which is overkill for simple SQL schema errors.
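The namespace option could be implemented along these lines. This is only a sketch; split_namespace and wrap_in_namespace are hypothetical helpers, not functions of sql2cpp:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits "umcd::pod" into {"umcd", "pod"}; empty components are skipped.
inline std::vector<std::string> split_namespace(const std::string& qualified)
{
    std::vector<std::string> parts;
    std::size_t begin = 0;
    while (begin <= qualified.size())
    {
        std::size_t end = qualified.find("::", begin);
        if (end == std::string::npos) end = qualified.size();
        if (end > begin) parts.push_back(qualified.substr(begin, end - begin));
        begin = end + 2;
    }
    return parts;
}

// Wraps generated code in the requested (possibly nested) namespace.
inline std::string wrap_in_namespace(const std::string& qualified,
                                     const std::string& body)
{
    const std::vector<std::string> parts = split_namespace(qualified);
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i)
        out += "namespace " + parts[i] + " {\n";
    out += body;
    for (std::size_t i = parts.size(); i > 0; --i)
        out += "} // namespace " + parts[i - 1] + "\n";
    return out;
}

inline void namespace_example()
{
    assert(wrap_in_namespace("umcd::pod", "struct addon {};\n")
           == "namespace umcd {\nnamespace pod {\nstruct addon {};\n"
              "} // namespace pod\n} // namespace umcd\n");
}
```

An empty namespace string leaves the generated code untouched, which keeps the current behaviour as the default.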
FAQ
Q: Why not use the Boost.Asio streams?
Although the streams are easy to use and quite powerful, they have a major drawback: they are synchronous. Making them asynchronous is possible with awful hacks, but then the beauty of the streams disappears.