Backend Rewrite

Aim

Rewrite the backend system of wzdftpd so that the following can be achieved:

  • Generic storage layer for interfacing between SQL/plaintext/etc backends and a stable data storage API
  • Plugins can store data on files/users/groups without having to do all the hard work themselves
  • More choice and customization in where data is stored (example: .message files in an SQL backend?)
  • Ability to share configuration between two or more wzdftpd servers
  • Increase code reuse, decrease duplication of code (benefits: simpler/more secure/less bug prone)
  • Full UTF-8 support (see #82)

Existing ideas/pointers on this topic

  • libconfig - a library to store configuration and user/group structs into plaintext files
  • YAML - human-readable data serialization format
  • XML - general-purpose markup language
  • libprf - Preferences Registry Format configuration file access library
  • libelektra - universal hierarchical configuration store
  • GConf - system for storing application preferences

Architecture

The new backend system is designed so that all caching/queuing/syncing is managed automatically by the backend itself, not at a higher level in the code. Performance is extremely important; reducing the number of times the backend system needs to read from or write to the disk/database is essential. Meanwhile, multiple instances of the wzdftpd server sharing the same backend data need to work side by side without race conditions or data corruption. The aim is to make the backend system do all the hard work, and to expose a very simple and powerful API for the wzdftpd core and modules to work with.

Layers

The backend system can be thought of in a few sets of simple layers. In order from lowest layer to the highest layer (using MySQL as an example):

  • MySQL connector
  • Generic SQL layer
  • Backend system API

This should help minimize the need to rewrite SQL-related code for each and every SQL database backend (MySQL, PostgreSQL, SQLite, MSSQL, etc). The lowest connector layer in this example would simply act as a wrapper between generic functions such as connect_to_sql_database() and the corresponding functions in the database-specific libraries. There shouldn't be any need to have four copies (one per database) of a function that creates the required SQL queries to read database values.
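A minimal sketch of this layering in C. Every name here (sql_connector, generic_sql_read, fake_connector) is hypothetical and not part of wzdftpd. The fake connector only records the query text; a real connector would wrap the calls of a database client library, and real query building would use parameter binding or escaping rather than string formatting.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical connector interface: each database supplies one of these. */
typedef struct sql_connector {
    const char *name;
    int  (*connect)(void *state, const char *dsn);
    int  (*query)(void *state, const char *sql, char *out, size_t out_len);
    void (*disconnect)(void *state);
} sql_connector;

/* Generic SQL layer: the query is built once, for every connector.
 * NOTE: illustration only; real code must bind or escape table/key. */
int generic_sql_read(const sql_connector *c, void *state,
                     const char *table, const char *key,
                     char *out, size_t out_len)
{
    char sql[256];
    int n = snprintf(sql, sizeof sql,
                     "SELECT value FROM %s WHERE name = '%s'", table, key);
    if (n < 0 || (size_t)n >= sizeof sql)
        return -1;                      /* query would not fit */
    return c->query(state, sql, out, out_len);
}

/* Stand-in connector: remembers the last query instead of using a database. */
typedef struct fake_state {
    int  connected;
    char last_sql[256];
} fake_state;

static int fake_connect(void *s, const char *dsn)
{
    (void)dsn;
    ((fake_state *)s)->connected = 1;
    return 0;
}

static int fake_query(void *s, const char *sql, char *out, size_t out_len)
{
    fake_state *fs = s;
    if (!fs->connected)
        return -1;
    snprintf(fs->last_sql, sizeof fs->last_sql, "%s", sql);
    snprintf(out, out_len, "%s", "dummy");   /* placeholder result */
    return 0;
}

static void fake_disconnect(void *s)
{
    ((fake_state *)s)->connected = 0;
}

const sql_connector fake_connector = {
    "fake", fake_connect, fake_query, fake_disconnect
};
```

With this shape, adding another database would mean writing only a new sql_connector table; generic_sql_read() stays untouched.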

Caching

Data needs to be cached in memory for a certain amount of time before being reread from the backend. This will help increase performance and reduce queries to the backend system. The cache expiry can be set individually for each backend loaded. As an example, this could allow for wzdftpd to only write data to the disk (plaintext backend) every ~5 minutes instead of every time the credits counter is updated.
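A rough sketch of a per-backend cache expiry check in C. The cache_entry type, the function names and the fixed 64-byte value buffer are assumptions made for illustration only.

```c
#include <string.h>
#include <time.h>

/* Hypothetical cached value with the time it was loaded from the backend. */
typedef struct cache_entry {
    char   value[64];
    time_t loaded_at;
    int    valid;       /* 0 until the first load */
} cache_entry;

/* Returns 1 if the entry may still be served from memory. */
int cache_is_fresh(const cache_entry *e, time_t now, double ttl_seconds)
{
    return e->valid && difftime(now, e->loaded_at) < ttl_seconds;
}

/* Records a value that was just read from (or written to) the backend. */
void cache_store(cache_entry *e, const char *value, time_t now)
{
    strncpy(e->value, value, sizeof e->value - 1);
    e->value[sizeof e->value - 1] = '\0';
    e->loaded_at = now;
    e->valid = 1;
}
```

Passing ttl_seconds as a parameter lets each loaded backend use its own expiry, for example 300 seconds for a plaintext backend.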

This gets a lot more complicated when multiple servers are in use at one time and need to be in sync. If a user is deleted from the first server, the change needs to be synced with the backend and loaded by the second server. Until that happens, the deleted user can still log in to the second server. It may therefore be best to introduce a "recent changes queue" for each backend, which stores a list of all the recent changes. Instead of querying the backend and blindly updating all user configuration every 5 minutes, the server would only need to query the "recent changes queue", which is far more efficient. The server then parses the queue and loads only the relevant changes to user configuration from the backend.
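One possible shape for such a queue, sketched in C as a fixed-size ring buffer with sequence numbers. The types, names and the capacity of 8 are all hypothetical. Each server remembers the last sequence number it has processed and asks only for newer entries; if it has fallen too far behind, it is told to do a full reload.

```c
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 8

typedef struct change {
    unsigned long seq;   /* monotonically increasing change number */
    char key[64];        /* what changed, e.g. "/users/john" */
} change;

typedef struct change_queue {
    change items[QUEUE_CAP];   /* ring buffer holding the newest changes */
    unsigned long next_seq;    /* number the next change will receive */
} change_queue;

void queue_init(change_queue *q)
{
    memset(q, 0, sizeof *q);
    q->next_seq = 1;
}

/* Appends a change, overwriting the oldest entry once the ring is full. */
void queue_record(change_queue *q, const char *key)
{
    change *c = &q->items[q->next_seq % QUEUE_CAP];
    c->seq = q->next_seq++;
    strncpy(c->key, key, sizeof c->key - 1);
    c->key[sizeof c->key - 1] = '\0';
}

/* Copies changes newer than last_seen into out, oldest first.
 * Returns the number copied, or -1 if needed entries were already
 * overwritten and the caller must reload everything from the backend. */
int queue_poll(const change_queue *q, unsigned long last_seen,
               change *out, size_t out_cap)
{
    unsigned long newest = q->next_seq - 1;
    size_t n = 0;

    if (newest - last_seen > QUEUE_CAP)
        return -1;
    for (unsigned long s = last_seen + 1; s <= newest && n < out_cap; s++)
        out[n++] = q->items[s % QUEUE_CAP];
    return (int)n;
}
```

After a successful poll the server would set its last_seen to the seq of the final entry returned and reload only the keys listed.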

In particular, there are issues with counters that change often (they are hard to sync between servers). To get around this, each server can keep a relative counter value (the net change to the counter during the current cache expiry period). When the cache timer expires, this net change is applied to the value stored in the backend in a single atomic operation, and the relative value is reset to zero.
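The relative counter idea can be sketched in C as follows. The delta_counter type and its functions are hypothetical, and the shared backend value is simulated by a plain variable; in an SQL backend the flush step would correspond to a single statement of the form UPDATE ... SET credits = credits + ?.

```c
/* Hypothetical per-server view of one shared counter. */
typedef struct delta_counter {
    long long cached_total;  /* value last read from the backend */
    long long pending;       /* local change since the last flush */
} delta_counter;

/* Records a local change without touching the backend. */
void counter_add(delta_counter *c, long long amount)
{
    c->pending += amount;
}

/* The value this server currently believes the counter has. */
long long counter_value(const delta_counter *c)
{
    return c->cached_total + c->pending;
}

/* Applies the pending change to the backend total and refreshes the cache.
 * *backend_total stands in for the stored value; a real backend must
 * perform this read-modify-write atomically. */
long long counter_flush(delta_counter *c, long long *backend_total)
{
    *backend_total += c->pending;
    c->pending = 0;
    c->cached_total = *backend_total;
    return c->cached_total;
}
```

Because each server only ever sends its own delta, two servers flushing in either order arrive at the same backend total.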

Data structures

Data stored on a user/group (or other object) should be highly flexible and adjustable at runtime (a module may want to create a new field of data on a user). The size and type of data stored on a user shouldn't be hardcoded. The use of associative arrays may be of assistance here. It should also be possible to nest data structures many levels deep. Algorithms need to be studied (hashtable vs self-balancing binary search tree, etc) to determine how feasible this requirement is. Performance and the Big-O time of chosen algorithms are of critical importance.

An example data structure may take the following form:

/users/john/fullname
/users/john/password
/users/john/publickeys/1
/users/john/publickeys/2
/users/john/publickeys/3
/users/john/stats/global/2007/january/filesuploaded
/users/john/stats/global/2007/january/filesdownloaded
/users/john/stats/global/2007/march/bytesuploaded
/users/john/stats/section1/alltime/fxpoutbytes
/users/john/stats/section1/alltime/fxpinbytes
/users/john/credits/global
/users/john/credits/section1
/users/john/credits/section2
/users/john/ratio/global

Note that the statistics in this example are deliberately detailed to show how flexible the data structure could be.

This example raises the question of whether hierarchical data structures are the best way to represent the data. Why store statistics by date and then by type rather than the other way around? Compare 2007/january/filesuploaded with filesuploaded/2007/january. If we want to work out how many files were uploaded and downloaded in January 2007, the 2007/january/filesuploaded format is most appropriate, because everything we need sits under one subtree. If we want to determine how many files have been uploaded over all time, the filesuploaded/2007/january format is much more appropriate, for the same reason.
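To make the trade-off concrete, here is a small C sketch over a flat list of keys. The kv type and both functions are hypothetical, and any values used with them are made up. With date-first keys, a per-month total is a simple prefix match, while an all-time total for one statistic has to inspect the last path component of every key under the user.

```c
#include <string.h>

/* Hypothetical flat key/value pair; keys are full paths. */
typedef struct kv {
    const char *key;
    long value;
} kv;

/* Sums every value whose key starts with prefix (one subtree). */
long sum_by_prefix(const kv *items, size_t n, const char *prefix)
{
    size_t plen = strlen(prefix);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        if (strncmp(items[i].key, prefix, plen) == 0)
            total += items[i].value;
    return total;
}

/* Sums values under prefix whose last path component equals leaf.
 * This is the "wrong way round" query: it cannot stop at one subtree. */
long sum_by_leaf(const kv *items, size_t n,
                 const char *prefix, const char *leaf)
{
    size_t plen = strlen(prefix);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        const char *slash = strrchr(items[i].key, '/');
        if (slash != NULL &&
            strncmp(items[i].key, prefix, plen) == 0 &&
            strcmp(slash + 1, leaf) == 0)
            total += items[i].value;
    }
    return total;
}
```

Both queries are linear scans in this flat sketch; in a real tree the prefix query would touch a single subtree, which is where the layout choice starts to matter.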

Example API

This is only a rough sketch of a possible API for the new backend system. Note that the API is very generic in nature, allowing for handling of everything from user/group structs through to server configuration and .message files.

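int backend_get_data(backend * source, char * key, char ** result)

Retrieves the value for key from the specified backend into a newly allocated string, stores a pointer to that string in *result, and returns 0 on success or -1 on failure. (result has to be a char ** for the function to hand a newly allocated string back to the caller.)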

int backend_set_data(backend * source, char * key, char * value)

Stores a value to the specified backend and returns 0 on success or -1 on failure.
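As an illustration only, the two calls could be backed by a trivial in-memory store like the C sketch below. The backend struct layout, MAX_ENTRIES and dup_string() are hypothetical. The sketch also assumes that result is passed as a char ** (so that the newly allocated string can reach the caller), that key and value are const, and that the caller frees the returned string.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Hypothetical in-memory backend; a real one would sit on SQL or files. */
typedef struct backend {
    char  *keys[MAX_ENTRIES];
    char  *values[MAX_ENTRIES];
    size_t count;
} backend;

/* Returns a heap copy of s, or NULL if allocation fails. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Stores value under key, replacing any existing value. */
int backend_set_data(backend *source, const char *key, const char *value)
{
    char *k = NULL;
    char *v = dup_string(value);
    if (v == NULL)
        return -1;
    for (size_t i = 0; i < source->count; i++) {
        if (strcmp(source->keys[i], key) == 0) {
            free(source->values[i]);
            source->values[i] = v;
            return 0;
        }
    }
    if (source->count == MAX_ENTRIES || (k = dup_string(key)) == NULL) {
        free(v);
        return -1;
    }
    source->keys[source->count] = k;
    source->values[source->count] = v;
    source->count++;
    return 0;
}

/* Looks up key; on success *result is a new string the caller must free. */
int backend_get_data(backend *source, const char *key, char **result)
{
    for (size_t i = 0; i < source->count; i++) {
        if (strcmp(source->keys[i], key) == 0) {
            *result = dup_string(source->values[i]);
            return *result != NULL ? 0 : -1;
        }
    }
    return -1;
}
```

A caller would use it along the lines of backend_set_data(&b, "/users/john/fullname", "John") followed by backend_get_data(&b, "/users/john/fullname", &name), then free(name) once finished.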