Commit be0c377d authored by Marcin

documentation update

- one common installation instruction for the whole HAF
- sql_serializer parameter description
- cmake target types
- sql_serializer: dumping blocks to the database
Contains the implementation of the Hive Application Framework (HAF), which encompasses a hive node plugin and Postgres-specific tools providing
the functionalities required by other projects that store blockchain data in a Postgres database.
HAF sits between the HIVE network and the applications:
![alt text](./doc/c2_haf.png)
HAF consists of a few components, visible in the picture above:
* **HIVED - hive node**
A regular HIVE node which syncs blocks with the HIVE network or replays them from a block.log file.
* **SQL_SERIALIZER**
A hived plugin which, while syncing a new block, pushes its data to the SQL database. Moreover, the plugin informs the database about the occurrence of micro-forks and about blocks changing status from reversible to irreversible.
Detailed documentation for sql_serializer is here: [src/sql_serializer/README.md](./src/sql_serializer/README.md)
* **PostgreSQL database**
The database contains the blockchain block data in the form of filled SQL tables, as well as the applications' tables. The system utilizes Postgres authentication and authorization mechanisms.
* **HIVE FORK MANAGER**
A PostgreSQL extension which provides HAF's API - a set of SQL functions used by applications to get block data. The extension controls the process by which applications consume blocks and ensures that applications cannot corrupt each other. HIVE FORK MANAGER is responsible for rewinding the applications' table changes when a micro-fork occurs. The extension defines the format of the block data saved in the database; the SQL_SERIALIZER dumps blocks to the tables defined by HIVE FORK MANAGER.
Detailed documentation for hive_fork_manager is here: [src/hive_fork_manager/Readme.md](./src/hive_fork_manager/Readme.md)
# Requirements
## Environment
1. Tested on Ubuntu 20.04
2. postgresql server dev package: `sudo apt-get install postgresql-server-dev-12`
3. ssl dev package: `sudo apt-get install libssl-dev`
4. readline dev package: `sudo apt-get install libreadline-dev`
5. pqxx dev package: `sudo apt-get install libpqxx-dev`
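All of the development packages can also be installed in one step, assuming the package names listed above:
```
sudo apt-get install postgresql-server-dev-12 libssl-dev libreadline-dev libpqxx-dev
```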
## PostgreSQL cluster
The project is intended to run on Postgres version 12 or higher. It is possible to use it
with older versions, but without any guarantees.
# Build
CMake and make are used to build the project. The procedure presented below builds all the targets from the HAF repository and the `hived` program from the `hive` submodule. You can pass
the same CMake parameters which are used to compile the hived project (for example: -DCLEAR_VOTES=ON -DBUILD_HIVE_TESTNET=OFF -DHIVE_LINT=OFF).
1. `git submodule update --init --recursive`
2. create a build directory, for example in the sources root: `mkdir build`
3. `cd build`
4. `cmake -DCMAKE_BUILD_TYPE=Release ..`
5. `make`
### Choose the version of Postgres to compile with
The CMake variable `POSTGRES_INSTALLATION_DIR` is used to point to the folder where the chosen version of PostgreSQL
is installed on Ubuntu. An example of choosing a different version of Postgres:
3. `cmake -DPOSTGRES_INSTALLATION_DIR=/usr/lib/postgresql/10/bin ..`
4. `make`
# Installation
## 1. Configure PostgreSQL cluster
Compiled PostgreSQL plugins and extensions have to be installed in the cluster. The best method
to do this is to execute, in the build directory (may require root privileges):
- `make install`
This copies the plugins to the Postgres cluster's `$libdir/plugins` directory and the extensions to
`<postgres_shared_dir>/extension`.
You can check the `$libdir` with the command `pg_config --pkglibdir`, and the shared dir with `pg_config --sharedir`
### Authorization
It is required to configure two base roles:
```
CREATE ROLE hived_group WITH NOLOGIN;
CREATE ROLE hive_applications_group WITH NOLOGIN;
```
HAF will grant them access to its internal elements in a way which guarantees security for the application data
and the applications' execution flows.
The maintainer of the PostgreSQL cluster server needs to create roles (users) which inherit from one of these groups, for example:
```
CREATE ROLE hived LOGIN PASSWORD 'hivedpass' INHERIT IN ROLE hived_group;
CREATE ROLE application LOGIN PASSWORD 'applicationpass' INHERIT IN ROLE hive_applications_group;
```
The roles which inherit from `hived_group` must be used by the `sql_serializer` process to log into the database,
while roles which inherit from `hive_applications_group` shall be used by the applications.
An application role does not have access to the internal data created by other application roles and cannot
modify data modified by 'hived'. 'Hived' roles cannot modify the applications' data.
More about roles in the PostgreSQL documentation: [CREATE ROLE](https://www.postgresql.org/docs/10/sql-createrole.html)
## 2. Preparing a PostgreSQL database
The newly created database has to have the hive_fork_manager extension created in it. Without it, 'sql_serializer'
won't connect the hived node with the database. To start using the extension in a database, execute the psql
command: `CREATE EXTENSION hive_fork_manager CASCADE;`. The CASCADE phrase is needed to automatically install the extensions hive_fork_manager depends on.
Note: whenever you build a new version of the hive_fork_manager extension, you have to create a new database.
There is currently no way to upgrade the schema installed in your old HAF database.
The database should use the parameters
ENCODING = 'UTF8', LC_COLLATE = 'en_US.UTF-8' and LC_CTYPE = 'en_US.UTF-8' (this is the default for the American English locale;
it has not been tested on other locale configurations).
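As a sketch, a new HAF database could be created and prepared like this (the database name `haf_block_log` is only an example, and `TEMPLATE template0` is assumed so that the locale settings can be set explicitly):
```
CREATE DATABASE haf_block_log
    WITH ENCODING 'UTF8'
         LC_COLLATE 'en_US.UTF-8'
         LC_CTYPE 'en_US.UTF-8'
         TEMPLATE template0;
-- connect to the new database and install the extension together with its dependencies
\c haf_block_log
CREATE EXTENSION hive_fork_manager CASCADE;
```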
# Tests
## 1. Integrations ```tests/integrations```
Integration tests are tests that run on a module of the project or on a system of the project's modules.
The tests do not use mock-ups to run the modules/system under test in isolation from their environment; instead
they are **integrated** with the environment, call real OS API functions, and cooperate with real working servers, client applications or databases.
### a) Functional tests ```tests/integrations/functional```
Functional tests concentrate on testing the functions of one module; they test its interface. The tests call
the functions and check the results.
The project uses ctest to start functional tests. Tests are grouped in a tree by names with `.` as a branch separator, where 'test' is the root.
For example, you can start all the functional tests with the command `ctest -R test.functional.*`
### b) Replay tests ```tests/integrations/replay```
The tests validate whether a module or a system under test works correctly during and after replaying the blockchain from a block_log file.
The tests are written in Python, and pytest is used as the test framework.
### c) System tests ```tests/integrations/system```
The tests check interactions between the project's modules.
The tests are written in Python, and pytest is used as the test framework.
## 2. Unit ```tests/unit```
Unit tests are used to test parts of modules in isolation from the environment. This means **all** the functions
called by the unit under test which are not part of the unit are mocked, and their results are fully controlled by the test framework.
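Unit tests are also registered in ctest (see the `ADD_UNIT_TESTS` macro described below), so, for example, all of them can be started from the build directory with:
```
ctest -R test.unit.*
```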
# Directory structure
```
cmake                      Contains common functions used by cmake build
common_includes
doc                        Contains documentation
hive                       Submodule of hive project: https://gitlab.syncad.com/hive/hive
src                        Contains sources
  applications             Contains utilities which help to develop HIVE applications based on HAF
  hive_fork_manager        Contains the SQL extension which implements the solution for hive forks
  sql_serializer           C++ hived plugin which is compiled together with hived
  transaction_controllers  Library with C++ utilities to control Postgres transactions
tests                      Contains tests
  integrations             Folder for non-unit tests like functional or system tests
    functional             Contains functional tests
    replay                 Tests which check replaying HAF from a block_log file
    system                 Tests which check interactions between hived internals, sql_serializer, hive_fork_manager and an application
  unit                     Contains unit tests and mocks
    mockups                Contains mocks
```
There is also a `generated` directory inside the build directory. It contains automatically generated headers which can be included
in the code with ```#include "gen/header_file_name.hpp"```
# Predefined cmake targets
To simplify adding new modules to the project, the build system introduces macros which define a few types of project items.
## 1. Static C++ library
To set up the compiler and linker settings to generate a static library, use the macro
`ADD_STATIC_LIB` with parameter
- target_name - name of the static lib target
The macro adds to compilation all *.cpp files from the directory in which the `CMakeLists.txt` file is placed ( `${CMAKE_CURRENT_SOURCE_DIR}` )
## 2. Run-time loaded C++ library
To set up the compiler and linker settings to generate a dynamically loaded library which will be opened
during program run-time with dlopen, use the macro
`ADD_RUNTIME_LOADED_LIB` with parameter
- target_name - name of the library target
The macro adds to compilation all *.cpp files from the directory in which the `CMakeLists.txt` file is placed ( `${CMAKE_CURRENT_SOURCE_DIR}` )
## 3. Load-time loaded C++ library
To set up the compiler and linker settings to generate a dynamically loaded library which will be loaded
by the loader during program startup, use the macro
`ADD_LOADTIME_LOADED_LIB` with parameter
- target_name - name of the library target
The macro adds to compilation all *.cpp files from the directory in which the `CMakeLists.txt` file is placed ( `${CMAKE_CURRENT_SOURCE_DIR}` )
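All three library macros are used the same way. A minimal, hypothetical `CMakeLists.txt` for a run-time loaded library might look like this (the target name `my_plugin` and its dependency are assumptions):
```
ADD_RUNTIME_LOADED_LIB( my_plugin )

# every *.cpp file from this directory is already added to the target by the macro;
# additional dependencies can be attached with regular CMake commands:
target_link_libraries( my_plugin PUBLIC some_dependency )
```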
## 4. GTest unit test target
To add a unit test based on the gtest and gmock frameworks, use the macro
`ADD_UNIT_TESTS` with parameter
- module_name - name of the test module
The macro adds to compilation all *.cpp files from the directory in which the `CMakeLists.txt` file is placed ( `${CMAKE_CURRENT_SOURCE_DIR}` ).
The test `test.unit.<module_name>` is added to ctest.
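For example, a hypothetical test module could be declared like this (the module name `my_module` is an assumption):
```
ADD_UNIT_TESTS( my_module )

# all *.cpp files in this directory become the test sources;
# the resulting test can then be run with: ctest -R test.unit.my_module
```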
## 5. PSQL extension based on sql script
If there is a need to create a psql extension (to be able to use the CREATE EXTENSION psql command), the following cmake macro is provided:
`ADD_PSQL_EXTENSION` with parameters:
- NAME - name of the extension; the current source directory must contain a file <name>.control (see https://www.postgresql.org/docs/12/extend-extensions.html#id-1.8.3.18.11 )
- SOURCES - list of sql scripts; the order of the files is important since they are compiled into one sql script
The macro creates a new target extension.<name_of_extension>. The command 'make extension.<name_of_extension>' will create the extension's combined sql script; it can also be built
in a separate build directory by making only that one target, for example: `make extension.hive_fork_manager`.
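As a sketch, assuming a hypothetical extension named `my_extension` whose `my_extension.control` file sits next to its sql sources:
```
ADD_PSQL_EXTENSION(
    NAME my_extension
    SOURCES schema.sql api.sql   # concatenated into one sql script in this order
)

# 'make extension.my_extension' builds the combined script,
# 'make install' copies it to <postgres_sharedir>/extension
```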
Postgres extensions are versioned - the extension control file contains a `default_version` configuration entry. The build system
fills the entry with the repository's git sha.
Also, the corresponding sql script file is named with the same version, as required by Postgres.
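For illustration only, a generated control file could then contain entries similar to this hypothetical sketch (the extension name and comment text are assumptions):
```
# my_extension.control
comment = 'my_extension for HAF'
default_version = '<git sha of the repository>'
```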
# Known problems
# HIVE_FORK_MANAGER
The fork manager is composed of SQL scripts to create a Postgres extension that provides an API that simplifies reverting application data when a fork switch occurs on the Hive blockchain.
## Requirements for postgres
The extension is intended to run on Postgres version 12 or higher. The database used with the extension should use the parameters
ENCODING = 'UTF8', LC_COLLATE = 'en_US.UTF-8' and LC_CTYPE = 'en_US.UTF-8' (this is the default for the American English locale;
it has not been tested on other locale configurations).
## Installation
It is possible to install the fork manager in two forms - as a regular Postgres extension or as a simple set of tables and functions.
3. `make extension.hive_fork_manager`
4. `make install`
The extension will be installed in the directory `<postgres_shareddir>/extension`. You can check the directory with `pg_config --sharedir`.
To start using the extension in a database, execute the psql command: `CREATE EXTENSION hive_fork_manager CASCADE;`. The CASCADE phrase is needed to automatically install the extensions hive_fork_manager depends on.
### Alternatively, you can manually execute the SQL scripts to directly install the fork manager
The required ordering of the sql scripts is included in the cmake file [src/hive_fork_manager/CMakeLists.txt](./CMakeLists.txt).
Execute each script one-by-one with `psql` as in this example: `psql -d my_db_name -a -f context_rewind/data_schema.sql`
## Architecture
All elements of the fork manager are placed in a schema called 'hive'.
# SQL_SERIALIZER
It is a hived plugin which is responsible for dumping block data to the hive_fork_manager
and for informing it about important events occurring in the node, for example a micro-fork occurrence.
## Build
Like the other hived plugins, sql_serializer is compiled while compiling the hived program.
There is a trick which allows for this: the cmake scripts create a symbolic
link to the sql_serializer sources in `hive/libraries/plugins`; the cmake script of the hive
submodule then finds the plugin sources, including sql_serializer, by following the symbolic link.
## Setup
You need to add the plugin to the hived node's config.ini file:
```
plugin = sql_serializer
psql-enable-accounts-dump = true
psql-force-open-inconsistent = false
```
## Parameters
The sql_serializer extends hived with new parameters:
* **psql-url** contains a line of parameters which are used to connect to the database
  - *dbname* - name of the database on the PostgreSQL cluster
  - *user* - a Postgres role name used to connect to the database
  - *hostaddr* - an internet address of the PostgreSQL cluster
  - *port* - a TCP port on which the PostgreSQL cluster is listening
```
Example:
psql-url = dbname=block_log user=postgres password=pass hostaddr=127.0.0.1 port=5432
```
* **psql-index-threshold** [default: 1'000'000] - an integer that represents the limit of blocks up to which synchronization can continue after a node restart without disabling SQL indexes. During massive synchronization
(for example, using block_log), inserting a large number of blocks is drastically slowed down by the indexes, so it
is good to remove them, but the removed indexes have to be recreated before live sync starts (otherwise HAF will be too slow for the applications).
Recreating indexes takes a lot of time (depending on the number of blocks in the database), so there is a trade-off to find: is it
better to slowly sync a small number of blocks with indexes enabled and avoid the delay of re-creating them, or is it better
to synchronize a large number of blocks faster and accept the delay? psql-index-threshold is the limit on the number of blocks that will
be synchronized slowly with indexes enabled.
* **psql-operations-threads-number** [default: 5] - the number of threads used to dump blockchain operations to the database. Operations
are the biggest part of a block's data, because there is a large number of operations to sync. The operations are grouped
into packages which are dumped concurrently to the database.
* **psql-transactions-threads-number** [default: 2] - the number of threads used to dump transactions to the database
* **psql-account-operations-threads-number** [default: 2] - the number of threads used to dump account operations to the database
* **psql-enable-accounts-dump** [default: true] - a boolean value; if true, accounts and account operations will be dumped during block synchronization
* **psql-force-open-inconsistent** [default: false] - a boolean value; if true, the plugin will connect to the database even if it is in an inconsistent state.
During block syncing it may happen that hived crashes, and the irreversible part of the block data in the database may stay inconsistent because
the threads which dumped the blocks were brutally interrupted. The HAF database contains information about the inconsistency of the data, and on restart of
hived the sql_serializer will fail. The Hive Fork Manager has functionality that repairs the database, but it may take a very long time, and thus it is required
to explicitly force opening the database and start the rescue action by using the switch `--psql-force-open-inconsistent=true`
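Putting it all together, a config.ini fragment for the plugin might look like the following sketch (the connection values are only examples; the remaining values are the defaults listed above):
```
plugin = sql_serializer
psql-url = dbname=block_log user=hived password=hivedpass hostaddr=127.0.0.1 port=5432
psql-index-threshold = 1000000
psql-operations-threads-number = 5
psql-transactions-threads-number = 2
psql-account-operations-threads-number = 2
psql-enable-accounts-dump = true
psql-force-open-inconsistent = false
```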
### Example hived command
./hived --replay-blockchain --stop-replay-at-block 5000000 --exit-after-replay -d ../../../datadir --force-replay --psql-index-threshold 65432
## Block synchronization process
The sql_serializer is connected to the chainbase of the hive node by notifications (boost signals). The chainbase notifies about starting/ending the
reindex process (replay from block.log) and about processing a new block, a new transaction and a new operation.
### 1. Synchronization state
The sql_serializer works in different ways when the node is reindexing blocks from block.log, when it is syncing from the network (using P2P), and
when it is live syncing new blocks (processing blocks that are no more than 1 minute older than the network's head block). This is an important
aspect of the sql_serializer because it is strongly connected with synchronization performance. Below is a state machine diagram
for synchronization:
![](./doc/sync_state_machine.png)
The current state of synchronization is controlled by the object of class [indexation_state](./include/hive/plugins/sql_serializer/indexation_state.hpp).
### 2. Collect data from hive chainbase
In each state of synchronization, block data is cached in [cached_data_t](./include/hive/plugins/sql_serializer/cached_data.h).
![](./doc/collecting_block_in_cache.png)
At the end of the ```sql_serializer_plugin_impl::on_post_apply_block``` method, ```indexation_state::trigger_data_flush``` is called,
which dumps the blocks to the database or not, depending on the synchronization state.
### 3. Dumping cached blocks data to PostgreSQL database
There are two classes which are responsible for dumping block data to the hive_fork_manager:
- [reindex_data_dumper](./include/hive/plugins/sql_serializer/reindex_data_dumper.h)
It is used to massively dump only irreversible blocks, directly into the irreversible tables of the hive_fork_manager.
The dumper is optimized to dump batches of a large number of blocks. Each batch is dumped using several threads with separate
connections to the database. The threads do not wait for each other, so while the 'reindex_dumper' is used, FOREIGN KEY constraints
have to be disabled. Because the threads do not wait for each other, the content of the irreversible tables of the hive fork manager may
be inconsistent; a rendezvous pattern is used to inform the database which block is already known as the head of the fully dumped, consistent blocks.
![](./doc/reindex_dumper.png)
- [livesync_data_dumper](./include/hive/plugins/sql_serializer/livesync_data_dumper.h)
The dumper is used to dump one block at a time using the `hive.push_block` hive_fork_manager function. Both reversible
and irreversible blocks can be dumped. The block's data is processed by a few threads which convert it
to std::strings containing the SQL representation of the irreversible tables' rows. When all the threads finish processing
the block's data, a rendezvous object forms an SQL statement calling `hive.push_block` with the prepared strings as its parameters
and executes the function on the database.
![](./doc/livesync_dumper.png)
The dumpers are triggered by implementations of `indexation_state::flush_trigger`, which decide whether the cached data can be dumped. There
are 3 implementations of the trigger:
- **reindex_flush_trigger**
  Blocks are dumped when 1000 blocks are in the cache.
- **p2p_flush_trigger**
  Blocks are dumped when at least 1000 blocks are in the cache, but only the blocks which are irreversible are dumped.
- **live_flush_trigger**
  Each block is dumped immediately when it is cached.
In each state of indexation there is a different combination of flush_trigger and dumper:
- **p2p_sync** : p2p_flush_trigger + reindex_data_dumper
- **reindex** : reindex_flush_trigger + reindex_data_dumper
- **live** : live_flush_trigger + livesync_data_dumper