Commit be0c377d authored by Marcin

documentation update

- one common installation instruction for the whole HAF
- sql_serializer parameter description
- cmake target types
- sql_serializer: dumping blocks to the database
Contains the implementation of the Hive Application Framework (HAF), which encompasses a hive node plugin and Postgres-specific tools providing
the functionalities required by other projects that store blockchain data in a Postgres database.
HAF sits between the HIVE network and the applications:
![alt text](./doc/c2_haf.png)
HAF consists of a few components, visible in the picture above:
* **HIVED - hive node**
A regular HIVE node which syncs blocks with the HIVE network or replays them from a block.log file.
* **SQL_SERIALIZER**
A hived plugin which, while syncing a new block, pushes its data to the SQL database. Moreover, the plugin informs the database about the occurrence of micro-forks and about blocks changing status from reversible to irreversible.
Detailed documentation for sql_serializer is here: [src/sql_serializer/README.md](./src/sql_serializer/README.md)
* **PostgreSQL database**
The database contains the blockchain block data in the form of filled SQL tables, as well as the applications' tables. The system utilizes Postgres authentication and authorization mechanisms.
* **HIVE FORK MANAGER**
A PostgreSQL extension which provides HAF's API - a set of SQL functions used by applications to get block data. The extension controls the process by which applications consume blocks and ensures that applications cannot corrupt each other. HIVE FORK MANAGER is responsible for rewinding the applications' table changes when a micro-fork occurs. The extension defines the format of the block data saved in the database; the SQL_SERIALIZER dumps blocks to the tables defined by HIVE FORK MANAGER.
Detailed documentation for hive_fork_manager is here: [src/hive_fork_manager/Readme.md](./src/hive_fork_manager/Readme.md)
# Requirements
## Environment
1. Tested on Ubuntu 20.04
2. postgresql server dev package: `sudo apt-get install postgresql-server-dev-12`
3. ssl dev package: `sudo apt-get install libssl-dev`
4. readline dev package: `sudo apt-get install libreadline-dev`
5. pqxx dev package: `sudo apt-get install libpqxx-dev`
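All of the development packages can also be installed in one step, assuming the package names listed above:
```
sudo apt-get install postgresql-server-dev-12 libssl-dev libreadline-dev libpqxx-dev
```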
## PostgreSQL cluster
The project is intended to run on Postgres version 12 or higher. It is possible to use it
with older versions, but without any guarantees.
# Build
CMake and make are used to build the project. The procedure presented below builds all the targets from the HAF repository and the `hived` program from the `hive` submodule. You can pass
the same CMake parameters which are used to compile the hived project (for example: -DCLEAR_VOTES=ON -DBUILD_HIVE_TESTNET=OFF -DHIVE_LINT=OFF).
1. `git submodule update --init --recursive`
2. create a build directory, for example in the sources root: `mkdir build`
3. `cd build`
4. `cmake -DCMAKE_BUILD_TYPE=Release ..`
5. `make`
### Choose the version of Postgres to compile with
The CMake variable `POSTGRES_INSTALLATION_DIR` is used to point to the folder where the chosen version of PostgreSQL
is installed on Ubuntu. An example of choosing a different version of Postgres:
3. `cmake -DPOSTGRES_INSTALLATION_DIR=/usr/lib/postgresql/10/bin ..`
4. `make`
# Installation
## 1. Configure PostgreSQL cluster
Compiled PostgreSQL plugins and extensions have to be installed in the cluster. The best method
to do this is to execute, in the build directory (may require root privileges):
- `make install`
This copies the plugins to the Postgres cluster's `$libdir/plugins` directory and the extensions to
`<postgres_shared_dir>/extension`.
You can check the `$libdir` with the command `pg_config --pkglibdir`, and the shared dir with `pg_config --sharedir`
### Authorization
It is required to configure two base roles:
```
CREATE ROLE hived_group WITH NOLOGIN;
CREATE ROLE hive_applications_group WITH NOLOGIN;
```
HAF will grant them access to its internal elements in a way which guarantees security for the application data
and the applications' execution flows.
The maintainer of the PostgreSQL cluster server needs to create roles (users) which inherit from one of these groups, for example:
```
CREATE ROLE hived LOGIN PASSWORD 'hivedpass' INHERIT IN ROLE hived_group;
CREATE ROLE application LOGIN PASSWORD 'applicationpass' INHERIT IN ROLE hive_applications_group;
```
The roles which inherit from `hived_group` must be used by the `sql_serializer` process to log into the database,
while roles which inherit from `hive_applications_group` shall be used by the applications.
An application role does not have access to the internal data created by other application roles and cannot
modify data modified by 'hived'. 'Hived' roles cannot modify the applications' data.
More about roles in the PostgreSQL documentation: [CREATE ROLE](https://www.postgresql.org/docs/10/sql-createrole.html)
## 2. Preparing a PostgreSQL database
The newly created database has to have the hive_fork_manager extension created in it. Without it, 'sql_serializer'
won't connect the hived node with the database. To start using the extension in a database, execute the psql
command: `CREATE EXTENSION hive_fork_manager CASCADE;`. The CASCADE phrase is needed to automatically install the extensions hive_fork_manager depends on.
Note: whenever you build a new version of the hive_fork_manager extension, you have to create a new database.
There is currently no way to upgrade the schema installed in your old HAF database.
The database should use the parameters
ENCODING = 'UTF8', LC_COLLATE = 'en_US.UTF-8' and LC_CTYPE = 'en_US.UTF-8' (this is the default for the American English locale;
it has not been tested on other locale configurations).
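As a sketch, a new HAF database could be created and prepared like this (the database name `haf_block_log` is only an example, and `TEMPLATE template0` is assumed so that the locale settings can be set explicitly):
```
CREATE DATABASE haf_block_log
    WITH ENCODING 'UTF8'
         LC_COLLATE 'en_US.UTF-8'
         LC_CTYPE 'en_US.UTF-8'
         TEMPLATE template0;
-- connect to the new database and install the extension together with its dependencies
\c haf_block_log
CREATE EXTENSION hive_fork_manager CASCADE;
```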
# Tests
## 1. Integrations ```tests/integrations```
Integration tests are tests that run on a module of the project or on a system of the project's modules.
The tests do not use mock-ups to run the modules/system under test in isolation from their environment; instead
they are **integrated** with the environment, call real OS API functions, and cooperate with real working servers, client applications or databases.
### a) Functional tests ```tests/integrations/functional```
Functional tests concentrate on testing the functions of one module; they test its interface. The tests call
the functions and check the results.
The project uses ctest to start functional tests. Tests are grouped in a tree by names with `.` as a branch separator, where 'test' is the root.
For example, you can start all the functional tests with the command `ctest -R test.functional.*`
### b) Replay tests ```tests/integrations/replay```
The tests validate whether a module or a system under test works correctly during and after replaying the blockchain from a block_log file.
The tests are written in Python, and pytest is used as the test framework.
### c) System tests ```tests/integrations/system```
The tests check interactions between the project's modules.
The tests are written in Python, and pytest is used as the test framework.
## 2. Unit ```tests/unit```
Unit tests are used to test parts of modules in isolation from the environment. This means **all** the functions
called by the unit under test which are not part of the unit are mocked, and their results are fully controlled by the test framework.
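Unit tests are also registered in ctest (see the `ADD_UNIT_TESTS` macro described below), so, for example, all of them can be started from the build directory with:
```
ctest -R test.unit.*
```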
# Directory structure
```
cmake                      Contains common functions used by cmake build
common_includes
doc                        Contains documentation
hive                       Submodule of hive project: https://gitlab.syncad.com/hive/hive
src                        Contains sources
  applications             Contains utilities which help to develop HIVE applications based on HAF
  hive_fork_manager        Contains the SQL extension which implements the solution for hive forks
  sql_serializer           C++ hived plugin which is compiled together with hived
  transaction_controllers  Library with C++ utilities to control Postgres transactions
tests                      Contains tests
  integrations             Folder for non-unit tests like functional or system tests
    functional             Contains functional tests
    replay                 Tests which check replaying HAF from a block_log file
    system                 Tests which check interactions between hived internals, sql_serializer, hive_fork_manager and an application
  unit                     Contains unit tests and mocks
    mockups                Contains mocks
```
There is also a `generated` directory inside the build directory. It contains automatically generated headers which can be included
in the code with ```#include "gen/header_file_name.hpp"```
# Predefined cmake targets
To simplify adding new modules to the project, the build system introduces macros which define a few types of project items.
## 1. Static C++ library
To set up the compiler and linker settings to generate a static library, use the macro
`ADD_STATIC_LIB` with parameter
- target_name - name of the static lib target
The macro adds to compilation all *.cpp files from the directory in which the `CMakeLists.txt` file is placed ( `${CMAKE_CURRENT_SOURCE_DIR}` )
## 2. Run-time loaded C++ library
To set up the compiler and linker settings to generate a dynamically loaded library which will be opened
during program run-time with dlopen, use the macro
`ADD_RUNTIME_LOADED_LIB` with parameter
- target_name - name of the library target
The macro adds to compilation all *.cpp files from the directory in which the `CMakeLists.txt` file is placed ( `${CMAKE_CURRENT_SOURCE_DIR}` )
## 3. Load-time loaded C++ library
To set up the compiler and linker settings to generate a dynamically loaded library which will be loaded
by the loader during program startup, use the macro
`ADD_LOADTIME_LOADED_LIB` with parameter
- target_name - name of the library target
The macro adds to compilation all *.cpp files from the directory in which the `CMakeLists.txt` file is placed ( `${CMAKE_CURRENT_SOURCE_DIR}` )
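All three library macros are used the same way. A minimal, hypothetical `CMakeLists.txt` for a run-time loaded library might look like this (the target name `my_plugin` and its dependency are assumptions):
```
ADD_RUNTIME_LOADED_LIB( my_plugin )

# every *.cpp file from this directory is already added to the target by the macro;
# additional dependencies can be attached with regular CMake commands:
target_link_libraries( my_plugin PUBLIC some_dependency )
```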
## 4. GTest unit test target
To add a unit test based on the gtest and gmock frameworks, use the macro
`ADD_UNIT_TESTS` with parameter
- module_name - name of the test module
The macro adds to compilation all *.cpp files from the directory in which the `CMakeLists.txt` file is placed ( `${CMAKE_CURRENT_SOURCE_DIR}` ).
The test `test.unit.<module_name>` is added to ctest.
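For example, a hypothetical test module could be declared like this (the module name `my_module` is an assumption):
```
ADD_UNIT_TESTS( my_module )

# all *.cpp files in this directory become the test sources;
# the resulting test can then be run with: ctest -R test.unit.my_module
```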
## 5. PSQL extension based on sql script
If there is a need to create a psql extension (to be able to use the CREATE EXTENSION psql command), the following cmake macro is provided:
`ADD_PSQL_EXTENSION` with parameters:
- NAME - name of the extension; the current source directory must contain a file <name>.control (see https://www.postgresql.org/docs/12/extend-extensions.html#id-1.8.3.18.11 )
- SOURCES - list of sql scripts; the order of the files is important since they are compiled into one sql script
The macro creates a new target extension.<name_of_extension>. The command 'make extension.<name_of_extension>' will create the extension's combined sql script; it can also be built
in a separate build directory by making only that one target, for example: `make extension.hive_fork_manager`.
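As a sketch, assuming a hypothetical extension named `my_extension` whose `my_extension.control` file sits next to its sql sources:
```
ADD_PSQL_EXTENSION(
    NAME my_extension
    SOURCES schema.sql api.sql   # concatenated into one sql script in this order
)

# 'make extension.my_extension' builds the combined script,
# 'make install' copies it to <postgres_sharedir>/extension
```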
Postgres extensions are versioned - the extension control file contains a `default_version` configuration entry. The build system
fills the entry with the repository's git sha.
Also, the corresponding sql script file is named with the same version, as required by Postgres.
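For illustration only, a generated control file could then contain entries similar to this hypothetical sketch (the extension name and comment text are assumptions):
```
# my_extension.control
comment = 'my_extension for HAF'
default_version = '<git sha of the repository>'
```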
# Known problems
# HIVE_FORK_MANAGER
The fork manager is composed of SQL scripts to create a Postgres extension that provides an API that simplifies reverting application data when a fork switch occurs on the Hive blockchain.
## Requirements for postgres
The extension is intended to run on Postgres version 12 or higher. The database used with the extension should use the parameters
ENCODING = 'UTF8', LC_COLLATE = 'en_US.UTF-8' and LC_CTYPE = 'en_US.UTF-8' (this is the default for the American English locale;
it has not been tested on other locale configurations).
## Installation
It is possible to install the fork manager in two forms - as a regular Postgres extension or as a simple set of tables and functions.
3. `make extension.hive_fork_manager`
4. `make install`
The extension will be installed in the directory `<postgres_shareddir>/extension`. You can check the directory with `pg_config --sharedir`.
To start using the extension in a database, execute the psql command: `CREATE EXTENSION hive_fork_manager CASCADE;`. The CASCADE phrase is needed to automatically install the extensions hive_fork_manager depends on.
### Alternatively, you can manually execute the SQL scripts to directly install the fork manager
The required ordering of the sql scripts is included in the cmake file [src/hive_fork_manager/CMakeLists.txt](./CMakeLists.txt).
Execute each script one-by-one with `psql` as in this example: `psql -d my_db_name -a -f context_rewind/data_schema.sql`
## Architecture
All elements of the fork manager are placed in a schema called 'hive'.
# SQL_SERIALIZER
It is a hived plugin which is responsible for dumping block data to the hive_fork_manager
and for informing it about important events occurring in the node, for example a micro-fork occurrence.
## Build
Like the other hived plugins, sql_serializer is compiled while compiling the hived program.
There is a trick which allows for this: the cmake scripts create a symbolic
link to the sql_serializer sources in `hive/libraries/plugins`; the cmake script of the hive
submodule then finds the plugin sources, including sql_serializer, by following the symbolic link.
## Setup
You need to add the plugin to the hived node's config.ini file:
```
plugin = sql_serializer
psql-enable-accounts-dump = true
psql-force-open-inconsistent = false
```
## Parameters
The sql_serializer extends hived with new parameters:
* **psql-url** contains a line of parameters which are used to connect to the database
  - *dbname* - name of the database on the PostgreSQL cluster
  - *user* - a Postgres role name used to connect to the database
  - *hostaddr* - an internet address of the PostgreSQL cluster
  - *port* - a TCP port on which the PostgreSQL cluster is listening
```
Example:
psql-url = dbname=block_log user=postgres password=pass hostaddr=127.0.0.1 port=5432
```
* **psql-index-threshold** [default: 1'000'000] - an integer that represents the limit of blocks up to which synchronization can continue after a node restart without disabling SQL indexes. During massive synchronization
(for example, using block_log), inserting a large number of blocks is drastically slowed down by the indexes, so it
is good to remove them, but the removed indexes have to be recreated before live sync starts (otherwise HAF will be too slow for the applications).
Recreating indexes takes a lot of time (depending on the number of blocks in the database), so there is a trade-off to find: is it
better to slowly sync a small number of blocks with indexes enabled and avoid the delay of re-creating them, or is it better
to synchronize a large number of blocks faster and accept the delay? psql-index-threshold is the limit on the number of blocks that will
be synchronized slowly with indexes enabled.
* **psql-operations-threads-number** [default: 5] - the number of threads used to dump blockchain operations to the database. Operations
are the biggest part of a block's data, because there is a large number of operations to sync. The operations are grouped
into packages which are dumped concurrently to the database.
* **psql-transactions-threads-number** [default: 2] - the number of threads used to dump transactions to the database
* **psql-account-operations-threads-number** [default: 2] - the number of threads used to dump account operations to the database
* **psql-enable-accounts-dump** [default: true] - a boolean value; if true, accounts and account operations will be dumped during block synchronization
* **psql-force-open-inconsistent** [default: false] - a boolean value; if true, the plugin will connect to the database even if it is in an inconsistent state.
During block syncing it may happen that hived crashes, and the irreversible part of the block data in the database may stay inconsistent because
the threads which dumped the blocks were brutally interrupted. The HAF database contains information about the inconsistency of the data, and on restart of
hived the sql_serializer will fail. The Hive Fork Manager has functionality that repairs the database, but it may take a very long time, and thus it is required
to explicitly force opening the database and start the rescue action by using the switch `--psql-force-open-inconsistent=true`
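Putting it all together, a config.ini fragment for the plugin might look like the following sketch (the connection values are only examples; the remaining values are the defaults listed above):
```
plugin = sql_serializer
psql-url = dbname=block_log user=hived password=hivedpass hostaddr=127.0.0.1 port=5432
psql-index-threshold = 1000000
psql-operations-threads-number = 5
psql-transactions-threads-number = 2
psql-account-operations-threads-number = 2
psql-enable-accounts-dump = true
psql-force-open-inconsistent = false
```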
### Example hived command
./hived --replay-blockchain --stop-replay-at-block 5000000 --exit-after-replay -d ../../../datadir --force-replay --psql-index-threshold 65432
## Block synchronization process
The sql_serializer is connected to the chainbase of the hive node by notifications (boost signals). The chainbase notifies about starting/ending the
reindex process (replay from block.log) and about processing a new block, a new transaction and a new operation.
### 1. Synchronization state
The sql_serializer works in different ways when the node is reindexing blocks from block.log, when it is syncing from the network (using P2P), and
when it is live syncing new blocks (processing blocks that are no more than 1 minute older than the network's head block). This is an important
aspect of the sql_serializer because it is strongly connected with synchronization performance. Below is a state machine diagram
for synchronization:
![](./doc/sync_state_machine.png)
The current state of synchronization is controlled by the object of class [indexation_state](./include/hive/plugins/sql_serializer/indexation_state.hpp).
### 2. Collect data from hive chainbase
In each state of synchronization, block data is cached in [cached_data_t](./include/hive/plugins/sql_serializer/cached_data.h).
![](./doc/collecting_block_in_cache.png)
At the end of the ```sql_serializer_plugin_impl::on_post_apply_block``` method, ```indexation_state::trigger_data_flush``` is called,
which dumps the blocks to the database or not, depending on the synchronization state.
### 3. Dumping cached blocks data to PostgreSQL database
There are two classes which are responsible for dumping block data to the hive_fork_manager:
- [reindex_data_dumper](./include/hive/plugins/sql_serializer/reindex_data_dumper.h)
It is used to massively dump only irreversible blocks, directly into the irreversible tables of the hive_fork_manager.
The dumper is optimized to dump batches of a large number of blocks. Each batch is dumped using several threads with separate
connections to the database. The threads do not wait for each other, so while the 'reindex_dumper' is used, FOREIGN KEY constraints
have to be disabled. Because the threads do not wait for each other, the content of the irreversible tables of the hive fork manager may
be inconsistent; a rendezvous pattern is used to inform the database which block is already known as the head of the fully dumped, consistent blocks.
![](./doc/reindex_dumper.png)
- [livesync_data_dumper](./include/hive/plugins/sql_serializer/livesync_data_dumper.h)
The dumper is used to dump one block at a time using the `hive.push_block` hive_fork_manager function. Both reversible
and irreversible blocks can be dumped. The block's data is processed by a few threads which convert it
to std::strings containing the SQL representation of the irreversible tables' rows. When all the threads finish processing
the block's data, a rendezvous object forms an SQL statement calling `hive.push_block` with the prepared strings as its parameters
and executes the function on the database.
![](./doc/livesync_dumper.png)
The dumpers are triggered by implementations of `indexation_state::flush_trigger`, which decide whether the cached data can be dumped. There
are 3 implementations of the trigger:
- **reindex_flush_trigger**
  Blocks are dumped when 1000 blocks are in the cache.
- **p2p_flush_trigger**
  Blocks are dumped when at least 1000 blocks are in the cache, but only the blocks which are irreversible are dumped.
- **live_flush_trigger**
  Each block is dumped immediately when it is cached.
In each state of indexation there is a different combination of flush_trigger and dumper:
- **p2p_sync** : p2p_flush_trigger + reindex_data_dumper
- **reindex** : reindex_flush_trigger + reindex_data_dumper
- **live** : live_flush_trigger + livesync_data_dumper