Postgres Modifications
Postgres originally uses a multiprocess architecture. The main process, called the Postmaster, handles incoming requests to the server as well as system-wide operations such as startup and shutdown. The Postmaster does not perform these operations itself; instead, it forks off subprocesses to perform them. The backends that handle user queries are likewise forked off by the Postmaster. This architecture works well for a disk-based database, because the disk can serve as a large shared store. Peloton, however, is a main-memory database, and a multiprocess architecture makes it extremely difficult to share information between the backends and the Peloton database. As an early attempt, we used shared memory to let Peloton get query plans and other information from each forked backend. However, the performance was unacceptably slow. Therefore, we decided to make Postgres multithreaded!
Making Postgres multithreaded is not an easy task. PostgreSQL's implementation makes heavy use of global variables and shared memory. Newly forked processes inherit the initialized values of those globals from the Postmaster. After forking, each backend is an isolated process and thus does not need to synchronize access to that global state. The story is completely different if all backends are threads: every newly spawned thread shares the same global variables, so backends would overwrite each other's state. One simple way to tackle this is to make those global variables thread-local, at the cost of some overhead for managing thread-local storage (TLS). If you are interested in the details of TLS, please refer to [Thread-local storage](https://en.wikipedia.org/wiki/Thread-local_storage).
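The effect of making a global thread-local can be illustrated with a minimal sketch. The variable `backend_counter` and the function `RunBackend` are hypothetical names invented for this example; they do not appear in Postgres or Peloton.

```cpp
#include <thread>

// Hypothetical stand-in for a Postgres global that holds per-backend state.
// With `thread_local`, every thread gets its own zero-initialized copy.
thread_local int backend_counter = 0;

// Runs a "backend" thread that increments only its own copy of the counter
// and returns the value that thread observed.
int RunBackend(int increments) {
  int result = 0;
  std::thread worker([&result, increments] {
    for (int i = 0; i < increments; ++i) {
      ++backend_counter;
    }
    result = backend_counter;
  });
  worker.join();
  return result;
}
```

Each call starts a fresh thread, so the counter always starts at zero inside the worker, and the copy owned by the calling thread is never touched.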
New problems arose once we started making global variables thread-local. First, it is not easy to know whether a given global variable actually needs to be isolated among threads. Some variables are simply state that is initialized once and read-only afterwards; the values of such variables have to be inherited from the Postmaster. Second, other variables need no inherited value but hold run-time state that requires per-backend isolation.
One option is to take a snapshot of all thread-local global variables whenever a new backend thread needs to be spawned and pass the snapshot to the new thread, so that it can initialize all of this state before it starts processing queries. This approach is impractical: gathering all the state would break the modularity of Postgres and, more importantly, saving and restoring all the global state is expensive. Fortunately, we found that the Postmaster has a mode called EXEC_BACKEND, which is used on Windows. Windows lacks a semantic equivalent of the fork() system call. The closest equivalent is a fork-and-exec, which means that child processes inherit no state variables from the Postmaster, and any state that needs to be carried across has to be handled explicitly. This is very similar to our situation. In fact, EXEC_BACKEND provides the utility routines save_backend_variable() and restore_backend_variable(), which clearly list the variables that have to be handled explicitly for a backend to function properly.
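The idea of carrying state across explicitly can be sketched as follows. This is an illustration only: `BackendState`, `SaveBackendState`, `RestoreBackendState`, and the two globals are hypothetical names, and the real routines handle a much longer list of variables.

```cpp
#include <string>
#include <thread>

// Hypothetical thread-local globals; a new thread sees them uninitialized.
thread_local int my_port = 0;
thread_local std::string my_data_dir;

// Hypothetical parameter block listing the state a new backend needs.
struct BackendState {
  int port;
  std::string data_dir;
};

// Captures the spawning thread's values of the listed globals.
BackendState SaveBackendState() {
  return BackendState{my_port, my_data_dir};
}

// Installs the captured values into the calling thread's globals.
void RestoreBackendState(const BackendState &state) {
  my_port = state.port;
  my_data_dir = state.data_dir;
}

// Spawns a backend thread and reports the port it sees after restoring.
int SpawnBackendAndReadPort() {
  BackendState snapshot = SaveBackendState();
  int observed = -1;
  std::thread backend([snapshot, &observed] {
    RestoreBackendState(snapshot);  // without this, my_port would be 0 here
    observed = my_port;
  });
  backend.join();
  return observed;
}
```

Only the variables named in the parameter block are copied, which is what keeps this cheaper than a snapshot of every global.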
Known issues:
- Signal handling is not completely ported to multithreading.
The following files should be automatically generated:
- /src/postgres/backend/bootstrap/bootparse.c
- /src/postgres/backend/bootstrap/bootscanner.c
- /src/postgres/backend/parser/scan.c
- /src/postgres/backend/parser/gram.c
- /src/postgres/backend/replication/repl_gram.c
- /src/postgres/backend/replication/repl_scanner.c
- /src/postgres/backend/utils/fmgrtab.c
- /src/postgres/backend/utils/misc/guc-file.c
- /src/postgres/backend/utils/sort/qsort_tuple.c
- /tools/pg_psql/sql_help.c
- /tools/pg_psql/psql_scan.c
In order to port Postgres to C++, we made the following changes:
- Avoid keyword conflict
All identifiers that conflict with a C++ keyword have "___" appended. The affected keywords are:
- new
- namespace
- friend
- public
- private
- typename
- typeid
- constexpr
- operator
- class
- Make use of C++ inheritance to avoid casting
All derived node structs in `parsenodes.h` are redefined using C++ inheritance.
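A minimal sketch of the inheritance approach, using simplified, hypothetical versions of the node types (the real definitions in `parsenodes.h` have many more fields):

```cpp
// Simplified stand-ins for the Postgres node tag and base node.
enum NodeTag { T_Invalid, T_SelectStmt };

struct Node {
  NodeTag type;
};

// In C, each derived node repeats `NodeTag type;` as its first field and
// pointers are cast to `Node *`. With inheritance, a `SelectStmt *`
// converts to `Node *` implicitly.
struct SelectStmt : Node {
  int limit_count;  // hypothetical field for illustration
};

NodeTag GetNodeTag(const Node *node) { return node->type; }

NodeTag TagOfSelect() {
  SelectStmt stmt;
  stmt.type = T_SelectStmt;
  stmt.limit_count = 10;
  return GetNodeTag(&stmt);  // no (Node *) cast needed
}
```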
- Resolve error for missing operator=
Define `operator=` manually for the cases where the `volatile` qualifier is used. C++ does not generate an assignment operator for such cases by default. Details of the cases are as follows:
- `RelFileNode` at `include/storage/relfilenode.h`
- `QueuePosition` at `backend/commands/async.cpp`
- `BufferTag` at `include/storage/buf_internals.h`
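The following sketch shows why a manual `operator=` is needed when the source object is `volatile`. `FileNode` is a hypothetical, simplified stand-in for a struct such as `RelFileNode`; the field names are assumptions.

```cpp
struct FileNode {
  unsigned int spc_node = 0;
  unsigned int db_node = 0;
  unsigned int rel_node = 0;

  FileNode() = default;
  FileNode(const FileNode &) = default;
  FileNode &operator=(const FileNode &) = default;

  // The defaulted operator= takes `const FileNode &`, which cannot bind to
  // a volatile object, so a volatile-aware overload is written by hand.
  FileNode &operator=(const volatile FileNode &other) {
    spc_node = other.spc_node;
    db_node = other.db_node;
    rel_node = other.rel_node;
    return *this;
  }
};

// Copies a node out of (hypothetically shared) volatile storage.
unsigned int CopyFromShared(volatile FileNode *shared) {
  FileNode local;
  local = *shared;  // compiles only because of the volatile overload
  return local.rel_node;
}
```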
 
- Resolve error for implicitly deleted default constructor
A union's default constructor is implicitly deleted if one of its members has a non-trivial constructor. The workaround is to define the constructor manually. Details of the cases are as follows:
- `SharedInvalidationMessage` at `include/storage/sinval.h`
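A minimal sketch of the union workaround. The message types below are hypothetical and much smaller than the real ones; the example simply assumes that some member has a user-provided constructor.

```cpp
// Hypothetical member types with user-provided (non-trivial) constructors.
struct CatalogMsg {
  int id;
  CatalogMsg() : id(0) {}
};

struct RelcacheMsg {
  int id;
  int rel_id;
  RelcacheMsg() : id(0), rel_id(0) {}
};

union InvalidationMessage {
  int id;
  CatalogMsg cat;
  RelcacheMsg rc;

  // Without this constructor, `InvalidationMessage msg;` does not compile,
  // because the implicit default constructor is deleted.
  InvalidationMessage() : id(0) {}
};

int MakeMessageId() {
  InvalidationMessage msg;
  msg.id = 7;
  return msg.id;
}
```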
 
- Resolve error for missing operator++
The workaround is to use `operator+` instead of `operator++`. We changed all occurrences of `forkNum++` to `forkNum = forkNum + 1`.
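A sketch of the increment rewrite on an enum-typed loop variable. The enumerator names mirror the Postgres fork numbers, but the fixed underlying type and the explicit `static_cast` are additions of this example: in strictly conforming C++ the result of `forkNum + 1` is an `int`, which does not convert back to the enum implicitly.

```cpp
// Simplified stand-in; the fixed underlying type keeps the one-past-the-end
// value used by the loop below well defined.
enum ForkNumber : int {
  MAIN_FORKNUM = 0,
  FSM_FORKNUM,
  VISIBILITYMAP_FORKNUM,
  INIT_FORKNUM
};

const int MAX_FORKNUM = INIT_FORKNUM;

int CountForks() {
  int count = 0;
  // `forkNum++` is rejected in C++ because enums have no built-in operator++.
  for (ForkNumber forkNum = MAIN_FORKNUM; forkNum <= MAX_FORKNUM;
       forkNum = static_cast<ForkNumber>(forkNum + 1)) {
    ++count;
  }
  return count;
}
```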
- Resolve error for missing namespace for inner enum
Member enums have to be resolved by specifying the class name. Details of the cases are as follows:
- `JsonbValue`
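A sketch of the scoping difference, using a hypothetical struct `JsonbValueSketch` rather than the real `JsonbValue` definition:

```cpp
struct JsonbValueSketch {
  enum Type { jbvNull, jbvString, jbvNumeric };
  Type type;
};

bool IsString(const JsonbValueSketch &value) {
  // In C the enumerator is visible as plain `jbvString`. In C++ it lives in
  // the scope of the enclosing struct and must be qualified.
  return value.type == JsonbValueSketch::jbvString;
}
```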
 
- Avoid redefinition for static array
A forward declaration of a static array is treated as a redefinition in C++. The workaround is to wrap such arrays in an anonymous namespace. Details of the cases are as follows:
- `pg_crc32c_table` at `port/pg_crc32c_sb8.cpp`
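One way the anonymous-namespace workaround can look is sketched below. The array name and contents are hypothetical, and the exact form used in `pg_crc32c_sb8.cpp` may differ. Inside an anonymous namespace the array has internal linkage, so an `extern` declaration can serve as the forward declaration.

```cpp
namespace {
// Forward declaration: not a definition, still internal to this file.
extern const unsigned int sketch_table[4];
}  // namespace

// Code that uses the table before its definition appears.
unsigned int Lookup(int index) { return sketch_table[index]; }

namespace {
// The single definition, matching the declaration above.
const unsigned int sketch_table[4] = {1, 2, 3, 4};
}  // namespace
```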
 
- Resolve undefined reference problem for extern const variable
The workaround is to add `extern` at the place where the variable is defined. Details of the cases are as follows:
- `sync_method_options` at `backend/access/transam/xlog.cpp`
- `wal_level_options` at `backend/access/rmgrdesc/xlogdesc.cpp`
- `dynamic_shared_memory_options` at `backend/access/transam/xlog.cpp`
- `archive_mode_options` at `backend/access/transam/xlog.cpp`
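The underlying rule is that a namespace-scope `const` object has internal linkage in C++ unless it is declared `extern`, so a definition without `extern` is invisible to other source files. A single-file sketch of the fixed definition follows; the struct, the array name, and the entries are hypothetical, and a single file cannot demonstrate the cross-file link error itself.

```cpp
// Hypothetical, simplified option entry.
struct EnumEntrySketch {
  const char *name;
  int value;
};

// `extern` on the definition gives the const array external linkage, so a
// matching `extern const EnumEntrySketch ...[];` declaration in another
// source file can link against it.
extern const EnumEntrySketch wal_level_options_sketch[] = {
    {"minimal", 0}, {"archive", 1}, {nullptr, 0}};

// Counts entries up to the terminating entry with a null name.
int CountOptions(const EnumEntrySketch *options) {
  int count = 0;
  while (options[count].name != nullptr) {
    ++count;
  }
  return count;
}
```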
 
- Resolve the difference between function pointers in C and C++
In C, a function declared with an empty parameter list can be called with an arbitrary number of arguments. This is not the case in C++. The workaround is to explicitly define function pointer types for different numbers of arguments. Details of the cases are as follows:
- `func_ptr0` at `backend/utils/fmgr/fmgr.c`
- `func_ptr1` at `backend/utils/fmgr/fmgr.c`
- `func_ptr2` at `backend/utils/fmgr/fmgr.c`
- `func_ptr3` at `backend/utils/fmgr/fmgr.c`
- `func_ptr4` at `backend/utils/fmgr/fmgr.c`
- `func_ptr5` at `backend/utils/fmgr/fmgr.c`
- `func_ptr6` at `backend/utils/fmgr/fmgr.c`
- `func_ptr7` at `backend/utils/fmgr/fmgr.c`
- `func_ptr8` at `backend/utils/fmgr/fmgr.c`
- `func_ptr9` at `backend/utils/fmgr/fmgr.c`
- `func_ptr10` at `backend/utils/fmgr/fmgr.c`
- `func_ptr11` at `backend/utils/fmgr/fmgr.c`
- `func_ptr12` at `backend/utils/fmgr/fmgr.c`
- `func_ptr13` at `backend/utils/fmgr/fmgr.c`
- `func_ptr14` at `backend/utils/fmgr/fmgr.c`
- `func_ptr15` at `backend/utils/fmgr/fmgr.c`
- `func_ptr16` at `backend/utils/fmgr/fmgr.c`
- `expression_tree_walker` at `include/nodes/nodeFuncs.h`
- `expression_tree_mutator` at `include/nodes/nodeFuncs.h`
- `query_tree_walker` at `include/nodes/nodeFuncs.h`
- `query_tree_mutator` at `include/nodes/nodeFuncs.h`
- `range_table_walker` at `include/nodes/nodeFuncs.h`
- `range_table_mutator` at `include/nodes/nodeFuncs.h`
- `query_or_expression_tree_walker` at `include/nodes/nodeFuncs.h`
- `query_or_expression_tree_mutator` at `include/nodes/nodeFuncs.h`
- `raw_expression_tree_walker` at `include/nodes/nodeFuncs.h`
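The per-arity pointer types can be sketched as follows. Only three arities are shown. `Datum` is a simplified stand-in, and `generic_func`, `CallWithArgs`, `AddTwo`, and `CallAddTwo` are hypothetical names for this example, not the code in `fmgr.c`.

```cpp
#include <cstdint>

typedef std::uintptr_t Datum;  // simplified stand-in for the Postgres Datum

// One pointer type per argument count, in the spirit of func_ptr0..func_ptr16.
typedef Datum (*func_ptr0)();
typedef Datum (*func_ptr1)(Datum);
typedef Datum (*func_ptr2)(Datum, Datum);

// Generic type used only to store a function pointer of unknown arity.
typedef void (*generic_func)();

// Casts the stored pointer back to the type matching the argument count
// before calling it.
Datum CallWithArgs(generic_func fn, int nargs, const Datum *args) {
  switch (nargs) {
    case 0:
      return reinterpret_cast<func_ptr0>(fn)();
    case 1:
      return reinterpret_cast<func_ptr1>(fn)(args[0]);
    case 2:
      return reinterpret_cast<func_ptr2>(fn)(args[0], args[1]);
    default:
      return 0;  // arities above 2 are omitted from this sketch
  }
}

Datum AddTwo(Datum a, Datum b) { return a + b; }

Datum CallAddTwo(Datum a, Datum b) {
  Datum args[2] = {a, b};
  return CallWithArgs(reinterpret_cast<generic_func>(&AddTwo), 2, args);
}
```

The call is well defined only when the pointer is cast back to the function's real type, which is why the caller must pass the correct argument count.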
 
- Changed C-style typecasts to `static_cast` in multiple files at:
- /src/postgres/interfaces/libpq
- /src/postgres/backend/access/brin
- /src/postgres/backend/access/gin
- /src/postgres/backend/access/gist
 
- Changed multiple C-style casts to `reinterpret_cast`, especially in:
- /src/postgres/backend/access/fe-lobj.cpp
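The two kinds of cast replacement can be illustrated with a short sketch. The struct and function names are hypothetical; the example only shows where each cast typically applies.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical struct used only for this illustration.
struct HeaderSketch {
  int fd;
  int length;
};

// malloc returns void *. C converts it implicitly, while C++ requires a
// cast; static_cast is enough for void * to object pointer.
HeaderSketch *AllocHeader() {
  return static_cast<HeaderSketch *>(std::malloc(sizeof(HeaderSketch)));
}

// Viewing an object as raw bytes converts between unrelated pointer types,
// which needs reinterpret_cast.
const unsigned char *AsBytes(const HeaderSketch *header) {
  return reinterpret_cast<const unsigned char *>(header);
}

// Writes a value through the typed pointer and reads it back via the bytes.
int RoundTrip(int value) {
  HeaderSketch *header = AllocHeader();
  if (header == nullptr) {
    return -1;
  }
  header->fd = value;
  header->length = 0;
  int out = 0;
  std::memcpy(&out, AsBytes(header), sizeof(out));  // fd is the first field
  std::free(header);
  return out;
}
```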
 
- Suppressed warnings for the deprecated conversion from string literals to `char *` by adding the CXXFLAG `-Wno-write-strings`.
- Added a macro at the end of the `c.h` file to suppress compiler warnings for unused variables in various function calls. This `UNUSED` macro is used wherever variables passed to a function are either unused or used only inside `#ifdef ... #endif`.
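A macro of this kind is commonly written as a cast to `void`. The definition below is an assumption for illustration and may differ from the one actually added to `c.h`; `HandleEvent` and `SKETCH_VERBOSE` are hypothetical names.

```cpp
// Assumed definition: evaluates the argument and discards the result.
#define UNUSED(x) (void)(x)

static int last_context_seen = 0;

int HandleEvent(int event_id, int context) {
  // `context` is only read when SKETCH_VERBOSE is defined, so mark it as
  // used to keep builds without that flag free of warnings.
  UNUSED(context);
#ifdef SKETCH_VERBOSE
  last_context_seen = context;
#endif
  return event_id;
}
```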