- 
                Notifications
    You must be signed in to change notification settings 
- Fork 435
Closed
Description
- asyncpg version: 0.17.0
- PostgreSQL version: 10
- Do you use a PostgreSQL SaaS?  If so, which?  Can you reproduce
 the issue with a local PostgreSQL install?: local
- Python version: 3.7
- Platform: Windows
- Do you use pgbouncer?: No
- Did you install asyncpg with pip?: Yes
- If you built asyncpg locally, which version of Cython did you use?: No
- Can the issue be reproduced under both asyncio and
 uvloop?: did not use UVLoop
I am building a DB for storing financial tick data, so was testing insert performance for different DBs using different libs. For postgres I used psycopg2 in sync mode, aiopg, and asyncpg. I was inserting 120000 rows of symbol and OHLC data.
What I was getting very bad insert performances using asyncpg insertmany
For creating sample data
import asyncio
import string
import random
from time import time
import numpy as np
import pandas as pd
import asyncpg
import psycopg2
import aiopg
# Number of securirities to insert
SECURITIES = 2000
def string_gen(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))
def generate_random_data():
    """Generate random data for inserting to DB"""
    index = pd.date_range(start='2018-01-01 00:00:01', end='2018-01-01 00:01:00', freq='s')
    
    dflist = []
    for _ in range(SECURITIES):
        data = np.random.rand(len(index), 4)
        data = pd.DataFrame(data, index=index, columns=['open', 'high', 'low', 'close'])
        data['symbol'] = string_gen()
        dflist.append(data)
    data = pd.concat(dflist)
    data.index.name = 'time'
    data = data.reset_index()
    data = data[['time', 'symbol', 'open', 'high', 'low', 'close']]
    return [tuple(x) for x in data.values]
For psycopg2 insert took 5 to 6 seconds
args_str = b','.join(cur.mogrify(string, row) for row in data)
args_str = args_str.decode('utf-8') # Convert byte string to UTF-8
cur.execute("INSERT INTO ohlc (time, symbol, open, high, low, close) VALUES " + args_str)
conn.commit()
For asyncpg insertmany took 30 seconds so i did something like by creating a insert statement using psycopg2 and insert using asyncpg.. still took 7 to 8 seconds
p_conn = psycopg2.connect(user='postgres', password='postgres')
cur = p_conn.cursor()
string = '(' + ('%s,' * len(data[0]))[:-1] + ')'
args_str = b','.join(cur.mogrify(string, row) for row in data)
cur.close()
args_str = args_str.decode('utf-8')
insert_str = "INSERT INTO ohlc (time, symbol, open, high, low, close) VALUES " + args_str
await conn.execute(insert_str)
similarly for aiopg i was getting 6 to 7 seconds..
I guess i am doing something wrong.. since the DB is same for all 3 libs.. we can ignore performance issues of DB.
Metadata
Metadata
Assignees
Labels
No labels