python插入数据到mysql的工具很多。每个工具写法不一样效率也不一样。就拿sqlalchemy,MySQLdb,sqlite3来说。sqlalchemy分ORM方式插入数据和非ORM方式插入。非ORM和ORM都还有细微的方式。可以参考:
http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
自己做了下尝试。插入3.8万条数据到mysql。用MySQLdb的一种写法需要1.8s左右,用sqlite3用时约2s左右,sqlalchemy的非ORM方式单纯insert需要2.8s左右,带去重则需要8s左右。说到去重,用MySQLdb的去重效率最高。各个框架工具,最开始暴露出来的称ORM,具体内部实现是怎样,需要考量。单纯的sql语句。insert into XXX values (xxx),(xxx),(xxx)。比insert into XXX values (xxx), insert into XXX values (xxx) …… 这样效率高很多。
按效率高低,贴出部分代码。
conn = MySQLdb.connect(host='localhost', port=3306, user='root', passwd='xxxxxxx', db='mydb') cur = conn.cursor() sql = "INSERT IGNORE INTO register_user(uid, gender, time, come) VALUES " aps = "" for temp in datas: aps = aps + (str(temp) + ",") aps = aps[0:-1] sql = sql + aps initTime = time.time() cur.execute(sql) cur.close() conn.commit() conn.close() print time.time() - initTime
def init_sqlite3(dbname): conn = sqlite3.connect(dbname) c = conn.cursor() c.execute("DROP TABLE IF EXISTS customer") c.execute( "CREATE TABLE customer (id INTEGER NOT NULL, " "name VARCHAR(255), PRIMARY KEY(id))") conn.commit() return conn def test_sqlite3(n=100000, dbname='sqlite3.db'): conn = init_sqlite3(dbname) c = conn.cursor() t0 = time.time() for i in xrange(n): row = ('NAME ' + str(i),) c.execute("INSERT INTO customer (name) VALUES (?)", row) conn.commit() print( "sqlite3: Total time for " + str(n) + " records " + str(time.time() - t0) + " sec")
Base = declarative_base() class Customer(Base): __tablename__ = "customer" id = Column(Integer, primary_key=True) name = Column(String(255)) def init_sqlalchemy(dbname='sqlite:///sqlalchemy.db'): global engine engine = create_engine(dbname, echo=False) DBSession.remove() DBSession.configure(bind=engine, autoflush=False, expire_on_commit=False) Base.metadata.drop_all(engine) Base.metadata.create_all(engine) def test_sqlalchemy_core(n=100000): init_sqlalchemy() t0 = time.time() engine.execute( Customer.__table__.insert(), [{"name": 'NAME ' + str(i)} for i in xrange(n)] ) print( "SQLAlchemy Core: Total time for " + str(n) + " records " + str(time.time() - t0) + " secs") #其实,就是上文连接中的方法介绍。
其他的就不罗列了。这里所要说的是批量插入。具体项目,具体看吧。