In this article we will go through how to do a RDBMS stress test using tool sysbench
. In the article we are talking about MySQL
, as sysbench also supports other DBs like PostgreSQL
, you can replace MySQL with other DBs according to your requirement.
Say we have a BE application like this:
It contains several application containers and several DB instances, each app would establish one connection to each DB instance.
Then we’d like to know under the target / high stress, what will happen to the DB instance, would they be able to handle the stress with ease or crash.
To figure out this, we can’t simply use a client process to send batch of duplicated SQL queries to DB instance. We need to mock the business like our BE app would do.
So several steps we need to do.
First we need to figure out what’s the metrics we need to monitor in the stress test.
Basically:
show status;
-> table_locks_waited
show status like 'innodb_row_lock%';
-> innodb_row_lock_waits
innodb_row_lock_time_avg
iostat -d -x -k 1
, show device data show extend data show KB value refresh every 1 second
For more metrics you may want to learn:
show status;
In this step we need to figure out the expected stress level.
3 items to be determined:
DB performance may fluctuate markedly according to the amount of records in table and amount of connections.
How (for example):
In this step we need to figure out the SQL statements which would be used in the stress test.
How:
CRUD
each type, like:
In this step we need to prepare the lua script which would be executed by the sysbench. There are multiple threads in sysbench, each thread will execute the LUA script rather than running SQL statement directly.
The script will:
Using this LUA script we can mock the SQL executions just like the CRUD ratio we got in step 4, and SQLs are also we prepared in step 4.
LUA example:
function prepare()
print("prepare of stress test")
end
function cleanup()
print("cleanup of stress test")
end
function help()
print("sysbench stress test; no special command line options available")
end
function thread_init(thread_id)
end
function thread_done(thread_id)
db_disconnect()
end
function event(thread_id)
-----------------------------------------------------------------------------------------------------------------------
-- CONST
-----------------------------------------------------------------------------------------------------------------------
local SEARCH_KEY_PH = "^SK"
local RANDOM_INT_PH = "^RI"
-----------------------------------------------------------------------------------------------------------------------
-- PARAMS
-----------------------------------------------------------------------------------------------------------------------
local totalRowsOfRecordsInDb = 10000
local cmdSeed = {
select = 3000,
insert = 150,
delete = 80,
update = 6770
}
selectQueries = {}
selectQueries["SELECT plantId, userId, seedId, status, growTime, harvestTime, blightTime, rarity, residue, leaf FROM fs2_plant_plants WHERE userId = ^SK;"] = 100
selectQueries["SELECT userId, finished FROM module_mission_finished WHERE userId = ^SK;"] = 100
selectQueries["SELECT userId, activityId, decorId, finished, acceptedRewards, expensePoint FROM fs2_activity_party WHERE userId = ^SK;"] = 100
selectQueries["SELECT userId, customerDefId, level, exp, rewards FROM fs2_customer_friendship WHERE userId = ^SK;"] = 100
selectQueries["SELECT userId, missionId, progress, acceptTime, expireTime FROM module_mission_in_progress WHERE userId = ^SK;"] = 100
selectQueries["SELECT userId, activityId, collect FROM fs2_activity_collect WHERE userId = ^SK;"] = 100
selectQueries["SELECT userId, gridId, layer1Type, layer1, layer2Type, layer2, layer3Type, layer3 FROM fs2_grounds WHERE userId = ^SK;"] = 100
selectQueries["SELECT decorId, decorDefId, userId, orient, cooldown, buildId, isUpgrade, effectCountLeft, resourceCountLeft, requestAccepted, output, withActivityOutput, freeMode FROM fs2_decoration WHERE userId = ^SK;"] = 100
selectQueries["SELECT requestId, receiverId, senderId, helpId, type, info, time, expire, platformRequestId FROM fs2_social_request WHERE receiverId = ^SK;"] = 100
selectQueries["SELECT userId, customerDefIds FROM fs2_customer_event WHERE userId = ^SK;"] = 100
insertQueries = {}
insertQueries["insert into `fv_dev`.`fs2_pet` ( `status`, `userId`, `fodderId`, `feedUpCount`, `feedUpTime`, `petDefId`) values ( '0', '89', '780003', '1', '1438223373', '900001');"] = 100
insertQueries["insert into fs2_plant_plants (userId,seedId,status,growTime,harvestTime,blightTime,rarity,residue,leaf,soilId,soilType,soilCount,harvestCount,wood,requestAccepted,withActivityOutput) values ('57332', '110001', '1', '1428746759', '1428746819', '1429005959', '0', '0', '0', '0', '0', '0', '0', '0', null, null); "]=100
updateQueries = {}
updateQueries["UPDATE module_achievement_in_progress SET progress = '[3333333333,44444,5555555555555,7777777777777777777777,^RI,111111111111,22222222222222222222,999999999999999999999999,00000000]' WHERE userId = ^SK;"] = 1000
updateQueries["UPDATE module_profile_infos SET info = ^RI, time = ^RI WHERE userId = ^SK;"] = 700
updateQueries["UPDATE fs2_plant_plants SET seedId = ^RI, status = ^RI, growTime = ^RI, harvestTime = ^RI, blightTime = ^RI, rarity = ^RI, leaf = ^RI, residue = ^RI WHERE plantId = ^SK;"] = 700
updateQueries["UPDATE module_profile SET exp = exp + 1 WHERE userId = ^SK;"] = 500
updateQueries["UPDATE module_mission_in_progress SET progress = '[3333333333,44444,5555555555555,7777777777777777777777,^RI,111111111111,22222222222222222222,999999999999999999999999,00000000]', acceptTime = ^RI WHERE userId = ^SK;"] = 300
updateQueries["UPDATE module_item SET updateTime = ^RI WHERE userId = ^SK;"] = 700
updateQueries["UPDATE fs2_plant_seeds SET exp = ^RI WHERE userId = ^SK;"] = 100
updateQueries["UPDATE fs2_social_feed SET helpId = ^RI, status = ^RI WHERE userId = ^SK;"] = 100
updateQueries["UPDATE fs2_pet SET feedUpCount = ^RI, feedUpTime = ^RI, status = ^RI WHERE petId = ^SK;"] = 100
updateQueries["UPDATE fs2_customer SET exp = ^RI, status = ^RI WHERE userId = ^SK;"] = 50
updateQueries["UPDATE fs2_customer_town SET missionId = ^RI, leaveTime = ^RI WHERE customerId = ^SK;"] = 300
updateQueries["UPDATE fs2_activity SET count = ^RI, status = ^RI WHERE userId = ^SK;"] = 100
deleteQueries = {}
deleteQueries["delete from fs2_plant_plants where userId=57332 limit 1;"] = 500
deleteQueries["delete from fs2_pet where userId=89 limit 1;"] =500
-----------------------------------------------------------------------------------------------------------------------
-- LIB
-----------------------------------------------------------------------------------------------------------------------
local function randItem(seedTable)
local result = 0
local seedSum = 0
for key, value in pairs(seedTable) do
seedSum = seedSum + value
end
local randSeed = sb_rand(1, seedSum)
local sum = 0
for cmdType, prob in pairs(seedTable) do
sum = sum + prob
if randSeed <= sum then
result = cmdType
break end
end
return result
end
local function processSleep()
os.execute("sleep 0.01") -- 10ms
end
-----------------------------------------------------------------------------------------------------------------------
-- EXEC
-----------------------------------------------------------------------------------------------------------------------
local cmdType = randItem(cmdSeed)
--print(cmdType)
--if cmdType == "insert" or cmdType == "delete" then
-- cmdType = "update"
--end
local querySeed = _G[cmdType .. "Queries"]
local sql = randItem(querySeed)
--print(sql)
if cmdType == "update" then
sql = sql:gsub("%^RI", sb_rand(1, totalRowsOfRecordsInDb))
end
sql = sql:gsub("%^SK", sb_rand(1, totalRowsOfRecordsInDb))
--processSleep() -- let the lua script sleep to make qps lower for testing, if necessary
db_query(sql)
end
You may find a sleep function processSleep
, in old sysbench version there is no way to control the TPS speed, so this function could help to slow down the execution. In latest versions, sysbench provided the option to control tps, so it’s no longer necessary.
In this step we will see how to run sysbench to execute the stress test.
$ sysbench --version
sysbench 1.0.18
Several useful options:
--threads
: How many thread running inside the sysbench, each thread means a db connection--rate
: TPS, 0 means no limit, query as fast as possible--events
: Total requests to be made in the whole stress test, use this or --time
--time
: How long should the stress test last, in seconds, use this or --events
Steps to run the stress test:
$ sysbench \
--threads=128 \
--rate=300 \
--time=3600 \
--percentile=99 \
--report-interval=5 \
--db-driver=mysql \
--mysql... \
./test_strategy.lua \
run
After the test, see the monitoring system to grab the conclusion. Or may be need to analyse the cause of the performance issues.
Use OPTIMIZE TABLE
to resort index to restore the performance. After a lot of CRUD, the InnoDB index may be messed up, and the performance reduced. You can perform this command between several rounds of tests.
You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. After extensive changes to a table, this statement may also improve performance of statements that use the table, sometimes significantly.
OPTIMIZE TABLE works for InnoDB, MyISAM, and ARCHIVE tables. OPTIMIZE TABLE is also supported for dynamic columns of in-memory NDB tables. It does not work for fixed-width columns of in-memory tables, nor does it work for Disk Data tables.
OPTIMIZE TABLE XXX;
You may need to test the server io before test db, since the io of some cloud servers is not the same as claimed. You can still use sysbench to do it.
Create test file:
$ sysbench \
--file-num=4 \
--file-block-size=8k \
--file-total-size=4G \
--file-test-mode=rndrd \
--file-extra-flags=direct \
--max-requests=0 \
--time=1200 \
--threads=4 \
fileio prepare
Test random read performance:
$ sysbench \
--file-num=4 \
--file-block-size=8k \
--file-total-size=4G \
--file-test-mode=rndrd \
--file-extra-flags=direct \
--max-requests=0 \
--time=30 \
--threads=4 \
--report-interval=3 \
fileio run
$ sysbench --help
Usage:
sysbench [options]... [testname] [command]
Commands implemented by most tests: prepare run cleanup help
General options:
--threads=N number of threads to use [1]
--events=N limit for total number of events [0]
--time=N limit for total execution time in seconds [10]
--forced-shutdown=STRING number of seconds to wait after the --time limit before forcing shutdown, or 'off' to disable [off]
--thread-stack-size=SIZE size of stack per thread [64K]
--rate=N average transactions rate. 0 for unlimited rate [0]
--report-interval=N periodically report intermediate statistics with a specified interval in seconds. 0 disables intermediate reports [0]
--report-checkpoints=[LIST,...] dump full statistics and reset all counters at specified points in time. The argument is a list of comma-separated values representing the amount of time in seconds elapsed from start of test when report checkpoint(s) must be performed. Report checkpoints are off by default. []
--debug[=on|off] print more debugging info [off]
--validate[=on|off] perform validation checks where possible [off]
--help[=on|off] print help and exit [off]
--version[=on|off] print version and exit [off]
--config-file=FILENAME File containing command line options
--tx-rate=N deprecated alias for --rate [0]
--max-requests=N deprecated alias for --events [0]
--max-time=N deprecated alias for --time [0]
--num-threads=N deprecated alias for --threads [1]
Pseudo-Random Numbers Generator options:
--rand-type=STRING random numbers distribution {uniform,gaussian,special,pareto} [special]
--rand-spec-iter=N number of iterations used for numbers generation [12]
--rand-spec-pct=N percentage of values to be treated as 'special' (for special distribution) [1]
--rand-spec-res=N percentage of 'special' values to use (for special distribution) [75]
--rand-seed=N seed for random number generator. When 0, the current time is used as a RNG seed. [0]
--rand-pareto-h=N parameter h for pareto distribution [0.2]
Log options:
--verbosity=N verbosity level {5 - debug, 0 - only critical messages} [3]
--percentile=N percentile to calculate in latency statistics (1-100). Use the special value of 0 to disable percentile calculations [95]
--histogram[=on|off] print latency histogram in report [off]
General database options:
--db-driver=STRING specifies database driver to use ('help' to get list of available drivers) [mysql]
--db-ps-mode=STRING prepared statements usage mode {auto, disable} [auto]
--db-debug[=on|off] print database-specific debug information [off]
Compiled-in database drivers:
mysql - MySQL driver
mysql options:
--mysql-host=[LIST,...] MySQL server host [localhost]
--mysql-port=[LIST,...] MySQL server port [3306]
--mysql-socket=[LIST,...] MySQL socket
--mysql-user=STRING MySQL user [sbtest]
--mysql-password=STRING MySQL password []
--mysql-db=STRING MySQL database name [sbtest]
--mysql-ssl[=on|off] use SSL connections, if available in the client library [off]
--mysql-ssl-cipher=STRING use specific cipher for SSL connections []
--mysql-compression[=on|off] use compression, if available in the client library [off]
--mysql-debug[=on|off] trace all client library calls [off]
--mysql-ignore-errors=[LIST,...] list of errors to ignore, or "all" [1213,1020,1205]
--mysql-dry-run[=on|off] Dry run, pretend that all MySQL client API calls are successful without executing them [off]
Compiled-in tests:
fileio - File I/O test
cpu - CPU performance test
memory - Memory functions speed test
threads - Threads subsystem performance test
mutex - Mutex performance test
See 'sysbench <testname> help' for a list of options for each test.
$ sysbench fileio help
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)
fileio options:
--file-num=N number of files to create [128]
--file-block-size=N block size to use in all IO operations [16384]
--file-total-size=SIZE total size of files to create [2G]
--file-test-mode=STRING test mode {seqwr, seqrewr, seqrd, rndrd, rndwr, rndrw}
--file-io-mode=STRING file operations mode {sync,async,mmap} [sync]
--file-extra-flags=[LIST,...] list of additional flags to use to open files {sync,dsync,direct} []
--file-fsync-freq=N do fsync() after this number of requests (0 - don't use fsync()) [100]
--file-fsync-all[=on|off] do fsync() after each write operation [off]
--file-fsync-end[=on|off] do fsync() at the end of test [on]
--file-fsync-mode=STRING which method to use for synchronization {fsync, fdatasync} [fsync]
--file-merged-requests=N merge at most this number of IO requests if possible (0 - don't merge) [0]
--file-rw-ratio=N reads/writes ratio for combined test [1.5]
EOF