Testing Erroneous Cases
Most robust software invests heavily in error handling, and naturally test designers focus on corner cases and erroneous scenarios to maximize the code coverage of the tests.
The previous section introduced the data sections provided by Test::Nginx::Socket for examining messages in the NGINX error log file, which is a powerful tool for checking errors in the tests. Sometimes, however, we want to test more extreme cases like server startup failures, malformed responses, bad requests, and various kinds of timeout errors.
Expected Server Startup Failures
Sometimes the NGINX server is expected to fail to start, for example, when an NGINX configuration directive is used in the wrong way or some hard prerequisites are not met during early initialization. If we want to test such cases, especially the error log messages generated for such failures, we can use the must_die data section in our test block to signal the test scaffold that the NGINX server is expected to die upon startup in this very block.
The following example tests the case of throwing a Lua exception in the context of init_by_lua_block of the ngx_http_lua module.
=== TEST 1: dying in init_by_lua_block
--- http_config
    init_by_lua_block {
        error("I am dying!")
    }
--- config
--- must_die
--- error_log
I am dying!
The Lua code in init_by_lua_block runs in the NGINX master process while the NGINX configuration file is being loaded. Throwing a Lua exception there aborts the NGINX startup process immediately. The presence of the must_die section tells the test scaffold to treat an NGINX server startup failure as a test pass and a successful startup as a test failure. The error_log section ensures that the server fails in the expected way, that is, due to the "I am dying!" exception.
If we remove the --- must_die
line from the test block above, then the
test file won’t even run to completion:
t/a.t .. nginx: [error] init_by_lua error: init_by_lua:2: I am dying!
stack traceback:
        [C]: in function 'error'
        init_by_lua:2: in main chunk
Bailout called.  Further testing stopped:  TEST 1: dying in init_by_lua_block - Cannot start nginx using command "nginx -p .../t/servroot/ -c .../t/servroot/conf/nginx.conf > /dev/null".
By default the test scaffold treats NGINX server startup failures as fatal errors while running the tests. The must_die section, however, turns such a failure into a normal test check.
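The same must_die pattern also covers plain configuration mistakes, such as using a directive in the wrong context. Below is a minimal sketch (a hypothetical test block, not from the original test suite) that relies on the standard NGINX "directive is not allowed here" diagnostic:
=== TEST 2: directive used in the wrong context
--- http_config
    # "location" is only valid inside server {} or location {} blocks,
    # so putting it at the http {} level should abort the NGINX startup
    location = /t {
    }
--- config
--- must_die
--- error_log
"location" directive is not allowed here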
Expected Malformed Responses
HTTP responses should always be well-formed, but unfortunately the real world is complicated and there are indeed cases where responses can be malformed, for example, truncated due to some unexpected cause. As test designers, we always want to cover such strange, abnormal cases, among other things.
Naturally, Test::Nginx::Socket treats malformed responses from the NGINX server as an error, since by default it always runs sanity checks on the responses it receives from the test server. For test cases where we expect a malformed or truncated response from the server, we should explicitly tell the test scaffold to disable the response sanity checks via the ignore_response data section.
Consider the following example that closes the downstream connection immediately after sending out the first part of the response body.
=== TEST 1: aborting response body stream
--- config
    location = /t {
        content_by_lua_block {
            ngx.print("hello")
            ngx.flush(true)
            ngx.exit(444)
        }
    }
--- request
GET /t
--- ignore_response
--- no_error_log
[error]
The ngx.flush(true) call in the content_by_lua_block handler ensures that any response body data buffered by NGINX is actually flushed out to the system socket send buffers, which for local sockets usually also means flushing the output data to the client side. The ngx.exit(444) call then closes the current downstream connection immediately, interrupting the response body stream in the HTTP/1.1 chunked encoding.
The important part is the --- ignore_response line, which tells the test scaffold not to complain about the interrupted response data stream. If the test block above went without this line, we would see the following test failure while running prove:
# Failed test 'TEST 1: aborting response body stream - no last chunk found - 5
# hello
# '
Obviously, the test scaffold complains about the lack of the "last chunk" (the terminating zero-size chunk) used to indicate the end of the chunked encoded data stream. Because the server aborts the connection in the middle of sending the response body, there is no chance for the server to properly terminate the body in well-formed chunked encoding.
Testing Timeout Errors
Timeout errors are among the most common network issues in the real world. Timeouts may happen for many reasons, like packet dropping on the wire or at the other end, connectivity problems, or expensive operations blocking the event loop. Most applications want timeout protection so that they never end up waiting for too long.
Testing and emulating timeout errors is often tricky in a self-contained unit test framework, since most of the network traffic initiated by the test cases is local only, that is, it goes through the local "loopback" device, which has perfect latency and throughput. We will examine some of the tricks that can be used to reliably emulate various kinds of timeout errors in the test suite.
Connecting Timeouts
Connecting timeouts in the context of the TCP protocol are the easiest to emulate. Just point the connecting target at a remote address that always drops any incoming (SYN) packets via a firewall rule or something similar. We provide such a "black-hole service" on port 12345 of the agentzh.org host. You can make use of it if your test running environment allows public network access. Consider the following test case.
=== TEST 1: connect timeout
--- config
    resolver 8.8.8.8;
    resolver_timeout 1s;
    location = /t {
        content_by_lua_block {
            local sock = ngx.socket.tcp()
            sock:settimeout(100)  -- ms
            local ok, err = sock:connect("agentzh.org", 12345)
            if not ok then
                ngx.log(ngx.ERR, "failed to connect: ", err)
                return ngx.exit(500)
            end
            ngx.say("ok")
        }
    }
--- request
GET /t
--- response_body_like: 500 Internal Server Error
--- error_code: 500
--- error_log
failed to connect: timeout
We have to configure the resolver
directive here because we need to resolve
the domain name agentzh.org
at request time (in Lua). We check the NGINX
error log via the error_log
section for the error string returned by
the cosocket object’s connect()
method.
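If your test running environment has no public network access, a commonly used alternative (a hedged sketch, not taken from the original test suite) is to connect to a non-routable address, which silently drops the SYN packets in most environments:
    -- hypothetical variant of the connect target above: a non-routable
    -- address needs no DNS resolution, so the resolver directive can be
    -- dropped as well
    local sock = ngx.socket.tcp()
    sock:settimeout(100)  -- ms
    -- 10.255.255.1 is usually not routable, so the SYN packets are
    -- silently dropped and connect() times out
    local ok, err = sock:connect("10.255.255.1", 12345)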
It is important to use a relatively small timeout threshold in the test cases so that we do not have to wait for too long to complete the test run. Tests are meant to be run very often. The more frequently we run the tests, the more value we may gain from automating the tests.
It is worth mentioning that the test scaffold’s HTTP client does have a timeout threshold as well, which is 3 seconds by default. If your test request takes more than 3 seconds, you get an error message in the test report:
ERROR: client socket timed out - TEST 1: connect timeout
This message is what we would get if we commented out the settimeout call and relied on the default 60 second timeout threshold of cosockets.
We can change this default timeout threshold used by the test scaffold client by specifying a value in the timeout data section, as in
--- timeout: 10
Now we have 10 seconds of timeout protection instead of 3.
Reading Timeouts
Emulating reading timeouts is also easy. Just try reading from a wire where the other end never writes anything but still keeps the connection alive. Consider the following example:
=== TEST 1: read timeout
--- main_config
stream {
    server {
        listen 5678;
        content_by_lua_block {
            ngx.sleep(10)  -- 10 sec
        }
    }
}
--- config
    lua_socket_log_errors off;
    location = /t {
        content_by_lua_block {
            local sock = ngx.socket.tcp()
            sock:settimeout(100)  -- ms
            assert(sock:connect("127.0.0.1", 5678))
            ngx.say("connected.")
            local data, err = sock:receive()  -- try to read a line
            if not data then
                ngx.say("failed to receive: ", err)
            else
                ngx.say("received: ", data)
            end
        }
    }
--- request
GET /t
--- response_body
connected.
failed to receive: timeout
--- no_error_log
[error]
Here we use the main_config data section to define a TCP server of our own, listening on port 5678 of the local host. This is a mocked-up server that can establish new TCP connections but never writes out anything; it just sleeps for 10 seconds before closing the session. Note that we are using the ngx_stream_lua module in the stream {} configuration block. Our location = /t, which is the main target of this test case, connects to this mock server and tries to read a line from the wire. Apparently, the 100ms timeout threshold on the client side is reached first, and we can successfully exercise the error handling code for the read timeout error.
Sending Timeouts
Triggering send timeouts is much harder than triggering connect and read timeouts. This is due to the asynchronous nature of writing.
For performance reasons, there are at least two layers of buffers for writes:
- the userland send buffers inside the NGINX core, and
- the socket send buffers in the operating system kernel's TCP/IP stack implementation.
To make the situation even worse, there is also at least a system-level receive buffer layer on the other end of the connection.
The most naive way to make a send timeout error happen is to fill up all these buffers along the data sending chain while ensuring that the other end never actually reads anything on the application level. Thus, buffering makes a send timeout particularly hard to reproduce and emulate in a typical testing and development environment with a small amount of (test) payload.
Fortunately there is a userland trick that can intercept the libc wrappers for the actual system calls for socket I/O and do funny things that could otherwise be very difficult to achieve. Our mockeagain library implements such a trick and supports emulating timeout errors at user-specified precise positions in the output data stream.
The following example triggers a send timeout right after sending out the "hello, world" string in the response body.
=== TEST 1: send timeout
--- config
    send_timeout 100ms;
    postpone_output 1;
    location = /t {
        content_by_lua_block {
            ngx.say("hi bob!")
            local ok, err = ngx.flush(true)
            if not ok then
                ngx.log(ngx.ERR, "flush #1 failed: ", err)
                return
            end
            ngx.say("hello, world!")
            local ok, err = ngx.flush(true)
            if not ok then
                ngx.log(ngx.ERR, "flush #2 failed: ", err)
                return
            end
        }
    }
--- request
GET /t
--- ignore_response
--- error_log
flush #2 failed: timeout
--- no_error_log
flush #1 failed
Note the send_timeout directive, which configures the timeout for NGINX downstream writing operations. Here we use a small threshold, 100ms, to ensure that our test case runs fast and never hits the default 3 second timeout threshold of the test scaffold client. The postpone_output 1 directive effectively turns off the "postpone output buffer" of NGINX, which may hold our output data before it even reaches the libc system call wrappers. Finally, the ngx.flush() calls in Lua ensure that no buffers along the NGINX output filter chain hold our data without sending it downstream.
Before running this test case, we have to set the following system environment variables (in the bash syntax):
export LD_PRELOAD="mockeagain.so"
export MOCKEAGAIN="w"
export MOCKEAGAIN_WRITE_TIMEOUT_PATTERN='hello, world'
export TEST_NGINX_EVENT_TYPE='poll'
Let's go through them one by one:
- The LD_PRELOAD="mockeagain.so" assignment pre-loads the mockeagain library into the running processes, including the NGINX server process started by the test scaffold, of course. You may also need to set the LD_LIBRARY_PATH environment variable to include the directory path of the mockeagain.so file if the file is not in the default system library search paths.
- The MOCKEAGAIN="w" assignment enables the mockeagain library to intercept and do funny things with the writing operations on nonblocking sockets.
- The MOCKEAGAIN_WRITE_TIMEOUT_PATTERN='hello, world' assignment makes mockeagain refuse to send any more data after seeing the specified string pattern, hello, world, in the output data stream.
- The TEST_NGINX_EVENT_TYPE='poll' setting makes the NGINX server use the poll event API instead of the system default (epoll on Linux, for example). This is because mockeagain only supports poll events for now. Behind the scenes, this environment variable simply makes the test scaffold generate the following nginx.conf snippet:
    events { use poll; }
You need to ensure, however, that your NGINX or OpenResty build has the poll support compiled in. Basically, the build should have the ./configure option --with-poll_module.
We have plans to add epoll edge-triggering support to mockeagain in the future. Hopefully by that time we will no longer have to use poll, at least on Linux.
Now you should see the test block above pass!
Ideally, we could set these environment variables directly inside the test file, because this test case will never pass without them anyway. We can add the following Perl code snippet to the very beginning of the test file prologue (yes, even before the use statement):
BEGIN {
    $ENV{LD_PRELOAD} = "mockeagain.so";
    $ENV{MOCKEAGAIN} = "w";
    $ENV{MOCKEAGAIN_WRITE_TIMEOUT_PATTERN} = 'hello, world';
    $ENV{TEST_NGINX_EVENT_TYPE} = 'poll';
}
The BEGIN {} block is required here because it runs before Perl loads any modules, especially Test::Nginx::Socket, which is where we need these environment variables to take effect.
It is a bad idea, however, to hard-code the path of the mockeagain.so file in the test file itself, since different test runners might put mockeagain in different places in the file system. It is better to let the test runner configure the LD_LIBRARY_PATH environment variable with the actual library path from the outside.
Mockeagain Troubleshooting
If you are seeing the following error while running the test case above,
ERROR: ld.so: object 'mockeagain.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
then you should check whether you have added the directory path of your mockeagain.so library to the LD_LIBRARY_PATH environment variable. On my system, for example, I have
export LD_LIBRARY_PATH=$HOME/git/mockeagain:$LD_LIBRARY_PATH
If you are seeing an error similar to the following,
nginx: [emerg] invalid event type "poll" in .../t/servroot/conf/nginx.conf:76
then your NGINX or OpenResty build does not have the poll module compiled in, and you should rebuild it by passing the --with-poll_module option to the ./configure command line.
We will revisit the mockeagain
library in the Test Modes
section soon.
Mocking Bad Backend Responses
Earlier in this section we have already seen examples that use the ngx_stream_lua module to mock a backend TCP server which accepts new incoming connections but never writes anything back. We can of course do fancier things in such a mocked server, like emulating a buggy or malicious backend server that returns bad response data.
For example, while testing a Memcached client, it would be pretty hard to trigger error responses or ill-formed responses with a real Memcached server. Now it is trivial with mocking:
=== TEST 1: get() results in an error response
--- main_config
stream {
    server {
        listen 1921;
        content_by_lua_block {
            ngx.print("SERVER_ERROR\r\n")
        }
    }
}
--- config
    location /t {
        content_by_lua_block {
            local memcached = require "resty.memcached"
            local memc = memcached:new()
            assert(memc:connect("127.0.0.1", 1921))
            local res, flags, err = memc:get("dog")
            if not res then
                ngx.say("failed to get: ", err)
                return
            end
            ngx.say("get: ", res)
            memc:close()
        }
    }
--- request
GET /t
--- response_body
failed to get: SERVER_ERROR
--- no_error_log
[error]
Our mocked-up Memcached server can behave in any way that we like. Hooray!
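For instance, to exercise the "ill-formed responses" case mentioned above, a hedged and untested sketch could truncate a VALUE reply in the middle and then let the connection close. The exact error string returned by lua-resty-memcached in this situation may vary, so the expected body is only matched by its prefix here:
=== TEST 2: get() receives a truncated response
--- main_config
stream {
    server {
        listen 1921;
        content_by_lua_block {
            -- claim a 5-byte value but send only 3 bytes of data,
            -- then let the connection close
            ngx.print("VALUE dog 0 5\r\nhel")
        }
    }
}
--- config
    lua_socket_log_errors off;
    location = /t {
        content_by_lua_block {
            local memcached = require "resty.memcached"
            local memc = memcached:new()
            assert(memc:connect("127.0.0.1", 1921))
            local res, flags, err = memc:get("dog")
            if not res then
                ngx.say("failed to get: ", err)
                return
            end
            ngx.say("get: ", res)
        }
    }
--- request
GET /t
--- response_body_like: ^failed to get:
--- no_error_log
[error]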
Note: Test::Nginx::Socket provides the data sections tcp_listen, tcp_query, tcp_reply, etc. to enable the builtin mocked TCP server of the test scaffold. You can use this facility when you do not want to depend on the ngx_stream_lua module or the NGINX stream subsystem for your test suite. Indeed, we were solely relying on the builtin TCP server of Test::Nginx::Socket before the ngx_stream_lua module was born. Similarly, Test::Nginx::Socket offers a builtin UDP server via the data sections udp_listen, udp_query, udp_reply, etc. You can refer to the official documentation of Test::Nginx::Socket for more details.
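As an illustration, the SERVER_ERROR test above could be written roughly as follows with the builtin mock TCP server instead of a stream {} block (a hedged sketch; check the official documentation for the exact semantics of these data sections):
=== TEST 2: get() error response via the builtin mock TCP server
--- tcp_listen: 1921
--- tcp_reply eval
"SERVER_ERROR\r\n"
--- config
    location = /t {
        content_by_lua_block {
            local memcached = require "resty.memcached"
            local memc = memcached:new()
            assert(memc:connect("127.0.0.1", 1921))
            local res, flags, err = memc:get("dog")
            if not res then
                ngx.say("failed to get: ", err)
                return
            end
            ngx.say("get: ", res)
        }
    }
--- request
GET /t
--- response_body
failed to get: SERVER_ERROR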
Emulating Bad Clients
The Test::Nginx::Socket test framework provides special data sections to help emulate ill-behaved HTTP clients.
Crafting Bad Requests
The raw_request data section can be used to specify arbitrary raw data for the test request. It is often used with the eval section filter so that we can easily encode special characters like \r. Let's look at the following example.
=== TEST 1: missing the Host request header
--- config
    location = /t {
        return 200;
    }
--- raw_request eval
"GET /t HTTP/1.1\r
Connection: close\r
\r
"
--- response_body_like: 400 Bad Request
--- error_code: 400
So we can easily construct a malformed request that does not have a Host header, which results in a 400 response from the NGINX server, as expected. The request data section we have been using so far, on the other hand, always ensures that a well-formed HTTP request is sent to the test server.
Emulating Client Aborts
Client aborts are a very intriguing phenomenon in the web world. Sometimes we want the server to continue processing even after the client has aborted the connection; on other occasions we just want to abort the whole request handler immediately in such cases. Either way, we need a robust way to emulate client aborts in our unit test cases.
We have already discussed the timeout data section, which can be used to adjust the default timeout protection threshold used by the test scaffold client. We can also use it to abort the connection prematurely; a small timeout threshold is often desired for this purpose. To keep the test scaffold from printing out an error on the client timeout, we can specify the abort data section to signal the test scaffold. Let's put these together in a simple test case.
=== TEST 1: abort processing in the Lua callback on client aborts
--- config
    location = /t {
        lua_check_client_abort on;
        content_by_lua_block {
            local ok, err = ngx.on_abort(function ()
                ngx.log(ngx.NOTICE, "on abort handler called!")
                ngx.exit(444)
            end)
            if not ok then
                error("cannot set on_abort: " .. err)
            end
            ngx.sleep(0.7)  -- sec
            ngx.log(ngx.NOTICE, "main handler done")
        }
    }
--- request
GET /t
--- timeout: 0.2
--- abort
--- ignore_response
--- no_error_log
[error]
main handler done
--- error_log
client prematurely closed connection
on abort handler called!
In this example, we make the test scaffold client abort the connection after 0.2 seconds via the timeout section. We also prevent the test scaffold from printing out the client timeout error by specifying the abort section. Finally, in the Lua application code, we check for client abort events by turning on the lua_check_client_abort directive and abort the server-side processing by calling ngx.exit(444) in the Lua callback function registered via the ngx.on_abort API.
Clients Never Closing Connections
Unlike most well-behaved HTTP clients out there, the HTTP client used by Test::Nginx::Socket never actively closes the connection unless a timeout error happens (that is, unless the timeout threshold specified by the --- timeout section is exceeded). This ensures that it is always the NGINX server that actually closes the connection when the request specifies the "Connection: close" request header.
When the server fails to close such a connection, there is a "connection leak" bug on the server side. For example, NGINX uses reference counting (in r->main->count) in its HTTP subsystem to determine whether a request can be closed and freed. When there is an error in this reference counting, NGINX may never close the request, leading to resource leaks. In such cases, the corresponding test cases always fail with a client-side timeout error, for instance,
# Failed test 'ERROR: client socket timed out - TEST 1: foo
# '
Obviously, Test::Nginx::Socket behaves like a malicious HTTP client by default in this respect. This is also why our test scaffold avoids using a well-behaved HTTP client library itself. Most test suites focus on extreme and erroneous cases anyway, and well-behaved HTTP clients would help hide problems instead of exposing them.