Add a new flag to disable users. This can be useful to temporarily
deactivate an account without erasing data.
The user goroutine is kept alive for simplicity's sake. Most of the
infrastructure assumes that each user always has a running goroutine.
A disabled user's goroutine is responsible for sending back an error
to downstream connections, and listening for potential events to
re-enable the account.
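A minimal sketch of that idea, assuming a hypothetical event type and
channel layout (none of these names are taken from soju itself): the
goroutine stays alive, refuses downstream connections with an error,
and returns once a re-enable event arrives.

```go
package main

import (
	"errors"
	"fmt"
)

// event is a hypothetical message delivered to a user's goroutine.
type event struct {
	enable  bool       // true to re-enable the account
	connect chan error // non-nil when a downstream connection arrives
}

var errDisabled = errors.New("account disabled")

// runDisabledUser keeps the goroutine alive while the account is disabled.
// It answers every downstream connection with errDisabled and returns once
// the account is re-enabled or the events channel is closed.
func runDisabledUser(events <-chan event) {
	for ev := range events {
		switch {
		case ev.enable:
			return
		case ev.connect != nil:
			ev.connect <- errDisabled
		}
	}
}

func main() {
	events := make(chan event)
	done := make(chan struct{})
	go func() {
		runDisabledUser(events)
		close(done)
	}()

	// A downstream connection attempt is refused with an error.
	reply := make(chan error, 1)
	events <- event{connect: reply}
	fmt.Println(<-reply)

	// A re-enable event ends the disabled loop.
	events <- event{enable: true}
	<-done
}
```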
On some systems (namely Windows), syscall.Rlimit is not defined, which
makes the build fail.
This fixes the build by compiling the rlimit calls only on platforms
where it is defined, and falling back to a stub on other systems.
See: 8427429c59
The bouncer process may be dealing with many open FDs. The default
soft limit on Linux is 1024. To support bouncers with a lot of users, bump
RLIMIT_NOFILE to the max as advised in [1].
[1]: http://0pointer.net/blog/file-descriptor-limits.html
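The bump can be sketched as follows, assuming a Linux build where
syscall.Rlimit has uint64 fields. The function names are illustrative
and are not soju's own.

```go
package main

import (
	"fmt"
	"syscall"
)

// noFileLimits returns the current soft and hard RLIMIT_NOFILE values.
func noFileLimits() (uint64, uint64, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return lim.Cur, lim.Max, nil
}

// bumpNoFileLimit raises the soft limit to the hard limit. An unprivileged
// process is allowed to do this; only raising the hard limit needs privileges.
func bumpNoFileLimit() error {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return err
	}
	lim.Cur = lim.Max
	return syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim)
}

func main() {
	if err := bumpNoFileLimit(); err != nil {
		fmt.Println("failed to bump RLIMIT_NOFILE:", err)
		return
	}
	soft, hard, err := noFileLimits()
	if err != nil {
		fmt.Println("failed to read RLIMIT_NOFILE:", err)
		return
	}
	fmt.Println("soft limit:", soft, "hard limit:", hard)
}
```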
This is a mechanical change, which just lifts up the context.TODO()
calls from inside the DB implementations to the callers.
Future work involves properly wiring up the contexts when it makes
sense.
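The shape of the change might look like the sketch below. The Database
interface, memoryDB type and lookup helper are hypothetical stand-ins,
not soju's actual types: the storage method takes a context parameter,
and the placeholder context.TODO() moves to the call site.

```go
package main

import (
	"context"
	"fmt"
)

// Database is a hypothetical, trimmed-down storage interface.
type Database interface {
	GetUser(ctx context.Context, username string) (string, error)
}

// memoryDB is an in-memory stand-in for a real DB implementation.
type memoryDB struct {
	users map[string]string // username -> nickname
}

// GetUser no longer creates its own context.TODO(): it uses the context
// handed in by the caller.
func (db *memoryDB) GetUser(ctx context.Context, username string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	nick, ok := db.users[username]
	if !ok {
		return "", fmt.Errorf("user %q not found", username)
	}
	return nick, nil
}

// lookup is a caller. The placeholder context now lives here, ready to be
// replaced by a properly wired context later.
func lookup(db Database, username string) (string, error) {
	return db.GetUser(context.TODO(), username)
}

func main() {
	db := &memoryDB{users: map[string]string{"alice": "al"}}
	nick, err := lookup(db, "alice")
	fmt.Println(nick, err)
}
```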
go-proxyproto added support for a read timeout in 0.6.0[1] and
defaulted it to 200ms. If no data is read on the socket within
that time, the connection is closed.
This is _really_ low if the underlying connection is a TLS
one as no data pops out the other end until the handshake is
done. It effectively limits you to TLS connections within
a 50ms RTT of your bouncer with clients that are fast enough
at responding.
It appears that HexChat on Arch is somehow slow enough at
TLS connections that it consistently takes longer than
200ms even over localhost, meaning it outright can't connect
to soju any longer.
To make this a lot less painful, have soju pass in a read
timeout of 5 seconds. This feels like a reasonable tradeoff
between keeping (possibly malicious) connections open and
accepting the realities of network connections.
[1]: https://github.com/pires/go-proxyproto/issues/65
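To illustrate the effect of the two timeout values, here is a
standard-library-only sketch. It does not use go-proxyproto and is not
how that library implements its timeout. It simply applies a read
deadline to an in-memory connection whose client, by assumption, needs
300ms before sending its first byte.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

const (
	shortTimeout = 200 * time.Millisecond // the library default described above
	longTimeout  = 5 * time.Second        // the value soju passes in
	slowClient   = 300 * time.Millisecond // assumed delay before the first byte
)

// waitFirstByte waits up to timeout for the first byte to arrive on conn.
func waitFirstByte(conn net.Conn, timeout time.Duration) error {
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	buf := make([]byte, 1)
	_, err := conn.Read(buf)
	return err
}

// simulate connects a slow client to a server that enforces a read timeout.
func simulate(clientDelay, timeout time.Duration) error {
	server, client := net.Pipe()
	defer server.Close()
	defer client.Close()
	go func() {
		time.Sleep(clientDelay)
		client.Write([]byte{0x16}) // first byte, sent late
	}()
	return waitFirstByte(server, timeout)
}

// isTimeout reports whether err was caused by an expired read deadline.
func isTimeout(err error) bool {
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	fmt.Println("200ms timeout, timed out:", isTimeout(simulate(slowClient, shortTimeout)))
	fmt.Println("5s timeout, timed out:", isTimeout(simulate(slowClient, longTimeout)))
}
```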
Previously http.Server.ListenAndServeTLS would return an unhelpful
error about a failed open. This adds a check similar to the one in the
ircs case that should make it clearer to operators what the error is.
The rationale for increasing the TCP keepalive interval from 15 seconds
(default) to 1 hour follows.
- Why increasing TCP keepalives for downstream connections is not an
issue wrt detecting connection interruptions
The use case of TCP keepalives is detecting whether a TCP connection was
forcefully shut down without receiving any TCP FIN or RST segment, when
no data is sent from that endpoint to the other peer.
If any data is sent from the peer and is not ACKed because the
connection was interrupted, the socket will be closed after the TCP RTO
(usually a few seconds) anyway, without the need for TCP keepalives.
Therefore the only use of TCP keepalives is making sure that a peer that
is not writing anything to the socket, and is actively reading and
waiting for new stream data to be received, can detect this
disconnection, close the connection locally, and then try to connect
again to its peer, instead of waiting forever to receive packets that
will never arrive because the connection was interrupted.
This only makes sense from a client point of view. When an IRC client is
not write(2)ing anything to the socket but is simply waiting for new
messages to arrive, i.e. read(2)ing on the socket, it must ensure that
the connection is still alive so that any new messages will indeed be
delivered to it. So that IRC client should probably enable TCP
keepalives.
However, when an IRC server is not writing anything to its downstream
socket, it doesn't care if it misses any messages from its downstream
client: in any case, the downstream client will instantly detect when
its messages are not reaching its server, because of the TCP RTO
(keepalives are not even needed in the client in that specific case),
and will try to reconnect to the server.
Thus TCP keepalives should be enabled for upstream connections, in
order to make sure that soju does not miss any messages coming from
upstream servers, but TCP keepalives are not needed for downstream
connections.
- Why increasing TCP keepalives for downstream connections is not an
issue wrt security, performance, and server socket resources
exhaustion
TCP keepalives are orthogonal to security. Malicious clients can open
thousands of TCP connections and keep them open with minimal
bookkeeping, and TCP keepalives will not prevent attacks planning to
use up all available sockets to soju.
It is also unlikely that soju will keep many connections open, and in
the event that thousands of dead connections are still considered active
by soju, any upstream message that needs to be sent to downstreams will
cause those dead downstreams to be closed after the TCP RTO (a few
seconds). Performance could only be slightly affected in the few seconds
before a TCP RTO if many messages were sent to a very large number of
disconnected connections, which is extremely unlikely and would not have
a large impact on performance either.
- Why increasing TCP keepalives could be helpful to some clients running
on mobile devices
In the current state of IRC, most clients running on mobile devices
(mostly running Android and iOS) will probably need to stay connected
at all times, even when the application is in the background, in order
to receive private messages and highlight notifications and to keep a
complete chat history (and possibly to reduce connection traffic by
avoiding all the initial message traffic, including all NAMES and WHO
replies, which are quite large).
This means most IRC clients on mobile devices will keep a socket open at
all times, in the background. When a mobile device runs on a cellular
data connection, it uses the phone's wireless radio to transmit all TCP
packets, including TCP packets without user data, for example TCP
keepalives.
On a typical mobile device, a wireless radio consumes significant power
when fully active, so it switches between several energy states in order
to conserve power when not in use. It typically has 3 energy states,
from Standby, when no messages are sent, to Low Power, to Full Power,
and switches modes on an average time scale of 15s. This means that any
time any TCP packet is sent from any socket on the device, the radio
switches to a high-power energy state, sends the packet, then stays in
that energy state for around 15s, then goes back to Standby. This
includes TCP keepalives.
If a TCP keepalive of 15s was used, this means that the IRC server would
force all clients running on mobile devices to send a TCP keepalive
packet at least once every 15s, which means that the radio would stay
in its high-power energy state at all times. This would consume a
very significant amount of power and use up battery much faster.
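The arithmetic behind this can be sketched with a simplified model that
assumes keepalives are the only traffic on the device and that the
radio lingers in a high-power state for 15s after each packet. The
function name and the model itself are illustrative only.

```go
package main

import "fmt"

// radioDutyCycle estimates the fraction of time the radio spends in a
// high-power state when one keepalive is sent every intervalSec seconds
// and the radio stays awake for tailSec seconds after each packet.
func radioDutyCycle(intervalSec, tailSec float64) float64 {
	if intervalSec <= tailSec {
		// The next packet arrives before the radio can return to Standby.
		return 1
	}
	return tailSec / intervalSec
}

func main() {
	fmt.Printf("15s keepalive: %.1f%% of the time in high power\n", 100*radioDutyCycle(15, 15))
	fmt.Printf("1h keepalive:  %.1f%% of the time in high power\n", 100*radioDutyCycle(3600, 15))
}
```

Under these assumptions a 15s keepalive keeps the radio awake
permanently, while a 1 hour keepalive keeps it awake for well under 1%
of the time.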
Even though it would seem at first that a mobile device would have many
different sockets open at any time, a typical Android device usually has
only one background socket open, with Firebase Cloud Messaging, for
receiving instant push notifications (for example, for the equivalent of
IRC highlights on other messaging platforms), and perhaps a socket open
for the current foreground app. When the current foreground app does not
use the network, or when no app is currently used and the phone is in
sleep mode, and no notifications are sent, then the device can
effectively have no wireless radio usage at all. This makes removing TCP
keepalives extremely significant with regard to the mobile device
battery usage.
Increasing the TCP keepalive from soju lets downstream clients choose
their own keepalive interval and therefore possibly save battery for
mobile devices. Most modern mobile devices have complex heuristics for
when to sleep the CPU and wireless radio, and have specific rules for
TCP keepalives depending on the current internet connection, sleep
state, etc.
By increasing the downstream TCP keepalive to such a high period, soju
lets clients choose their optimal TCP keepalive period, which means
that in turn clients can possibly let their mobile device platform
choose the best keepalive for them, thus letting them save battery in
those cases.
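One way to apply such an interval in Go is through
net.ListenConfig.KeepAlive, which sets the keepalive period for
connections accepted by the listener. The sketch below assumes this
approach; the constant and function names are illustrative and may not
match soju's code.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// downstreamKeepAlive is the keepalive period applied to accepted
// downstream connections.
const downstreamKeepAlive = 1 * time.Hour

// listenDownstream opens a TCP listener whose accepted connections use
// the long keepalive period instead of the 15 second default.
func listenDownstream(addr string) (net.Listener, error) {
	lc := net.ListenConfig{KeepAlive: downstreamKeepAlive}
	return lc.Listen(context.Background(), "tcp", addr)
}

func main() {
	ln, err := listenDownstream("127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr(), "with keepalive", downstreamKeepAlive)
}
```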
IPs whitelisted in accept-proxy-ip can now use the PROXY protocol to
indicate the original source/destination addresses.
Closes: https://todo.sr.ht/~emersion/soju/81
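The allow-list check could look roughly like this. It is a sketch that
assumes the accept-proxy-ip entries are given in CIDR notation; the
function names are hypothetical and the PROXY header parsing itself is
left out.

```go
package main

import (
	"fmt"
	"net"
)

// parseAllowList parses CIDR entries such as "127.0.0.0/8".
func parseAllowList(entries []string) ([]*net.IPNet, error) {
	var nets []*net.IPNet
	for _, entry := range entries {
		_, n, err := net.ParseCIDR(entry)
		if err != nil {
			return nil, fmt.Errorf("invalid accept-proxy-ip entry %q: %v", entry, err)
		}
		nets = append(nets, n)
	}
	return nets, nil
}

// proxyHeaderAllowed reports whether a connection coming from remoteAddr
// ("host:port") may use the PROXY protocol to override its addresses.
func proxyHeaderAllowed(allow []*net.IPNet, remoteAddr string) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false
	}
	for _, n := range allow {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	allow, err := parseAllowList([]string{"127.0.0.0/8", "::1/128"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(proxyHeaderAllowed(allow, "127.0.0.1:6697"))
	fmt.Println(proxyHeaderAllowed(allow, "203.0.113.7:6697"))
}
```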
This adds a new flag, `-admin`, for creating admin users, which can
access admin service commands, including create-user to create other
users on the fly.
Since the person running the commands in the README will be the local
soju administrator, the user they create should be admin as well, hence
the README update.
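A boolean flag of this kind can be parsed with the standard flag
package, as in the sketch below. The option struct and the
parseCreateUser function are hypothetical; only the `-admin` flag name
comes from the description above.

```go
package main

import (
	"flag"
	"fmt"
)

// createUserOptions holds the parsed arguments of a hypothetical
// create-user command.
type createUserOptions struct {
	username string
	admin    bool
}

// parseCreateUser parses arguments of the form "[-admin] <username>".
func parseCreateUser(args []string) (createUserOptions, error) {
	fs := flag.NewFlagSet("create-user", flag.ContinueOnError)
	admin := fs.Bool("admin", false, "make the new user an administrator")
	if err := fs.Parse(args); err != nil {
		return createUserOptions{}, err
	}
	if fs.NArg() != 1 {
		return createUserOptions{}, fmt.Errorf("expected exactly one username, got %d", fs.NArg())
	}
	return createUserOptions{username: fs.Arg(0), admin: *admin}, nil
}

func main() {
	opts, err := parseCreateUser([]string{"-admin", "alice"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("username=%s admin=%v\n", opts.username, opts.admin)
}
```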
Reading from stdin with Scanner.Scan() can either fail because of a read
error, or return no bytes because EOF was reached.
This adds support for checking these cases before actually reading the
password.
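The two cases can be told apart with Scanner.Err(), which returns nil
when Scan() stopped because of EOF. A minimal sketch follows; the
readPassword and readPasswordFromString names are illustrative, and a
string reader stands in for stdin.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readPassword reads one line from r and distinguishes a read error from
// an empty input (EOF reached before any line was read).
func readPassword(r io.Reader) (string, error) {
	scanner := bufio.NewScanner(r)
	if !scanner.Scan() {
		if err := scanner.Err(); err != nil {
			return "", fmt.Errorf("failed to read password: %w", err)
		}
		// Err() is nil when Scan() stopped because of EOF.
		return "", errors.New("failed to read password: EOF reached")
	}
	return scanner.Text(), nil
}

// readPasswordFromString runs readPassword on an in-memory input, standing
// in for os.Stdin.
func readPasswordFromString(input string) (string, error) {
	return readPassword(strings.NewReader(input))
}

func main() {
	password, err := readPasswordFromString("hunter2\n")
	fmt.Println(password, err)

	_, err = readPasswordFromString("")
	fmt.Println(err)
}
```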