The rationale for increasing the TCP keepalive interval from 15 seconds
(default) to 1 hour follows.
- Why increasing TCP keepalives for downstream connections is not an
issue with respect to detecting connection interruptions
The use case of TCP keepalives is detecting whether a TCP connection was
forcefully shut down without receiving any TCP FIN or RST frame, when no
data are sent from that endpoint to the other peer.
If any data is sent from the peer and is not ACKed because the
connection was interrupted, the socket will be closed after the TCP RTO
(usually a few seconds) anyway, without the need for TCP keepalives.
Therefore the only use of TCP keepalives is making sure that a peer that
is not writing anything to the socket, and is actively reading and
waiting for new stream data to be received, can detect the
disconnection, close the connection locally, then try to connect again
to its peer, instead of waiting forever for packets that will never
arrive because the connection was interrupted.
This only makes sense from a client point of view. When an IRC client is
not write(2)ing anything to the socket but is simply waiting for new
messages to arrive, i.e. read(2)ing on the socket, it must ensure that
the connection is still alive so that any new messages will indeed be
sent to it. So that IRC client should probably enable TCP keepalives.
However, when an IRC server is not writing anything to its downstream
socket, it doesn't care if it misses any messages from its downstream
client: in any case, the downstream client will instantly detect when
its messages are not reaching its server, because of the TCP RTO
(keepalives are not even needed in the client in that specific case),
and will try to reconnect to the server.
Thus TCP keepalives should be enabled for upstream connections, in
order to make sure that soju does not miss any messages coming from
upstream servers, but TCP keepalives are not needed for downstream
connections.
- Why increasing TCP keepalives for downstream connections is not an
issue with respect to security, performance, and server socket
resource exhaustion
TCP keepalives are orthogonal to security. Malicious clients can open
thousands of TCP connections and keep them open with minimal
bookkeeping, and TCP keepalives will not prevent attacks that aim to
exhaust all sockets available to soju.
It is also unlikely that soju will keep many connections open. Even if
thousands of dead connections remained registered in soju, any upstream
message that needs to be sent to downstreams would close all of the dead
downstream connections after the TCP RTO (a few seconds). Performance
could only be slightly affected in the few seconds before a TCP RTO if
many messages were sent to a very large number of dead connections,
which is extremely unlikely and would not have a large impact on
performance either.
- Why increasing TCP keepalives could be helpful to some clients running
on mobile devices
In the current state of IRC, most clients running on mobile devices
(mostly running Android and iOS) will probably need to stay connected
at all times, even when the application is in the background, in order
to receive private messages and highlight notifications, keep a complete
chat history, and possibly reduce connection traffic by avoiding all the
initial messages, including all NAMES and WHO replies, which are quite
large.
This means most IRC clients on mobile devices will keep a socket open at
all times, in the background. When a mobile device runs on a cellular
data connection, it uses the phone's wireless radio to transmit all TCP
packets, including TCP packets without user data, for example TCP
keepalives.
On a typical mobile device, a wireless radio consumes significant power
when fully active, so it switches between several energy states in order
to conserve power when not in use. It typically has 3 energy states,
from Standby, when no messages are sent, to Low Power, to Full Power,
and switches modes on an average time scale of 15s. This means that any
time any TCP packet is sent from any socket on the device, the radio
switches to a high-power energy state, sends the packet, then stays in
that energy state for around 15s, then goes back to Standby. This
includes TCP keepalives.
If a TCP keepalive of 15s was used, this means that the IRC server would
force all clients running on mobile devices to send a TCP keepalive
packet at least once every 15s, which means that the radio would stay
in its high-power energy state at all times. This would consume a
very significant amount of power and use up battery much faster.
Even though it would seem at first that a mobile device would have many
different sockets open at any time, a typical Android device usually has
one background socket open, with Firebase Cloud Messaging, for receiving
instant push notifications (for example, for the equivalent of IRC
highlights on other messaging platforms), and perhaps a socket open for
the current foreground app. When the current foreground app does not use
the network, or when no app is currently used and the phone is in sleep
mode, and no notifications are sent, then the device can effectively
have no wireless radio usage at all. This makes removing TCP keepalives
extremely significant with regard to the mobile device battery usage.
Increasing the TCP keepalive from soju lets downstream clients choose
their own keepalive interval and therefore possibly save battery for
mobile devices. Most modern mobile devices have complex heuristics for
when to sleep the CPU and wireless radio, and have specific rules for
TCP keepalives depending on the current internet connection, sleep
state, etc.
By increasing the downstream TCP keepalive to such a long period, soju
lets clients choose their optimal TCP keepalive period, which in turn
means that clients can let their mobile device platform choose the best
keepalive for them, thus saving battery in those cases.
X-Forwarded-Port contains the destination port, not the source port,
so it isn't useful for our purposes.
Move parsing of X-Forwarded-* header fields to parseForwarded.
Prior to being registered, upstreamConn.handleMessage doesn't run
in the user goroutine; it runs in a goroutine specific to the
network. Thus we shouldn't access any user data structure from
there.
downstreamConn.updateSupportedCaps is already called from the
eventUpstreamConnected handler in user.run, so the call being removed
was unnecessary.
Closes: https://todo.sr.ht/~emersion/soju/108
The methods didn't have pointer receivers. Thus the deadline fields
were only updated for the local variable.
Closes: https://todo.sr.ht/~emersion/soju/106
... and do not forward INVITEs to downstreams that do not support the
capability.
The downstream capability can be permanent because there is no way for a
client to get the list of people invited to a channel, thus no state can
be corrupted.
... so that the JOIN/history batch takes into account all capabilities.
Without this commit, enabling multi-prefix after the batch, for example,
makes the client send NAMES requests for all channels, which generates
needless traffic.
This adds the `channel update` service command, which is used to set the
auto-detach, auto-reattach, and message relaying settings of a channel.
Note that the parser currently treats `#` as the start of a comment,
which means any `channel update #foo ...` actually needs to be escaped
as `channel update "#foo" ...`
This uses the fields added previously to the Channel struct to implement
the actual detaching/reattaching/relaying logic.
The `FilterDefault` values of the message filters are currently
hardcoded.
The values of the message filters are not currently user-settable.
This introduces a new user event, eventChannelDetach, which stores an
upstreamConn (which might become invalid at the time of processing), and
a channel name, used for auto-detaching. Every time the channel detach
timer is refreshed (by receiving a message, etc.), a new timer is
created on the upstreamChannel, which will dispatch this event after the
duration (the previous timer, if any, is discarded).
This adds several fields to the channel database schema and struct.
These fields will be used to add support for customizable message
relaying through BouncerServ, auto-reattaching, and auto-detaching.
- RelayDetached is a filter for which notices to relay through
BouncerServ for detached channels.
- ReattachOn is a filter for which messages to trigger a channel
reattach on.
- DetachAfter is the duration after which to automatically detach a
channel if no matching messages are received.
- DetachOn is a filter for which messages will reset the auto-detach
timer.
This commit prevents downstreams from sending the following commands:
- NICK BouncerServ
- NICK BouncerServ/<network>
The latter is necessary because soju would otherwise save the nick
change and, in the event that the downstream connects in single-upstream
mode to <network>, it would end up with the nickname "BouncerServ".
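A minimal sketch of such a check is given below. `isReservedNick` and `serviceNick` are hypothetical names, and the case-insensitive comparison via `strings.EqualFold` is an assumption that only approximates IRC casemapping.

```go
package main

import "strings"

// serviceNick is the nickname reserved for the bouncer service.
const serviceNick = "BouncerServ"

// isReservedNick reports whether nick is the service nickname, either
// alone or followed by a "/<network>" suffix. The comparison ignores
// case, which is an assumption of this sketch.
func isReservedNick(nick string) bool {
	base, _, _ := strings.Cut(nick, "/")
	return strings.EqualFold(base, serviceNick)
}
```

A NICK handler could reject the command whenever `isReservedNick` returns true, before any nick change is saved.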
This patch implements basic message delivery receipts via PING and PONG.
When a PRIVMSG or NOTICE message is sent, a PING message with a token is
also sent. The history cursor isn't immediately advanced, instead the
bouncer will wait for a PONG message before doing so.
Self-messages trigger a PING for simplicity's sake. We can't immediately
advance the history cursor in this case, because a prior message might
still have an outstanding PING.
Future work may include optimizations such as removing the need to send
a PING after a self-message, or grouping multiple PING messages
together.
Closes: https://todo.sr.ht/~emersion/soju/11
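The receipt mechanism from the commit above can be sketched as a small tracker that maps PING tokens to message IDs and only advances the history cursor when the matching PONG arrives. All names here (`receiptTracker`, `sent`, `pong`, the `receipt-` token prefix) are hypothetical and do not reflect soju's actual token format or data structures.

```go
package main

import "strconv"

// receiptTracker is a hypothetical, simplified model of delivery
// receipts for one downstream connection.
type receiptTracker struct {
	nextToken int
	pending   map[string]string // PING token -> ID of the message sent with it
	cursor    string            // ID of the last acknowledged message
}

func newReceiptTracker() *receiptTracker {
	return &receiptTracker{pending: make(map[string]string)}
}

// sent records that the message msgID was written to the client and
// returns the token to put in the PING that follows it. The cursor is
// not advanced yet.
func (rt *receiptTracker) sent(msgID string) string {
	rt.nextToken++
	token := "receipt-" + strconv.Itoa(rt.nextToken)
	rt.pending[token] = msgID
	return token
}

// pong handles a PONG carrying token. If the token is outstanding, the
// cursor advances to the corresponding message. Unknown tokens are
// ignored and reported as false.
func (rt *receiptTracker) pong(token string) bool {
	msgID, ok := rt.pending[token]
	if !ok {
		return false
	}
	delete(rt.pending, token)
	rt.cursor = msgID
	return true
}
```

Since replies on a single connection arrive in order, the cursor in this sketch only ever moves forward as PONGs come in; a client that disconnects before answering leaves the cursor at the last acknowledged message.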