TL;DR: supports for casemapping, now logs are saved in
casemapped/canonical/tolower form
(eg. in the #channel directory instead of #Channel... or something)
== What is casemapping? ==
see <https://modern.ircdocs.horse/#casemapping-parameter>
== Casemapping and multi-upstream ==
Since each upstream does not necessarily use the same casemapping, and
since casemappings cannot coexist [0],
1. soju must also update the database accordingly to upstreams'
casemapping, otherwise it will end up inconsistent,
2. soju must "normalize" entity names and expose only one casemapping
that is a subset of all supported casemappings (here, ascii).
[0] On some upstreams, "emersion[m]" and "emersion{m}" refer to the same
user (upstreams that advertise rfc1459 for example), while on others
(upstreams that advertise ascii) they don't.
Once upstream's casemapping is known (default to rfc1459), entity names
in map keys are made into casemapped form, for upstreamConn,
upstreamChannel and network.
downstreamConn advertises "CASEMAPPING=ascii", and always casemap map
keys with ascii.
Some functions require the caller to casemap their argument (to avoid
needless calls to casemapping functions).
== Message forwarding and casemapping ==
downstream message handling (joins and parts basically):
When relaying entity names from downstreams to upstreams, soju uses the
upstream casemapping, in order to not get in the way of the user. This
does not brings any issue, as long as soju replies with the ascii
casemapping in mind (solves point 1.).
marshalEntity/marshalUserPrefix:
When relaying entity names from upstreams with non-ascii casemappings,
soju *partially* casemap them: it only change the case of characters
which are not ascii letters. ASCII case is thus kept intact, while
special symbols like []{} are the same every time soju sends them to
downstreams (solves point 2.).
== Casemapping changes ==
Casemapping changes are not fully supported by this patch and will
result in loss of history. This is a limitation of the protocol and
should be solved by the RENAME spec.
Prior to being registered, upstreamConn.handleMessage doesn't run
in the user goroutine, it runs in a goroutine specific to the
network. Thus we shouldn't access any user data structure from
there.
downstreamConn.updateSupportedCaps is already called from the
eventUpstreamConnected handler in user.run, the call being removed
was unnecessary.
Closes: https://todo.sr.ht/~emersion/soju/108
... and do not forward INVITEs to downstreams that do not support the
capability.
The downstream capability can be permanent because there is no way for a
client to get the list of people invited to a channel, thus no state can
be corrupted.
This uses the fields added previously to the Channel struct to implement
the actual detaching/reattaching/relaying logic.
The `FilterDefault` values of the messages filters are currently
hardcoded.
The values of the message filters are not currently user-settable.
This introduces a new user event, eventChannelDetach, which stores an
upstreamConn (which might become invalid at the time of processing), and
a channel name, used for auto-detaching. Every time the channel detach
timer is refreshed (by receveing a message, etc.), a new timer is
created on the upstreamChannel, which will dispatch this event after the
duration (and discards the previous timer, if any).
This patch implements basic message delivery receipts via PING and PONG.
When a PRIVMSG or NOTICE message is sent, a PING message with a token is
also sent. The history cursor isn't immediately advanced, instead the
bouncer will wait for a PONG message before doing so.
Self-messages trigger a PING for simplicity's sake. We can't immediately
advance the history cursor in this case, because a prior message might
still have an outstanding PING.
Future work may include optimizations such as removing the need to send
a PING after a self-message, or groupping multiple PING messages
together.
Closes: https://todo.sr.ht/~emersion/soju/11
Introduce a messageStore type, which will allow for multiple
implementations (e.g. in the DB or in-memory instead of on-disk).
The message store is per-user so that we don't need to deal with locking
and it's easier to implement per-user limits.
This simple implementation only advertises extended-join to downstreams
when all upstreams support it.
In the future, it could be modified so that soju buffers incoming
upstream JOINs, sends a WHO, waits for the reply, and sends an extended
join to the downstream; so that soju could advertise that capability
even when some or all upstreams do not support it. This is not the case
in this commit.
Instead, always read chat history from logs. Unify the implicit chat
history (pushing history to clients) and explicit chat history
(via the CHATHISTORY command).
Instead of keeping track of ring buffer cursors for each client, use
message IDs.
If necessary, the ring buffer could be re-introduced behind a
common MessageStore interface (could be useful when on-disk logs are
disabled).
References: https://todo.sr.ht/~emersion/soju/80
For now, these can be used as cursors in the logs. Future patches will
introduce functions that perform log queries with message IDs.
The IDs are state-less tokens containing all the required information to
refer to an on-disk log line: network name, entity name, date and byte
offset. The byte offset doesn't need to point to the first byte of the
line, any byte will do (note, this makes it so message IDs aren't
necessarily unique, we may want to change that in the future).
These internal message IDs are not exposed to clients because we don't
support upstream message IDs yet.
Keep the ring buffer alive even if all clients are connected. Keep the
ID of the latest delivered message even for online clients.
As-is, this is a net downgrade: memory usage increases because ring
buffers aren't free'd anymore. However upcoming commits will replace the
ring buffer with log files. This change makes reading from log files
easier.
This defers TLS handshake until the first read or write operation. This
allows the upcoming identd server to register the connection before the
TLS handshake is complete, and is necessary because some IRC servers
send an ident request before that.
When Unix socket support will be added for listeners, unix:// will be
ambiguous. It won't be clear whether to setup an IRC server, or some
other kind of server (e.g. identd).
unix:// is still recognized to avoid breaking existing DBs.
In case labelled-response isn't supported, broadcast unhandled messages
to all downstream connections. That's better than silently dropping the
messages.
Currently, a downstream receives MODE, RPL_CHANNELMODEIS and
RPL_CREATIONTIME messages from soju for detached channels. It should not
be sent any of these messages.
This adds a detach check to the handling of these messages to avoid
receiving these messages.
WebSocket connections allow web-based clients to connect to IRC. This
commit implements the WebSocket sub-protocol as specified by the pending
IRCv3 proposal [1].
WebSocket listeners can now be set up via a "wss" protocol in the
`listen` directive. The new `http-origin` directive allows the CORS
allowed origins to be configured.
[1]: https://github.com/ircv3/ircv3-specifications/pull/342
Previously, we did not skip the first RPL_INVITING parameter, which is
the user nick (like in all replies), which made the parsing for that
reply incorrect.
This fixes RPL_INVITING parsing by skipping the first parameter.
Previously we dropped all TAGMSG as well as any client message tag sent
from downstream.
This adds support for properly forwarding TAGMSG and client message tags
from downstreams and upstreams.
TAGMSG messages are intentionally not logged, because they are currently
typically used for +typing, which can generate a lot of traffic and is
only useful for a few seconds after it is sent.
This adds support for forwarding all errors and unknown messages labeled
with a specific downstream to that downstream.
Provided that the upstream supports labeled-response, users will now be
able to receive an error only on their client when making a command that
returns an error, as well as receiving any reply unknown to soju.
Previously, the downstream nick was never changed, even when the
downstream sent a NICK message or was in single-server mode with a
different nick.
This adds support for updating the downstream nick in the following
cases:
- when a downstream sends NICK
- additionally, in single-server mode:
- when a downstream connects and its single network is connected
- when an upstream connects
- when an upstream sends NICK
When SASL is not used, we should only send CAP END after we send a CAP
REQ. Previously CAP END was sent both after a CAP REQ and a CAP ACK,
resulting in two CAP END messages.
Sending a CAP END right after the CAP REQ rather than waiting for the
CAP ACK/NAK saves 1 RTT.
Even though the memberships map has type map[string]*memberships
(with memberships being defined as []membership), the default value for
that map should not be `nil` but a pointer to a nil slice.
This fixes a segfault on some servers before user channel prefixes are
sent.
Previously, we only considered channel modes in the modes of a MODE
messages, which means channel membership changes were ignored. This
resulted in bugs where users channel memberships would not be properly
updated and cached with wrong values. Further, mode arguments
representing entities were not properly marshaled.
This adds support for correctly parsing and updating channel memberships
when processing MODE messages. Mode arguments corresponding to channel
memberships updates are now also properly marshaled.
MODE messages can't be easily sent from history because marshaling these
messages require knowing about the upstream available channel types and
channel membership types, which is currently only possible when
connected. For now this is not an issue since we do not send MODE
messages in history.
User channel memberships are actually a set of memberships, not a single
value. This introduces memberships, a type representing a set of
memberships, stored as an array of memberships ordered by descending
rank.
This also adds multi-prefix to the permanent downstream and upstream
capabilities, so that we try to get all possible channel memberships.
Channels can now be detached by leaving them with the reason "detach",
and re-attached by joining them again. Upon detaching the channel is
no longer forwarded to downstream connections. Upon re-attaching the
history buffer is sent.
This makes use of cap-notify to dynamically advertise support for
away-notify. away-notify is advertised to downstream connections if all
upstreams support it.
Some servers do not support TLS, or have invalid, expired or self-signed
TLS certificates. While the right fix would be toi contact each server
owner to add support for valid TLS, supporting plaintext upstream
connections is sometimes necessary.
This adds support for the irc+insecure address scheme, which connects to
a network in plain-text over TCP.
This is preparatory work for adding other connection types to upstream
servers. The service command `network create` now accepts a scheme in
the address flag, which specifies how to connect to the upstream server.
The only supported scheme for now is ircs, which is also the default if
no scheme is specified. ircs connects to a network over a TLS TCP
connection.
Store the list of configured channels in the network data structure.
This removes the need for a database lookup and will be useful for
detached channels.
Some servers use custom IRC bots with custom commands for registering to
specific services after connection.
This adds support for setting custom raw IRC messages, that will be
sent after registering to a network.
It also adds support for a custom flag.Value type for string
slice flags (flags taking several string values).
Instead of having one ring buffer per network, each network has one ring
buffer per entity (channel or nick). This allows history to be more
fair: if there's a lot of activity in a channel, it won't prune activity
in other channels.
We now track history sequence numbers per client and per network in
networkHistory. The overall list of offline clients is still tracked in
network.offlineClients.
When all clients have received history, the ring buffer can be released.
In the future, we should get rid of too-old offline clients to avoid
having to maintain history for them forever. We should also add a
per-user limit on the number of ring buffers.
In order to notify the user when we are disconnected from a network
(either due to an error, or due a QUIT), and when we fail reconnecting,
this commit adds support for sending a short NOTICE message from the
service user to all relevant downstreams.
The last error is stored, and cleared on successful connection, to
ensure that the user is *not* flooded with identical connection error
messages, which can often happen when a server is down.
No lock is needed on lastError because it is only read and modified from
the user goroutine.
Closes: https://todo.sr.ht/~emersion/soju/27
Any SendMessage call after Close could potentially block forever if the
outgoing channel was filled up. Now the channel is drained before the
writer goroutine exits.
Add bouncer logs, in a network/channel/date.log format, in a similar
manner to ZNC log module. PRIVMSG, JOIN, PART, QUIT, MODE are logged.
Add a config directive for the logs file, including a way to disable
them entirely.
This commit adds support for downstream LIST messages from multiple
concurrent downstreams to multiple concurrent upstreams, including
support for multiple pending LIST requests from the same downstream.
Because a unique RPL_LISTEND message must be sent to the requesting
downstream, and that there might be multiple upstreams, each sending
their own RPL_LISTEND, a cache of RPL_LISTEND replies of some sort is
required to match RPL_LISTEND together in order to only send one back
downstream.
This commit adds a list of "pending LIST" structs, which each contain a
map of all upstreams that yet need to send a RPL_LISTEND, and the
corresponding LIST request associated with that response. This list of
pending LISTs is sorted according to the order that the requesting
downstreams sent the LIST messages in. Each pending set also stores the
id of the requesting downstream, in order to only forward the replies to
it and no other downstream. (This is important because LIST replies can
typically amount to several thousands messages on large servers.)
When a single downstream makes multiple LIST requests, only the first
one will be immediately sent to the upstream servers. The next ones will
be buffered until the first one is completed. Distinct downstreams can
make concurrent LIST requests without any request buffering.
Each RPL_LIST message is forwarded to the downstream of the first
matching pending LIST struct.
When an upstream sends an RPL_LISTEND message, the upstream is removed
from the first matching pending LIST struct, but that message is not
immediately forwarded downstream. If there are no remaining pending LIST
requests in that struct is then empty, that means all upstreams have
sent back all their RPL_LISTEND replies (which means they also sent all
their RPL_LIST replies); so a unique RPL_LISTEND is sent to downstream
and that pending LIST set is removed from the cache.
Upstreams are removed from the pending LIST structs in two other cases:
- when they are closed (to avoid stalling because of a disconnected
upstream that will never reply to the LIST message): they are removed
from all pending LIST structs
- when they reply with an ERR_UNKNOWNCOMMAND or RPL_TRYAGAIN LIST reply,
which is typically used when a user is not allowed to LIST because they
just joined the server: they are removed from the first pending LIST
struct, as if an RPL_LISTEND message was received
Some servers add a trailing space to the channel list in
RPL_WHOISCHANNELS. This commit works around this issue by removing any
empty trailing element after splitting.
Since RPL_WHOISCHANNELS could send an empty channel parameter, we can't
just use strings.TrimRight(s, " "), because splitting on an empty string
would still return an empty element.
Closes: https://todo.sr.ht/~emersion/soju/25
NOTICE messages can be both special messages from the server (with no
prefix nick), or regular PRIVMSG-like messages from users. This commit
adds support for marshaling channel and user prefixes in the latter
case.
Downstream and upstream message handling are slightly different because
downstreams can send KICK messages with multiple channels or users,
while upstreams can only send KICK messages with one channel and one
user (according to the RFC).
Some servers (namely UnrealIRCd) wrongly add a trailing space to the
members parameters of the RPL_NAMREPLY command, which was not handled
correctly.
Adding a trailing space is not legal wrt the IRC specs, but since
UnrealIRCd does it and is in wide use today, we have to work around it.
Using labeled-response, the replies to several commands such as NAMES,
WHO, WHOIS can be routed back to a specific downstream, rather than
being broadcast to all downstreams.
For example, after this commit, if the server supports labeled-response,
if a downstream requests the NAMES or WHO or WHOIS of a channel, the
replies of the upstream will only be sent back to that downstream, and
the other downstreams won't receive these messages.
- Add RPL_ISUPPORT support with CHANMODES, CHANTYPES, PREFIX parsing
- Add support for channel mode state with mode arguments
- Add upstream support for RPL_UMODEIS, RPL_CHANNELMODEIS
- Request channel MODE on upstream channel JOIN
- Use sane default channel mode and channel mode types
This allows message handlers to read upstream/downstream connection
information without causing any race condition.
References: https://todo.sr.ht/~emersion/soju/1
We now store SASL credentials in the database and automatically populate
them on NickServ REGISTER/IDENTIFY.
References: https://todo.sr.ht/~emersion/jounce/10