A chat server running on powerful hardware collapsing when handling more than 100 events per second isn't acceptable. Event volume scales with room activity from non-local users, spammers included, so it's an issue even for a server with 12 users.

https://element.io/blog/scaling-to-millions-of-users-requires-synapse-pro/
This post was edited. (5 days ago)
lol #Matrix

the totally awesome FOSS-as-in-freedom chat solution that's always 2 years away from working properly
Not FOSS anymore.
So I guess it's XMPP and SimpleX Chat now?
XMPP, and let's hope Conduit becomes as usable as Synapse as soon as possible.
Conduit is dead.
Why do you think so?
This looks good, no?
https://gitlab.com/famedly/conduit/activity
It's nowhere near being a serious option: it's very incomplete, and yet development moves incredibly slowly. There's barely any active development remaining. There's a far more actively developed fork of it which is actually quite usable, but the branding is not going to appeal to everyone...
Do you mean conduwuit? Do you have any feedback about switching from Conduit to conduwuit? I don't care at all about the branding, but I haven't switched so far because I don't know if it's worth it.
oh right, brilliant point that my non-caffeinated ass completely disregarded
See the linked post above.
Choosing to write the Matrix server software in Python in the first place was a huge mistake. It's now far harder to develop and maintain the software. It heavily contributes to it being buggy and fragile. It's the biggest factor in it being so incredibly slow and hard to scale.
what would be a proper programming language to code in?

PS: I finally finished setting up my Pixel 9 XL with GrapheneOS.

Unfortunately I'm stuck with WhatsApp -_-

But wherever I could I disabled network or background use plus RethinkDNS with filter lists.
Elixir might be a good fit for a Matrix server.
@GrapheneOS
It's such a complex and inefficient protocol that it needs all the performance and memory efficiency that it can get. It doesn't resemble a straightforward chat server implementation at all. The overhead largely comes from the decentralized nature of rooms, with a decentralized consensus protocol between servers. Every invite, join, leave, kick, ban, power level change, room setting change, etc. is a state event which has to be kept around and is part of state resolution.
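To make the point about state events concrete, here is a toy sketch in Python. It assumes the Matrix convention that current room state is a map keyed by (event type, state_key); the names `StateEvent` and `fold_state` are made up for illustration. It is a plain last-write-wins replay of a linear history, not the real state resolution algorithm, which has to reconcile conflicting branches from different servers.

```python
from dataclasses import dataclass


@dataclass
class StateEvent:
    type: str        # e.g. "m.room.member"
    state_key: str   # user ID for membership events, "" for room-wide settings
    content: dict


def fold_state(history):
    """Replay a linear history; later events overwrite earlier ones per key."""
    state = {}
    for ev in history:
        state[(ev.type, ev.state_key)] = ev
    return state


# Hypothetical room history: every membership or permission change is an event.
history = [
    StateEvent("m.room.member", "@alice:example.org", {"membership": "join"}),
    StateEvent("m.room.member", "@bob:example.net", {"membership": "invite"}),
    StateEvent("m.room.member", "@bob:example.net", {"membership": "join"}),
    StateEvent("m.room.power_levels", "", {"users": {"@alice:example.org": 100}}),
    StateEvent("m.room.member", "@bob:example.net", {"membership": "ban"}),
]

state = fold_state(history)
print(len(history), "state events kept,", len(state), "entries in current state")
# prints: 5 state events kept, 3 entries in current state
```

Even in this toy version the history only grows: the current state has three entries, but all five events have to be retained, and a spam wave of joins and leaves adds to that pile on every participating server.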
the matrix folk themselves have a “next gen” server project called dendrite that’s written in Go which seems like a reasonable choice… but it has been continually playing catch-up with synapse since it was started 7 years ago and still isn’t in a position to replace it. It just isn’t a priority to them, for whatever reason.
Go or Java would have been a reasonable language platform for the original implementation from a performance perspective. Java has gotten much better in recent years and there are other Java ecosystem languages such as Kotlin which are less verbose. Both also have great library ecosystems, which for this kind of software offer better options than Python's, not worse.

For a rewrite, neither of those is really a great choice when it's so performance critical.
Their choice of Rust for a partial reimplementation of the synapse code makes a lot of sense. Matrix is such a heavyweight protocol that it really calls for maximizing performance in every possible way. They're reusing the same architecture, design, algorithms, etc. from synapse so the fact that they got a 500x performance improvement out of it is truly ridiculous. It would perform far better than that if it was truly starting over and able to change the db, protocol, etc.
I briefly considered starting a Java implementation some time ago. But it's already pretty complex to implement 😩
The problem with alternative implementations is that the spec will remain a moving target. It basically is "what the one reference implementation does".
The protocol and implementation have major bugs which led to state resets, causing the room to reset back to previous states, dropping many of the users and bricking it. These issues have often been triggered by alternate server implementations doing things differently or not being complete. Of course, it's not reasonable to have a public-facing chat protocol where rooms can be bricked by sending certain state changes from other servers, but that's Matrix.
Eww. But, yeah… adding complexity is the enemy of robustness. It is a set of hard problems, however. I don't envy them the task.
Python is definitely not the only problem but they made this vastly harder for themselves by writing the reference implementation in Python. Getting 500x better performance from simply porting it 1:1 to another language able to be dropped into the existing implementation is genuinely ridiculous. That's not Rust being incredible but rather Python being astoundingly slow. 500x better performance isn't a high performance chat server... the starting point was that bad.
They didn't get that performance by switching to a new architecture, algorithms and data structures. It's from directly porting it to another language. It goes to show how awful Python was as a language choice for this. If it was simply written in Java, porting it to Rust might have still ended up giving a 3x performance boost, sure, but they could have focused on other things long before it. They wouldn't have had to start making Dendrite and tons of other work.
If you want a chat server system that can handle more than 100 active users, Matrix isn't the answer. Get an #XMPP server and chat on #Jabber instead.
100 events per second is not 100 users. When you read that post you see that 40 events per second refer to ~50K simultaneous users.
It heavily depends on the rooms and federating servers. It also isn't a constant rate based on number of users but rather heavily fluctuates based on who is actively using it, what they're doing and the rooms they're in.
Yes, that's true.
Agreed, and not only that, it looks like they can't make up their mind on Dendrite. First it's a re-implementation, then the purpose is to be client-embeddable only (why not just do both?). I've lost track of what the hell they are doing.
Instagram is largely written in Python and WhatsApp in Erlang, both relatively slow languages compared to C/C++/Rust/Zig/Go, and yet they manage to scale. At a previous company, we had a Python based server handling 4000 sustained requests/second with 9ms latency at the 99th percentile, so it's not just the language at fault.
> Instagram is largely written in Python and WhatsApp in Erlang,

Those aren't decentralized chat protocols and had massive capital to throw at servers to delay efficiency to a later date. Instagram is not real time chat. WhatsApp has a low max size on groups even today despite all being on 1 server. Aside from that, evidence is needed that they're actually running in those languages to a substantial extent, especially today. It's not verifiable in any way.
calling Erlang *relatively slow* in the context of high-scale deployments like WhatsApp is unfair
Matrix is meant to be a decentralized chat protocol easily deployed by other people. They threw a massive amount of resources into hosting matrix.org which could have gone to development. They never had a large amount of capital available and put a massive amount of money into hosting matrix.org instead of developing a great chat platform. It's not comparable to burning massive amounts of VC funding on hosting to more quickly develop a product, then rewriting it with more.
A major part of why Matrix is so centralized around matrix.org is that the server software is incredibly bloated, slow and doesn't scale well. Centralized software would not have to be designed in the same way and wouldn't have been in the same situation. Also, Matrix has more and more overhead as the decentralization increases. The load scales not only with room activity and local users but also with the number of federating servers. If everyone had their own, it'd be a disaster.
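A rough back-of-the-envelope sketch of that last point, in Python. It assumes a deliberately simplified model where the originating server delivers each event once to every other server in the room, and it ignores state resolution, retries and backfill entirely. The function name and the numbers are made up for illustration.

```python
def federation_deliveries_per_min(users, events_per_user_per_min, servers):
    """Deliveries per minute if each event goes once to every other server."""
    total_events = users * events_per_user_per_min
    return total_events * (servers - 1)


# Hypothetical room: 1000 users, each sending one event per minute,
# spread over 1 server, 10 servers, or one server per user.
for servers in (1, 10, 1000):
    print(servers, federation_deliveries_per_min(1000, 1, servers))
# prints:
# 1 0
# 10 9000
# 1000 999000
```

Same room, same activity, and the federation traffic goes from nothing to roughly a million deliveries per minute just by giving everyone their own server.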
it's not because of language.
It's because of its design.
No, and they got a 500x performance improvement porting the same architecture, database design, algorithms, data structures, etc. to Rust within the existing system. Sure, more is wrong with it than that but it's the biggest problem for that server software.
But the design of this shitty system is still the problem.
You can rewrite it to another language but you cannot change the design of it.

Prosody can take a lot of active users and it's written in Lua...

The only choice is XMPP.
It's not decentralized in the same sense. Rooms are hosted on a specific server. Matrix isn't particularly decentralized in practice but it was intended to be.
slightly simplistic view… it’s very possible to write python that doesn’t suck (though it wouldn’t be my first choice for this; Rust or something Erlang-ish probably). But from an outside perspective it sure sounds like they are intentionally handicapping the design of the open source product.
They got 500x performance with a partial in-place rewrite in Rust using the same architecture, database structure / connections, algorithms, data structures, etc. It's an in-place partial rewrite and all that's really changing is the language. They put huge effort into trying to optimize the Python and make it work that way. It is not capable of providing reasonable performance for this. It's also clearly far less maintainable at this level of complexity and scale.
If they're ready to reimplement it in Rust, I'll be happy to contribute.

Also I will leave it here, Rust FTW
https://m.youtube.com/watch?v=3fWx5BOiUiY&pp=ygUKcnVzdCB2cyBnbw%3D%3D
They're reimplementing it in Rust as a closed source fork of the open source Synapse, which they say already has 500x better performance.
do you think there is a clear culprit, like bloated data structures, too much context switching, GIL concurrency issues etc.? Or is it likely a heavy mix of everything and the overall language design is fundamentally unfit for such tasks?
It's unfit for this purpose due to the massive overhead for anything written in Python code, lack of proper threading due to the GIL, etc. The main way people write higher performance Python is delegating nearly all the work to C extensions, which still has high overhead in most cases. The data structures do have high memory and performance overhead but so does everything else in the language. It's also not good at async. It's not suited to writing a server with reasonable performance.
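For anyone who wants to see the threading point for themselves, a minimal sketch: it runs the same CPU-bound function four times sequentially and then on four threads. The function `busy` and the sizes are arbitrary choices for illustration. On a standard CPython build with the GIL the two timings are expected to come out roughly the same, since only one thread executes Python bytecode at a time; on an experimental free-threaded build the result may differ, so no expected timings are claimed here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy(n):
    """Pure-Python CPU-bound loop; it never releases the GIL for I/O."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


N, JOBS = 500_000, 4  # arbitrary sizes, small enough to run quickly

seq, t_seq = timed(lambda: [busy(N) for _ in range(JOBS)])
with ThreadPoolExecutor(max_workers=JOBS) as pool:
    thr, t_thr = timed(lambda: list(pool.map(busy, [N] * JOBS)))

print(f"sequential: {t_seq:.2f}s, {JOBS} threads: {t_thr:.2f}s")
```

The usual workarounds are multiple processes or pushing the hot path into native code, which is the overhead described above.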
Thank you. I'm sorry, I did not notice the attached blog post at first; they lay out the main bottlenecks quite clearly there. Indeed, not all of them seem to be purely Python's fault. But the decision to introduce a whole new code base and commit to maintaining it in parallel seems a bit odd. Not sure how it'll play out in the long run given the already tight budget.
So the upshot is that the FOSS server code for Matrix sucks; and Element (who wrote most of that code) has launched a new commercial non-FOSS version of the server that doesn't suck.

Their blog is giving mixed messages though. In earlier posts, Element have argued that it is bad for a "country’s digital infrastructure [to be] operationally dependent on a consumer app from a private tech company". But now they say this kind of private tech is necessary to ensure that the code is properly funded.
The post acknowledges that the current server implementation is terrible but downplays it as only being an issue at a massive scale. If you look at the actual numbers involved and think about it, you can see it's clearly an issue with small deployments with even a dozen users when those users have joined rooms which are being raided by spammers. The events scale up based on room activity from other servers not only local users. They also scale up based on having more federating servers.
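A toy model of why a dozen users is enough, sketched in Python. It assumes the inbound event rate of a homeserver is simply the sum of the event rates of every distinct room any local user has joined, no matter where the events originate. The function name, room names and rates are all made up for illustration.

```python
def inbound_events_per_sec(joined_rooms_by_user, room_rate_per_sec):
    """Sum the event rate of every distinct room any local user has joined."""
    rooms = set().union(*joined_rooms_by_user.values())
    return sum(room_rate_per_sec[room] for room in rooms)


# Hypothetical 12-user homeserver: everyone is in one quiet private room,
# and a single user has also joined a large public room.
joined = {f"@user{i}:small.example": {"#family"} for i in range(12)}
joined["@user0:small.example"].add("#big-public")

quiet = {"#family": 1, "#big-public": 5}       # made-up events/sec
raided = {"#family": 1, "#big-public": 120}    # same room during a spam raid

print(inbound_events_per_sec(joined, quiet))   # 6
print(inbound_events_per_sec(joined, raided))  # 121
```

Nothing about the local users changed between the two runs; one of them being in the wrong room at the wrong time is enough to push the server past the 100 events per second mark.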
There is no good technical reason why the community version of Synapse should not include the worker implemented in Rust. A smaller, resource-constrained server, such as a single-board computer with 2 GB of memory, would benefit from a more efficient implementation.

Element seems to be adopting the open-core business model like GitLab. I don't blame them for this; they did mention funding problems recently, but the arguments and analogies in the article don't make sense.
This is a much less reasonable way of doing open core. Synapse is barely functional and gets bogged down by 1 person doing scripting to attack the overall public Matrix federation with spam. Their intention doesn't even need to be taking down servers in order for that to be the end result. Spam via automated account creation across servers ends up causing denial of service right now. Imagine if they started automatically creating servers at different subdomains or whole domains, etc.
Do these issues also apply to small scale private instances that don't participate in federation? So something along the lines of "hosting your own discord" not "participating in a big network of federated identity providers"? Because the government instances in France and Germany are probably not participating in federation.
This post was edited. (4 days ago)
It has awful performance and scalability with or without federation but federation does make it a bigger problem especially since the anti-abuse is so terrible. Federation also leads to lots of issues with decentralized state resolution resulting in bricked rooms. If they're not going to use federation, why would Matrix be chosen though? It makes huge sacrifices to support federation and is quite slow and awkward to use because of the resulting complexity, overhead, etc.
So what are the alternatives if you want to have a full-fledged Teams or Discord experience including Messaging and Video Calls? Is there any reliable self-hosted FOSS Solution available that doesn't have the issues Matrix has? I don't know any.
So they are maintaining two implementations: an efficient commercial one that they will use to fund the development of the inefficient community one? Wtf
Yes, exactly.
Aren't there some good third-party(-ish) implementations of it (e.g. Dendrite or Conduit)?

The idea of using the funding of an objectively better closed-source version to fund the worse open-source version is absolutely mind-boggling to me. I'm almost absolutely positive that the target audience for something like Synapse Pro would *never* move to Matrix.
This decision looks like an enterprise money cash-grab and not even a good one.
Dendrite was being made by this same group of people and is essentially dead:

https://github.com/element-hq/dendrite

It has essentially no active development. It goes to show how many resources they're wasting.

Conduit is also close to dead and is hardly usable. Both Dendrite and Conduit have massive issues with bricking rooms. Conduit has a much more actively developed fork.
I assume you are talking about conduwuit? Why didn't you actually specify the name of the project? Is there something wrong with it?
letting all those community servers run on extremely inefficient code has to have an awful environmental impact too, no?
There aren't that many of them. It is certainly wasting the time and money of people trying to use Matrix and experiencing awful performance/usability due to this. Spam attacks turn into denial of service even with only manual account creation and messages.