Skip to content

Conversation

@Pirulax
Copy link
Contributor

@Pirulax Pirulax commented Apr 3, 2021

As we all know (at least I do), the event handler system in MTA is a big bottleneck of quite a lot of things:

  • setElementData - event calling takes 40-50% of execution (70-80% if called on root)
  • GTASA onClientWorldSound - Since this event is called multiple times a frame it can cause quite bad lag (GTA sounds cause huge FPS drops but don't in SP #1067)
  • God knows what. People like to have crapload of handlers attached (for example, onClientRender: 200 handlers take can 1 ms to execute on my machine)

Pros:

  • Instead of using strings to identify built-in events I use objects (BuiltInEvent) with an unique id assigned to each
  • See readme.md.

Cons:

  • Higher memory usage (theoretically): Depending on arch (x86 / x64) and client/server side it it's an additional 1KB - 2.4 KB / element
  • Needs measurements, but maybe I can add something like EventIDArray, which would manage event ID's (obviously), so that way we could use an vector<unique_ptr<EventHandlerCollection>> regardless of type (BuiltIn / Custom).

Notes:

  • Turns out data locality isn't the big of an issue as I thought. I'll just need some trickery for built-in events to optimize the big element tree walk as that takes quite a lot of time as well.

Benchmarks: See comment below.

@Pirulax
Copy link
Contributor Author

Pirulax commented Apr 3, 2021

As of now only server will compile. Client is TODO.

@Pirulax
Copy link
Contributor Author

Pirulax commented Apr 4, 2021

tl;dr;
Speedups: It normalizes around at least 10 handlers attached to an event. After that its around 50-53% on my machine.

Off-opic /LTCG /GL stuff I also tired with /LTCG /GL and had a consistent 8-10% speedup, but for whatever reason with 28 handlers it became 32%. I really suggest us enabling /LTCG and /GL when shipping MTA.

The attached file is the not-so-tldr version.
MTA Event Handler System refactor benchmarks.pdf
Note regarding the pdf: I think the 30% difference when running 28 handlers is because the linked list blocks were allocated (probably) next to each other, which increased cache hits a lot.

@Pirulax Pirulax marked this pull request as ready for review April 5, 2021 19:48
@Pirulax
Copy link
Contributor Author

Pirulax commented Apr 5, 2021

Ready for review.
Ran unit test, and it worked as expected.

A few things to consider:

Macro of doom

I added a macro of doom in BuiltInEventListApply.h. It's ugly, but we don't have to maintain the same list in 3 different places. If anyone has a better idea, let me know.

Memory usage?

Memory usage? As I said it does increase by a not so marginal ~17% / element. I'll see what the performance difference if I use the hash map for built-in events as well.

Or should we sacrifice memory for CPU?

Realistically people have max 50k elements on the client.
An element is about 10-12 kb altogether in memory.
Lets say 12 kb + my 1800 bytes is 13800. So memory needed for elements is 690 MB, without my stuff its around 600-620.
I don't care about server-side usage, since at most the difference is 200-300 mb, and the server has 64 bit address space, the client doesn't.

Edit: Okay, so I ran a quick test. With /LTCG /GL the difference with hashmap(lower memory usage) vs array(higher) is overall ~10% (up to 33%).
Not sure if its worth the 17% memory tradeoff. I dont think memory is an issue on any of the well written servers.

@Pirulax
Copy link
Contributor Author

Pirulax commented Apr 5, 2021

This isn't all, but I don't want to cramp up the PR too much. I'll remove the need for CLuaArguments in the next PR.

@Einheit-101
Copy link

so when i use
addEventHandler("onClientElementDataChange", localPlayer, onDataChange)
will be pretty fast instead of "root"?

@pentaflops
Copy link
Contributor

pentaflops commented Apr 6, 2021

Why not use hash instead of std::string in CustomEvent?
Wouldn't finding a number in an unordered_map be faster?

@ArranTuna
Copy link
Collaborator

Or should we sacrifice memory for CPU?

Absolutely. CPU usage is the bottleneck for 99% of players.

@ArranTuna
Copy link
Collaborator

so when i use
addEventHandler("onClientElementDataChange", localPlayer, onDataChange)
will be pretty fast instead of "root"?

I thought this was a well known performance tip, but I don't see it mentioned on the addEventHandler wiki page. There is however multiple mentions of it in the page I made based on the things I learned about performance (some of which was from MTA devs) like ccw suggested changing all event handlers for client events to server from root to resourceRoot (you have to change the triggerServerEvent source to resourceRoot too) and doing that had a large CPU usage reduction for the server.

https://wiki.multitheftauto.com/wiki/Scripting_Tips

@Pirulax
Copy link
Contributor Author

Pirulax commented Apr 7, 2021

Why not use hash instead of std::string in CustomEvent?
Wouldn't finding a number in an unordered_map be faster?

Hashes by itself have collisions, thus I would have to do this. Doing this would be a pain in the ass honestly. Also I dont reuse the key, because I only need to do the name => event translation once.

@Pirulax
Copy link
Contributor Author

Pirulax commented Apr 7, 2021

so when i use
addEventHandler("onClientElementDataChange", localPlayer, onDataChange)
will be pretty fast instead of "root"?

I thought this was a well known performance tip, but I don't see it mentioned on the addEventHandler wiki page. There is however multiple mentions of it in the page I made based on the things I learned about performance (some of which was from MTA devs) like ccw suggested changing all event handlers for client events to server from root to resourceRoot (you have to change the triggerServerEvent source to resourceRoot too) and doing that had a large CPU usage reduction for the server.

https://wiki.multitheftauto.com/wiki/Scripting_Tips

ASAP this PR gets merged I will make another to add arguments to limit triggering on children, since that takes a lot of time. So, imagine you trigger something on root: the program has to traverse the whole tree, and call all related handlers on any of the elements, thats why its so slow.

@Einheit-101
Copy link

And what happens when i use "onClientRender" with anything else than root??

@Pirulax
Copy link
Contributor Author

Pirulax commented Apr 7, 2021

Render events a special, they (internally) are triggered only on root, and aren't propagated down to children.
Try attaching it to anything other than root, it wont work.
(Unless you manage to make a new root element, and set the original root as its child)

@VadimMuhtarov
Copy link

Merge this before 1.5.9 please.

@ArranTuna
Copy link
Collaborator

If this just needs reviewing perhaps you should request someone to review so that such a beneficial change can be available to players as soon as possible.

@Pirulax
Copy link
Contributor Author

Pirulax commented Apr 20, 2021

Reviewing it is pretty trivial, testing it isn't.
There are a lot of edge cases that have to be tested, to make sure it works the same way as before.
The old behavior is kinda weird sometimes, but it has to be the same, because backwards compatibility.
I have have a few todos still, but they shouldn't be a blocker of testing.

@ArranTuna
Copy link
Collaborator

ArranTuna commented Apr 20, 2021

I just tried starting a server with it, using CIT scripts and after starting some of the scripts (during that massive spam of add event debug) it crashes and has crashed both times with the same last message in log being

[2021-04-20 17:36:35] startResource: Resource 'CITaccountsClient' started
Actually this is the last script that starts so must be something after script starting that crashes.

Here is the public dump file that was created:

server_1.5.8-custom_KERNELBASE_00122802_7363_20210420_1735.rsa.zip

@Pirulax
Copy link
Contributor Author

Pirulax commented Apr 21, 2021

@ArranTuna Could you please send me the private dump in private please? Over at discord (Pirulax#6835).
I can't open public dumps sadly.

@PlatinMTA
Copy link
Contributor

PlatinMTA commented May 14, 2021

Sup', how is this going? any ETA?

I had tested the changes with my gamemode and it worked just fine, tried to use the unit test but I could only get working the event-sys-test part, because setedata-times is missing.

In terms of memory usage it went up by 159MB~ on server-side (32 bits) and 220MB~ on client-side. My gamemode has to allocate a lot of memory for models so this could be a medium to worst case scenario, and tbh it shouldn't affect anything. I can't see a server/client crashing up because of this change, and even if that was the case the benefits are just too much.

The only way I can test that this really helps on a large scale is when this comes live tho, so I'll be waiting impatiently. Good work Pirulax!


Offtopic:

I thought this was a well known performance tip, but I don't see it mentioned on the addEventHandler wiki page. There is however multiple mentions of it in the page I made based on the things I learned about performance (some of which was from MTA devs) like ccw suggested changing all event handlers for client events to server from root to resourceRoot (you have to change the triggerServerEvent source to resourceRoot too) and doing that had a large CPU usage reduction for the server.

https://wiki.multitheftauto.com/wiki/Scripting_Tips

This means that this:

-- serverside
addEventHandler("my_event", resourceRoot, my_function)

--clientside
triggerServerEvent("my_event", resourceRoot)

Is faster than this?:

-- serverside
addEventHandler("my_event", root, my_function)

-- clientside
triggerServerEvent("my_event", localPlayer)

Or its kinda the same and just faster in general than root? If it is slower then I guess I would have to do some rewriting here and there.

@ArranTuna
Copy link
Collaborator

Yes that scripting tips thing is faster.

Did you test the speed of event handling on this build? Try with tens of thousands of elements created (Just for loop createObject). When I tested with Pirulax it turned out that this PR made triggerEvent take multiple times longer on a server with tens of thousands of elements.

@PlatinMTA
Copy link
Contributor

PlatinMTA commented May 14, 2021

Did you test the speed of event handling on this build? Try with tens of thousands of elements created (Just for loop createObject). When I tested with Pirulax it turned out that this PR made triggerEvent take multiple times longer on a server with tens of thousands of elements.

Just did some tests regarding this. I used r20740 to test, all serversided. Three tests in total, which where:

Test 1

Create and destroy 30k objects with an "onElementDestroy" event attached at root. You need to keep in mind that my server has over 8000 objects already created on serverside, so the total amount of objects is arround 38k, but only 30k are being created and destroyed.

addEventHandler("onElementDestroy", root, function()
	--
end)

local tick = getTickCount()
local objs = {}
for i=1,30000 do
	objs[i] = createObject(1337, 0, 0, 3)
end
print("created in:", getTickCount()-tick, "ms")

tick = getTickCount()
for i=1,30000 do
	destroyElement(objs[i])
end
print("destroyed in:", getTickCount()-tick, "ms")

= r20740 =
While nobody was connected: 93ms for creation, 94008ms for destroying
While I was connected: 416ms for creation, 230051ms for destroying

= Custom =
nobody: 300ms for creation, 98202ms for destroying
me: 310ms for creation, 101842 ms for destroying

While nobody is on the server it takes a tiny more to create and destroy the objects, but when somebody is connected you can expect a performance boost.

Test 2

Again created 30k objects, but this time i just trigger a event in other resource calling triggerEvent x100 times.

local tick = getTickCount()
local objs = {}
for i=1,30000 do
	objs[i] = createObject(1337, 0, 0, 3)
end
print("created in:", getTickCount()-tick, "ms")

local tick = getTickCount()
for i=1,100 do
	triggerEvent("onPlayerACInfo", resourceRoot, {})
end
print("100 triggerEvent:", getTickCount()-tick, "ms")

== r20740 ==
Nobody is connected: 93ms for creation, 43ms for triggering the event 100 times
I am connected: 197ms for creation, 109ms for triggering the event 100 times

== Custom ==
Nobody: 286ms for creation, 340ms for triggering the event 100 times
Me: 301ms for creation, 356ms for triggering the event 100 times

As you said, it takes longer for the events to trigger, which is odd.

Last Test

I just triggered all 100 events in a loop, remember that i have arround 8000 objects in serverside

local tick = getTickCount()
for i=1,100 do
	triggerEvent("onPlayerACInfo", resourceRoot, {})
end
print("100 triggerEvent:", getTickCount()-tick, "ms")

== r20740 ==
Nobody: Less than 1ms to call all events
Me: 1ms to call all events

== Custom ==
Nobody: Less than 1ms to call all events
Me: 1ms to call all events

No change at all.


My conclusion was that this issue only ocurred when the objects are created in the same resource where the triggers are being called. Making two separate resources, one that creates 30k objects and one that triggers the 100 events makes the result look like this:

== Custom ==
Nobody: Less than 1ms to call all events
Me: 1ms to call all events

Using a timer to call the events (1000ms after creation) didn't help with the results, so it's safe to say that this is resource related.

I hope this information is useful in some way.


btw i just tested this:

Yes that scripting tips thing is faster.

with 10k calls and sadly it doesn't seem to be the case, the localPlayer method and the resourceRoot method get mostly the same time, as expected, but the value can vary from just 300ms to over 2300ms, kinda strange. It's till 10x times better than root, obviously.

@Pirulax
Copy link
Contributor Author

Pirulax commented May 14, 2021

I honestly have no clue what causes the performance degradation Arran mentioned.
It's very weird.
Also, it's really hard to test this stuff, as there are a lot of stuff you have to watch out for.

Anyways. I'll look into it.
It's funny I can't a reason for it to be slower. Maybe its some MSVC stupidty. Would be nice to try with the linux builds as well.

@Pirulax Pirulax marked this pull request as draft May 14, 2021 15:20
@PlatinMTA
Copy link
Contributor

It's funny I can't a reason for it to be slower. Maybe its some MSVC stupidty. Would be nice to try with the linux builds as well.

I could try to compile it on linux and test it. It's a very edge case tho, shouldn't we be more flexible? Who will have a single resource with 30k objects anyway? Also why does it slow down anyway? In r20740 it goes from 1ms~ to 100ms~ just because the objects were created on the same resource. It's odd.

with 10k calls and sadly it doesn't seem to be the case, the localPlayer method and the resourceRoot method get mostly the same time, as expected, but the value can vary from just 300ms to over 2300ms, kinda strange. It's till 10x times better than root, obviously.

There is also this problem, every time I executed that test the results varied, but normally they went down by the same ammount. First the triggerings took 2300ms, then 2000, then 1500, and so on until 300-350ms~. I can't think of a reason of why that was happening, some caching maybe? I was using r20740 and not the custom version tho.

Eitherway if I can help in something (i'll try to test on linux when I can) let me know. Probably I'll do some more testings later, no promises tho.

@Pirulax
Copy link
Contributor Author

Pirulax commented May 15, 2021

I mean the compiler might not be inlining something as expected. I'll try dropping a __forceinline.
Honestly, theres no reason for it to be any slower, other than the compiler being stupid.
Also, you can just use the linux server, as its compiled with gcc.

@ImSkully
Copy link

@Pirulax Is there any further updates on the progress of this?

@Pirulax
Copy link
Contributor Author

Pirulax commented Sep 25, 2021

Not really. I'm yet to test it in a realistic way.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2022

This draft pull request is stale because it has been open for at least 90 days with no activity. Please continue on your draft pull request or it will be closed in 30 days automatically.

@github-actions github-actions bot added the stale Inactive for over 90 days, to be closed label Jan 7, 2022
@github-actions github-actions bot closed this Feb 7, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Feb 7, 2022

This draft pull request was closed because it has been marked stale for 30 days with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactor stale Inactive for over 90 days, to be closed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants