OOS (Out of Sync)
Latest revision as of 16:50, 11 October 2024
OOS is an error that occurs in a Wesnoth multiplayer scenario or a replay when two machines disagree (or one machine disagrees with a replay file) about the game state, i.e. which units are where, how much HP they have, how much gold each side has, etc. More precisely, OOS is announced when a client reads a command, such as "move Dwarf Fighter 8,4 -> 8,5 -> 8,6" or "recruit Gryphon Rider 9,7", which is illegal or nonsensical based on its understanding of the game state.
OOS can in principle be caused by network issues such as dropped packets, by players having mismatched game files, or even by cheating, but it is commonly caused by scenario designers and add-on makers writing unsafe code. This page is about how to write WML that won't cause OOS. If you want to know what to do when you get OOS in an online game, see this forum thread: http://forums.wesnoth.org/viewtopic.php?f=6&t=38881&p=554259
Why does OOS happen?
Game begins out of sync
At the beginning of an MP game the host sends the content of the [multiplayer] or [scenario] tag to the other clients (via the MP server). All other information, in particular [unit]s, [terrain]s and top-level [lua] tags, is not transmitted to the other players. Instead, the other players are expected to have this data available locally, and their data is expected to match the host's. If it does not, OOS can result.
This can happen if one of the players has modified their game files, or if the players don't have matching add-on versions.
The OOS may still not appear until a mismatched resource actually appears and does something.
Game becomes out of sync
During the actual game, each client has its own local gamestate. The clients communicate by sending 'player actions' like 'move unit at (3,4) to (5,5) with the route ((3,4),(3,5),(4,5),(5,5))'. These actions are evaluated on the other clients, which should lead to the same local gamestate on all clients (assuming that they had the same local gamestate before that action). If at any time the clients don't have the same local gamestate, we call that OOS. The clients usually send these actions as soon as they know that they cannot be undone. Actions that are sent to the other clients are called "synced user actions"; move, recall and recruit are examples. There are also unsynced user actions like "select" that only happen on one client and are not sent to the other clients. As a consequence, "select" events only run on one client, so changing the gamestate from inside a select event results in OOS.
A complete list of unsynchronized events can be found here: EventWML#Multiplayer_safety.
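The difference between synced and unsynced actions can be illustrated with a small toy model. This is a sketch in Python, not Wesnoth engine code: the Client class, broadcast_move and the unit positions are all hypothetical names made up for the illustration. It assumes a client announces OOS when it is asked to apply a command that is illegal for its own local gamestate.

```python
# Toy model of two clients exchanging synced actions; not engine code.

class Client:
    def __init__(self, name):
        self.name = name
        # Local gamestate: map from hex (x, y) to a unit name.
        self.units = {(3, 4): "Dwarf Fighter"}

    def apply_move(self, src, dst):
        """Apply a synced 'move' action; raise if it is illegal locally."""
        if src not in self.units or dst in self.units:
            raise RuntimeError(f"OOS on {self.name}: illegal move {src} -> {dst}")
        self.units[dst] = self.units.pop(src)


def broadcast_move(clients, src, dst):
    """A synced user action: every client evaluates the same command."""
    for client in clients:
        client.apply_move(src, dst)


a, b = Client("A"), Client("B")

# Synced action: both clients end up with the same gamestate.
broadcast_move([a, b], (3, 4), (5, 5))
in_sync_after_move = a.units == b.units

# Unsynced 'select'-style handler that (wrongly) changes the gamestate.
# It runs on client A only, so B never learns about the new unit.
a.units[(6, 6)] = "Gryphon Rider"

# The next synced action refers to a unit that exists only on A.
try:
    broadcast_move([a, b], (6, 6), (7, 7))
    oos_message = None
except RuntimeError as err:
    oos_message = str(err)
```

In this sketch the mismatch is created silently by the unsynced handler and only surfaces later, when client B receives a move it cannot make sense of.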
The replay works the same way: it begins with the starting gamestate and then evaluates all the synced user actions recorded in the replay. If the resulting gamestate differs from the original gamestate, we have an OOS. That's why replay safety is in most cases the same as multiplayer safety.
Generally speaking a replay file receives the same kind of information about the game that a multiplayer observer of a game would receive, so any technique which would cause OOS for multiplayer will also cause corrupted replays.
If WML/Lua code is invoked by a synced user action and thus runs on all clients, we say that the code runs in a synced context; otherwise it does not. (Version 1.13.0 and later only) You can check whether the current code runs in a synced context by reading wesnoth.current.synced_state.
In order to get the same local gamestate on all clients, you should only make the gamestate depend on deterministic functions that return the same value on all clients. The getter of side.controller and math.random, for example, are not such functions. If you want to call these functions, you should use wesnoth.synchronize_choice, which runs code on one client and then returns the result to all clients, so that all clients are guaranteed to get the same result. Note that because the code in synchronize_choice only runs on one client, it should only calculate the return value and not change the gamestate.
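The idea behind synchronize_choice can be sketched with a toy model in Python. The function below is a hypothetical stand-in and does not have the signature of the real wesnoth.synchronize_choice; the client names and controller values are assumptions chosen for illustration.

```python
# Toy stand-in for the "compute on one client, share with all" pattern.

def synchronize_choice(compute, clients, chooser):
    """Run `compute` on one client only and hand its result to every client."""
    result = compute(chooser)              # runs on exactly one client
    return {client: result for client in clients}


clients = ["A", "B", "C"]

# What each client sees locally for the same side: a client-dependent value.
local_controller = {"A": "human", "B": "network", "C": "network"}

# Unsafe: every client reads its own local value and would act on it.
unsafe = {client: local_controller[client] for client in clients}

# Safe: only client A evaluates the value; it only returns it and does not
# touch the gamestate. All clients then continue with the same result.
safe = synchronize_choice(lambda client: local_controller[client], clients, "A")
```

Here `unsafe` contains two different values across the three clients, while `safe` contains the same value for all of them, which is the property the gamestate needs.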
The rng
Wesnoth has a synced random number generator that returns the same value on all clients. This rng is used by
- [set_variable] rand=.. [/set_variable]
- traits & name generation in [unit]
- calculation whether an attack hits
To keep the rng in sync it is important to call the rng the same number of times on all clients. Otherwise the random results will be off by one, which causes OOS. Luckily the synced rng is smart enough to know whether we are in a synced context, and it redirects to an unsynced rng (like math.random) if it is called from outside a synced context, so that calling this rng from outside the synced context doesn't affect the rng used in the synced context.
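Both effects, the off-by-one drift and the protective redirect, can be reproduced with a toy model. This Python sketch is not how the engine is implemented; the SyncedRng class, the seed value 42 and the flag names are assumptions made for the illustration.

```python
import random


class SyncedRng:
    """Toy rng with a seeded synced stream and a separate unsynced stream."""

    def __init__(self, seed):
        self._synced = random.Random(seed)   # same seed on every client
        self._unsynced = random.Random()     # local, never shared
        self.in_synced_context = False

    def roll(self, lo, hi):
        # Outside a synced context, redirect to the unsynced stream so the
        # synced stream is not advanced.
        rng = self._synced if self.in_synced_context else self._unsynced
        return rng.randint(lo, hi)


def synced_rolls(extra_unsynced_call=False, extra_synced_call=False):
    """Return the four synced rolls one client sees for a shared seed."""
    rng = SyncedRng(seed=42)
    if extra_unsynced_call:
        rng.roll(1, 100)              # e.g. called from a select event
    rng.in_synced_context = True
    if extra_synced_call:
        rng.roll(1, 100)              # this client consumes one extra value
    return [rng.roll(1, 100) for _ in range(4)]


baseline = synced_rolls()
with_unsynced_call = synced_rolls(extra_unsynced_call=True)
with_extra_synced_call = synced_rolls(extra_synced_call=True)
```

The extra call made outside the synced context leaves the synced results unchanged, while the extra call made inside it shifts every later result by one position relative to the baseline client.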
This rng gets reseeded at the beginning of every "synced user action". The reason is that otherwise players could know which random results they'll get next. Especially in MP this is very bad, since it means players could know whether an attack will hit before they command it. In networked MP the clients get their random seeds from the server. This happens lazily: the seed is not requested at the beginning of the synced user action, but as soon as the rng is used inside the synced context for this action (in particular, it is not requested at all if the rng isn't used). Since sending the "generate seed" request to the server cannot be undone, an action that invoked the synced rng (for example an attack, but also a move with a [set_variable] rand=.. in a moveto event) cannot be undone. This is usually the intended behaviour, since calling the rng usually implies an information gain that the player shouldn't be able to undo. If you want a random number but don't want to make the gamestate depend on it (for example because you only use it for visual features), and thus want the action to remain undoable, you should use the unsynced math.random from Lua.
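The lazy seeding and its effect on undo can be sketched as follows. This is a toy Python model under stated assumptions: the Server and SyncedAction classes, the seed values and the can_undo property are hypothetical and only mirror the behaviour described above.

```python
import random


class Server:
    """Toy server that hands out seeds and counts how many it issued."""

    def __init__(self):
        self.seeds_issued = 0

    def generate_seed(self):
        self.seeds_issued += 1
        return 1000 + self.seeds_issued   # deterministic for the sketch


class SyncedAction:
    """One synced user action; the seed is requested on first rng use."""

    def __init__(self, server):
        self._server = server
        self._rng = None                  # no seed requested yet

    def synced_random(self, lo, hi):
        if self._rng is None:
            # Lazy: the irreversible seed request happens only now.
            self._rng = random.Random(self._server.generate_seed())
        return self._rng.randint(lo, hi)

    @property
    def can_undo(self):
        # Once a seed was requested, the action can no longer be undone.
        return self._rng is None


server = Server()

plain_move = SyncedAction(server)         # never touches the rng

attack = SyncedAction(server)             # uses the rng twice
attack.synced_random(1, 100)
attack.synced_random(1, 100)
```

In this model the plain move stays undoable and costs no seed request, while the attack requests exactly one seed, no matter how many random numbers it draws afterwards.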
Found dependent command
It's possible to get an OOS error with the message "Found (in-)dependent command" or similar. This is what it means: during the game the clients send each other packets ([command], see ReplayWML). There are two types of these packets. The 'normal' commands contain new user actions like attack, recruit, recall or move. The 'dependent' commands contain answers to questions asked to specific clients (the results of local choices), for example advancement choices; new random seeds (questions asked to the server) and get_global_variable are also in this category. If a client receives an answer to a local choice when none was expected, the game gives a "Found dependent command when is_synced=false" OOS error message. The opposite situation, when a client expects an answer to a local choice but receives a new user action, also gives an OOS message.
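The two failure cases can be written down as a small decision function. This Python sketch is only a model of the rule described above; the function name and the wording of the second message are hypothetical, and only the first message text is taken from this page.

```python
def check_command(command_is_dependent, expecting_answer):
    """Classify an incoming [command] against what the client is waiting for."""
    if command_is_dependent and not expecting_answer:
        # An answer to a local choice arrived although none was requested.
        return "Found dependent command when is_synced=false"
    if not command_is_dependent and expecting_answer:
        # A new user action arrived while an answer was still pending.
        # (Hypothetical wording; the real message may differ.)
        return "Found independent command while waiting for a local choice"
    return "ok"


# A new user action while idle, and an answer while waiting, are both fine.
normal_flow = [check_command(False, False), check_command(True, True)]

# The two mismatches described in the text.
unexpected_answer = check_command(True, False)
missing_answer = check_command(False, True)
```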
Some examples
If Player A changes the White Mage's magic attack from 9-3 to 10-3 while Player B did not, that would result in an OOS since as the game progresses each client would see a different amount of damage being done.
Say that the White Mage attacks a Spearman with 30 HP left and hits all 3 times. To Player A, it would appear that the Spearman died. However Player B would see that unit as having 3 HP left. As a result...
- What if Player A then tried to move another unit to where the Spearman was? To Player A's client, it would work fine. To Player B's client, that wouldn't make any sense, since it's not possible to have multiple units on the same hex.
- What if Player B then tried to attack one of Player A's units with that Spearman? To Player B's client it seems like a normal move, but to Player A's client it seems like an empty hex is trying to attack him.
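The arithmetic behind the White Mage example can be checked with a few lines of Python. This is a toy calculation that ignores resistances, time of day and traits; the helper names are hypothetical.

```python
def remaining_hp(hp, damage_per_hit, hits):
    """HP left after `hits` successful strikes (resistances ignored)."""
    return max(0, hp - damage_per_hit * hits)


def hex_is_free(hp_of_occupant):
    """A hex is free once its occupant has been reduced to 0 HP."""
    return hp_of_occupant == 0


hp_on_a = remaining_hp(30, 10, 3)   # Player A's modified 10-3 attack
hp_on_b = remaining_hp(30, 9, 3)    # Player B's unmodified 9-3 attack

# Player A now moves a unit onto the Spearman's hex.
move_legal_on_a = hex_is_free(hp_on_a)
move_legal_on_b = hex_is_free(hp_on_b)
```

On Player A's client the Spearman has 0 HP and the move is legal; on Player B's client it still has 3 HP, so the same move command is illegal there and OOS is reported.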
Hints To Debug OOS Errors
As noted before, OOS happens when two clients disagree about the current gamestate. So if, in a game with clients A, B and C, client A gets an OOS during the turn of client B, it often (but not always) indicates that clients A and B disagree about the gamestate. In this situation client C can often be used to decide which client did the wrong calculations: if client C also got an OOS error message during the turn of client B, it's probably client B that did the wrong calculations; otherwise it's probably client A. The most common sources of OOS are:
1. Bad add-ons: be aware that add-ons can change the gamestate and cause OOS even if they are not currently active (for example as a modification or an MP scenario). Merely having an add-on installed can cause OOS, even if it seems to be completely unrelated to the current game. This is less likely to happen on recent versions of the engine (Version 1.15.4 and later only), as inactive add-ons are ignored.
2. Users changing the game/add-on files: it is often unclear to users which parts of the data can be changed without causing OOS and which cannot. For example, some MP scenarios include Lua code via macro inclusion, code={myluafile.lua}, while others include Lua code via dofile/require, wesnoth.dofile("myluafile.lua"). In the former case it is usually safe to change the Lua files, because the host sends the other clients its version of the Lua file along with the rest of the scenario content; in the latter case changing them will usually lead to OOS.
3. Outdated Wesnoth versions: while we try to keep Wesnoth versions MP-compatible, it is still important to keep your Wesnoth version up to date, in particular to avoid OOS errors caused by engine bugs that were fixed in newer versions. You can use the MP server command /q version <playername> to find out which Wesnoth version another player is using.
4. Cheating players: a player can intentionally change Wesnoth game files hoping to gain an advantage, when the change will actually just cause OOS.
5. Engine bugs: Wesnoth is not always bug-free. The main difficulty in investigating engine OOS errors is that most reported OOS errors actually come from one of the other sources above, so it is hard to find the 'real' OOS engine bugs in the wild. Some features, in particular 'delayed shroud updates' and 'multiplayer campaigns', have often caused OOS in the past, and it's possible that they will break again in the future due to their complex nature.
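The three-client heuristic from the start of this section can be written as a one-line rule. This Python sketch only restates that rule of thumb; the function name and arguments are hypothetical, and the result is a guess, not a diagnosis.

```python
def likely_culprit(reporter, turn_owner, third_client_also_reported):
    """Guess which client computed the wrong gamestate (heuristic only)."""
    # If an uninvolved third client also saw OOS during the same turn, the
    # client whose turn it was is the common factor; otherwise suspect the
    # client that reported the error.
    return turn_owner if third_client_also_reported else reporter


# Client A reports OOS during client B's turn.
guess_when_c_agrees_with_a = likely_culprit("A", "B", third_client_also_reported=True)
guess_when_c_is_fine = likely_culprit("A", "B", third_client_also_reported=False)
```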
How to make your code safe
- Make the WML run at a synchronized time, e.g. in a moveto event, instead of in an unsynced event.
- Use helper.rand instead of math.random, unless you know exactly what you are doing.
- Use Lua wesnoth.synchronize_choice when gathering information, to make sure that all clients match. Note that this function only works correctly in synchronized events.
List of non-MP/replay-safe WML/Lua functions
These functions/values might return different values on different clients or in replays. To prevent OOS you must use wesnoth.synchronize_choice or (Version 1.13.0 and later only) [sync_variable] to query these values when you want to change the gamestate depending on these values.
- Functions that depend on other installed add-ons. For example, #wesnoth.unit_types will usually be different on each client.
- Translatable strings used for gamestate calculations; the value of these strings depends on each client's language setting.
- Unsafe events; see EventWML#Multiplayer_safety for more detail.
- Unit attributes, accessible via stored units, Lua proxy units, or unit filters (via [filter_wml]):
  - unit.goto_x and unit.goto_y (used internally by the AI and by multi-turn moves)
  - unit.facing (in some cases, such as when unstoring units, the unit facing might be set randomly)
  - unit.name (players can rename units, and in MP a leader's name changes whenever the side's controller changes; see also the point below about attributes that change the visual appearance)
  - any attribute describing the visual appearance of the unit (unit.overlays, unit.profile etc.). In default Wesnoth they may be the same on all clients, but people usually assume that they can change the visuals of Wesnoth by modifying the cfg files or via [modification]s in add-ons without causing OOS.
 
- Filters that cause side effects, especially when used in unit abilities and weapon specials:
  - Clearly a filter should not change the gamestate in an unsynced context.
  - In the past the order in which filters are evaluated has changed even between stable versions, so it is possible that, even in a synced context, a filter is skipped on some clients only. It is therefore better not to rely on the order in which filters are evaluated, or else to test in each Wesnoth version. In particular:
    - The lua_function key calls an arbitrary Lua function. If the function called causes any change to the gamestate, you could get an out-of-sync error. This includes simply generating a random number, even using synchronized functions.
    - The formula key is safe as long as you do not use the dice operator in the formula.
 
 
- Variables/WML tags:
  - side.controller (obtained via [store_side] or wesnoth.sides): this variable will be different for each client. An exception is the value "null", which occurs if and only if the controller is "null" on all other clients as well.
  - [set_variable] time=stamp: the result of this operation will be different on each client.
 
- Lua functions:
  - math.random()
  - wesnoth.game_config.debug and wesnoth.game_config.version, which (possibly) have different values on different clients
  - any dialog which queries input from a client (e.g. wesnoth.show_dialog)
  - wesnoth.sides[i].controller (same as with [store_side]) and wesnoth.sides[i].is_local
  - any non-MP-safe WML tag called from Lua
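As an illustration of one item from this list, the following toy Python model shows why branching on a translatable string is unsafe. The translation table, the gold amounts and the function names are assumptions made for the sketch and are not taken from Wesnoth.

```python
# Hypothetical translation table: the same message id, two client languages.
TRANSLATIONS = {"en": {"yes": "Yes"}, "de": {"yes": "Ja"}}


def gold_after_event_unsafe(language, gold):
    """Unsafe: the gamestate depends on the translated text of an option."""
    label = TRANSLATIONS[language]["yes"]
    return gold + 10 if label == "Yes" else gold


def gold_after_event_safe(language, gold, option_id="yes"):
    """Safe: the gamestate depends on a language-independent identifier."""
    # `language` is accepted but deliberately ignored here.
    return gold + 10 if option_id == "yes" else gold


unsafe_results = {lang: gold_after_event_unsafe(lang, 100) for lang in ("en", "de")}
safe_results = {lang: gold_after_event_safe(lang, 100) for lang in ("en", "de")}
```

In the unsafe variant the English and German clients end up with different gold for the same event, while the safe variant gives both the same result.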
 
Additional tricks and tips
- You can modify some things. While you shouldn't change the damage of a unit's attack, there's nothing wrong with changing its portrait.
See Also
MultiplayerContent