If there's one lesson to be learned from issue #502, in my opinion, it's that the bindrule function was used naively, and that's because it was designed in an irrational way.
Let me preface this by saying that I agree with Bjarne Stroustrup on one thing: I don't like the semantics of operator[] in standard library containers such as std::map. When a key doesn't exist, an element is created in the container anyway, just so that a valid reference can be returned. This behavior, which was chosen for efficiency reasons, is often the source of problems that are difficult to detect.
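As a reminder of what that side effect looks like in practice, here is a minimal sketch using std::map. The function names (lookup_with_subscript, contains, demo_subscript) are made up for this illustration and have nothing to do with b2's code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Lookup written with operator[]: it needs a non-const map because it may insert.
int lookup_with_subscript(std::map<std::string, int> & m, const std::string & key)
{
    return m[key]; // a missing key is inserted with a value-initialized int (0)
}

// Lookup written with find(): it works on a const map and never modifies it.
bool contains(const std::map<std::string, int> & m, const std::string & key)
{
    return m.find(key) != m.end();
}

// Small demonstration of the side effect.
void demo_subscript()
{
    std::map<std::string, int> m{{"a", 1}};
    assert(!contains(m, "b"));                  // "b" is absent, and stays absent
    assert(m.size() == 1);
    assert(lookup_with_subscript(m, "b") == 0); // looks like a read...
    assert(m.size() == 2);                      // ...but the map has grown
}
```

Note that the const qualifier on the find() version is what lets the compiler guarantee the absence of side effects; the operator[] version cannot offer that.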
b2 uses the same paradigm (once very popular) in several functions: bindmodule, bindtarget, and bindrule. When any of these functions doesn't find the requested object, it creates a new one and returns it.
b2 never had a search-only interface. I added the find_module and find_target functions in #559.
In code where only one query needs to be performed, the call to hash_find is hardwired to the relevant hash table (grep hash_find *.cpp | wc -l currently finds 18).
The main problem with bindX functions isn't how they're written but how/where they're used.
All built-in rules are called by passing them data from user scripts, which can therefore create (more or less unintentionally) objects in the process's memory given the lack of any checks! It may seem like the usual problem of not trusting data coming from the user, but the issue is more subtle.
I won't list the dangerous rules, but as a demonstration, here's a script that will quickly run out of memory, assuming the operating system doesn't somehow limit the amount of memory a process can use. Try monitoring it while it runs.
#|
bomb.jam
use with
b2 -f bomb.jam
|#
local i = 0 ;
while forever
{
    i = [ CALC $(i) + 1 ] ;
    ECHO $(i) ;
    # NOTE: creates a new module on each call regardless
    # of whether the module exists, this will run out of memory
    # unless limited in some way by the operating system
    IMPORT $(i) ;
    # NOTE: creates a new target on each call regardless
    # of whether the target exists, this will run out of memory
    # unless limited in some way by the operating system
    NOCARE $(i) ;
}
Even more important than the vulnerability issue, which probably no one cares about, is the fact that this behavior of bindX can help hide design errors. When an object that was expected to exist is missing and bindX synthesizes it instead, there is no guarantee that later processing will produce an error, as fortunately happened in the case of issue #502.
Unfortunately, precisely because of these semantics, the functions themselves can't be modified, for example to issue a warning every time an object is not found, in order to flush out unexpected cases that no one has noticed until now and that probably represent errors.
In fact, we must assume that all calls to bindX are legitimate, meaning that whoever wrote them knew what they were doing. For example, during jamfile parsing, when b2 is building the dependency graph, it's natural for the objects not to exist yet, and when they do exist, it's correct to retrieve them with a bindX. I've also seen calls to bindtarget that, during the subsequent update phase, look for a semaphore that doesn't necessarily exist and should be created on demand [1].
But in the general case, who can assure us that all the objects requested by bindX must be found? Or that if they aren't found, it's correct to synthesize new ones?
Nobody, so the validity of every single call must be verified individually! This is the main drawback of functions like bindX.
Conversely, a call to find_X makes the caller's expectation explicit.
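To make the contrast concrete, here is a rough sketch of the two styles side by side. Everything in it is hypothetical: Object, Registry, bind and find are simplified stand-ins invented for this post, not b2's real types or the actual signatures of bindmodule or find_module.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a b2 object such as a module or a target.
struct Object { std::string name; };

// Hypothetical registry; this is not b2's hash table API.
class Registry
{
public:
    // bindX style: always returns an object, creating it when it is missing.
    Object & bind(const std::string & name)
    {
        auto & slot = objects_[name];
        if (!slot) slot = std::make_unique<Object>(Object{name});
        return *slot;
    }

    // find_X style: returns nullptr when missing and never creates anything.
    Object * find(const std::string & name) const
    {
        auto it = objects_.find(name);
        return it == objects_.end() ? nullptr : it->second.get();
    }

    std::size_t size() const { return objects_.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<Object>> objects_;
};

void demo_registry()
{
    Registry modules;
    assert(modules.find("typo") == nullptr); // the caller is forced to handle "not found"
    assert(modules.size() == 0);             // and nothing was created
    Object & m = modules.bind("typo");       // silently creates the object
    assert(modules.size() == 1);
    assert(modules.find("typo") == &m);
}
```

With the find style, a misspelled name surfaces at the call site as a null result; with the bind style it becomes a new, empty object that may or may not cause an error later.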
To eliminate the Jam vulnerability, suppose we now want to modify the builtin rule implementations to replace bindX calls with find_X calls and issue warnings when the requested objects are not found.
Let's take for example one of the many built-in rules, IMPORT.
Currently, if IMPORT is asked to import from/to a nonexistent module, it has no problem calling bindmodule on all the named modules (silently creating all the nonexistent ones), but if any of the rules it is asked to import are not found, it considers this an error and terminates the script. Obviously, it would be more correct to produce the error earlier, as soon as a module turns out to be missing, but this could cause backwards compatibility issues, so we will have to make do with issuing a warning.
And even if we decided to only issue a warning when a module does not exist, we would have to limit ourselves to checking only the source module. This is because some of the Jam code accumulated over time now relies on the possibility of creating modules in this way, such as tools/types/register.jam, which creates a module for each type registered in the same directory (currently 18) with nothing more than an IMPORT:
# A loop over all modules in this directory
for m in $(.sibling-modules)
{
    m = [ path.basename $(m) ] ;
    m = types/$(m) ;
    # Inject the type rule into the new module
    IMPORT $(__name__) : type : $(m:B) : type ;
    import $(m) ;
}
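The policy this leads to (search-only on the source module, create-on-demand on the target module) could be sketched roughly as below. This is only a model under my own assumptions: ModuleTable, import_checked and the warning text are invented for the illustration, the module table is reduced to a set of names, and whether the source module should still be created after the warning is left open here (the sketch does not create it).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a set of names stands in for b2's module table.
using ModuleTable = std::set<std::string>;

// Sketch of an IMPORT-like check. Returns the warnings that would be issued.
std::vector<std::string> import_checked(ModuleTable & modules,
                                        const std::string & source,
                                        const std::string & target)
{
    std::vector<std::string> warnings;

    // Source module: search only, warn if it is missing (find_X style).
    if (modules.count(source) == 0)
        warnings.push_back("warning: IMPORT from unknown module '" + source + "'");

    // Target module: still created on demand, because existing Jam code
    // such as the loop above relies on it (bindX style).
    modules.insert(target);

    return warnings;
}

void demo_import()
{
    ModuleTable modules{"register"};

    // Known source, new target: no warning, target is created.
    assert(import_checked(modules, "register", "cpp").empty());
    assert(modules.count("cpp") == 1);

    // Unknown source: one warning, and the source is not created.
    assert(import_checked(modules, "12345", "cpp").size() == 1);
    assert(modules.count("12345") == 0);
}
```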
[1] See the implementation of OPT_SEMAPHORE in make.cpp.