A major design goal of org.apache.sis.feature is to reduce memory usage.
Consider a ShapeFile or a database table with millions of records.
Each record is represented by one Feature instance.
Sophisticated DataStore implementations will create and discard Feature
instances on the fly, but not all DataStore do that.
As a safety, Apache SIS tries to implement Feature in a way that allow applications
to scale higher before to die with an OutOfMemoryError.
A simple Feature implementation would use a java.util.HashMap as below:
class SimpleFeature {
final Map<String,Object> attributes = new HashMap<>(8);
}
The above SimpleFeature does not supports explicitly multi-valued properties and metadata
about the properties (admittedly multi-values could be stored as java.util.Collection,
but this approach has implications on the way we ensure type safety).
A more complete but still straightforward implementation could be:
class ComplexFeature {
final Map<String,Property> properties = new HashMap<>(8);
}
class Property {
final List<String> values = new ArrayList<>(4);
}
A more sophisticated implementation would take advantage of our knowledge that all records in a table have the
same attribute names, and that the vast majority of attributes are singleton.
Apache SIS uses this knowledge, together with lazy instantiations of Property.
The above simple implementation has been compared with the Apache SIS one in a micro-benchmark consisting of the
following steps:
Defines the following feature type:
| Attribute | Value class | |
|---|---|---|
city | : | String (8 characters) |
latitude | : | Float |
longitude | : | Float |
Launch the micro-benchmarks in Java with a fixed amount of memory. This micro-benchmarks used the following command line with Java 1.8.0_05 on MacOS X 10.7.5:
java -Xms100M -Xmx100M command
Creates Feature instances of the above type and store them in a list of fixed size
until we get OutOfMemoryError.
Count Time (seconds) Run mean σ mean σ ComplexFeature:194262 ± 2 21.8 ± 0.9 SimpleFeature:319426 ± 4 22.5 ± 0.6 SIS (mode 1): 639156 ± 40 25.6 ± 0.4 SIS (mode 2): 642437 ± 7 12.1 ± 0.8
For the trivial FeatureType used in this benchmark, the Apache SIS implementation can load
twice more Feature instances than the HashMap<String,Object>-based
implementation before the application get an OutOfMemoryError.
We presume that this is caused by the Map.Entry instances that HashMap must
create internally for each attribute.
Compared to ComplexFeature, SIS allows 3.3 times more instances while being functionally equivalent.
The speed comparisons are subject to more cautions, in part because each run has created a different amount
of instances before the test stopped. So even the slowest SIS case would be almost twice faster than
SimpleFeature because it created two times more instances in an equivalent amount of time.
However, this may be highly dependent on garbage collector activities (it has not been verified).