Sunday, June 21, 2009

The idea behind building runtime and persisted statistics into generic containers, and into containers specialized to types, is to give support for a variety of adaptive algorithms. Strings are containers of particular interest: they are variable-length sequences, often over an alphabet such as UCS-2 Unicode. When debugging, it is often most convenient to use the standard strings of a runtime library, such as C++'s standard string or C's zero-terminated string with no length prefix (the C string). Yet even for small collections of strings, building up a statistical profile of their contents in a heavyweight string, hierarchically over the compositions of the strings, may well yield gains in the time and space efficiency of algorithms. This can be particularly so for collections of strings that represent fields of records in a mostly static dataset, where the strings are modified very rarely compared to how often the records are sorted and re-sorted, and where the space required for indices over multiple orderings of the records is not onerous.
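
As a rough illustration of that last case, here is a minimal Python sketch of a mostly static set of records that keeps one index per ordering, so that a "re-sort" is only a lookup. The names IndexedRecords, build_index, ordered_by and update are hypothetical, chosen for this example, and the records are assumed to be small dictionaries of string fields.

```python
def build_index(records, field):
    """Return record positions ordered by one field; the records do not move."""
    return sorted(range(len(records)), key=lambda i: records[i][field])


class IndexedRecords:
    """Hypothetical container: static records plus one index per ordering."""

    def __init__(self, records, fields):
        # Copy each record so later updates do not touch the caller's data.
        self.records = [dict(r) for r in records]
        # One list of positions per field: the space cost of each ordering.
        self.indices = {f: build_index(self.records, f) for f in fields}

    def ordered_by(self, field):
        """Read the records in a stored ordering without sorting again."""
        return [self.records[i] for i in self.indices[field]]

    def update(self, position, field, value):
        """Rare modification: rebuild only the index of the changed field."""
        self.records[position][field] = value
        if field in self.indices:
            self.indices[field] = build_index(self.records, field)


people = IndexedRecords(
    [
        {"name": "Ng", "city": "Oslo"},
        {"name": "Abe", "city": "Rome"},
        {"name": "Lee", "city": "Bonn"},
    ],
    fields=["name", "city"],
)
by_name = [r["name"] for r in people.ordered_by("name")]
by_city = [r["name"] for r in people.ordered_by("city")]
```

The trade is the one described above: each extra ordering costs one list of positions, and a modification pays for rebuilding an index, which is acceptable only because modifications are rare.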

Returning to the notion of the heavyweight string, there is a consideration about defining the statistical events that comprise the operations on the string, so that those events can be collected. One notion is that a string carries a description of its contents: for example, a count of each letter of the alphabet, and then, where pairs or squares are built up in the sequence, a corresponding build-up of the alphabet. Particularly where the range of characters actually used is much smaller than the possible range of the alphabet (say, over 16-bit characters), the collected statistics of the elements' properties give actionable information for specializing algorithms over strings, and they also allow minimal-space encodings for compressing the string contents in memory.
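
To make the description-of-contents idea concrete, here is a minimal Python sketch. It assumes the statistics are gathered once at construction, reads "pairs" simply as adjacent characters, and uses a fixed-width code over only the characters actually used as one possible minimal-space encoding. The names HeavyString, bits_per_char, pack and unpack are hypothetical and not taken from any library.

```python
from collections import Counter


class HeavyString:
    """Hypothetical heavyweight string: the text plus statistics about it."""

    def __init__(self, text):
        self.text = text
        # Count of each character that actually occurs.
        self.char_counts = Counter(text)
        # Count of each adjacent pair of characters.
        self.pair_counts = Counter(zip(text, text[1:]))

    def used_alphabet(self):
        """The characters actually used, in sorted order."""
        return sorted(self.char_counts)

    def bits_per_char(self):
        """Bits needed to tell apart the used characters (at least 1)."""
        return max(1, (len(self.char_counts) - 1).bit_length())

    def pack(self):
        """Encode the text as one integer, bits_per_char() bits per character."""
        alphabet = self.used_alphabet()
        code = {ch: i for i, ch in enumerate(alphabet)}
        width = self.bits_per_char()
        packed = 0
        for ch in self.text:
            packed = (packed << width) | code[ch]
        return alphabet, width, packed


def unpack(alphabet, width, packed, length):
    """Invert HeavyString.pack(), given the original length in characters."""
    mask = (1 << width) - 1
    chars = []
    for _ in range(length):
        chars.append(alphabet[packed & mask])
        packed >>= width
    return "".join(reversed(chars))


hs = HeavyString("mississippi")
alphabet, width, packed = hs.pack()
restored = unpack(alphabet, width, packed, len(hs.text))
```

Here "mississippi" uses only four distinct characters, so two bits per character suffice: 22 bits for the text, against 176 bits at 16 bits per character, not counting the small table that maps codes back to characters.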
