Sunday, June 21, 2009

Searching and Sorting

Here there is the notion of a "statsort", or statistical sort. The idea is that there is a first pass, or several passes, over the data to generate a statistical profile of the data. Then there is the consideration of the various orders in which the data could be sorted, and of how to determine when it is cheaper to keep an array in its layout order than to modify it, because it is already in a particular order. Then there are considerations about the fields of sorting: a given array of records might be sorted on any of several fields of the record, in which case sort indices would be generated for those fields.
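As a minimal sketch of the profiling pass, assuming integer keys and a "profile" that holds only counts of ascending and descending adjacent pairs (the names `OrderProfile`, `profile` and `stat_sort` are hypothetical), one pass can decide whether the layout order can be kept, cheaply reversed, or must be sorted:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OrderProfile:
    n: int
    ascents: int   # adjacent pairs with a[i] < a[i+1]
    descents: int  # adjacent pairs with a[i] > a[i+1]


def profile(data: List[int]) -> OrderProfile:
    """One pass over the data collecting simple order statistics."""
    ascents = descents = 0
    for prev, cur in zip(data, data[1:]):
        if prev < cur:
            ascents += 1
        elif prev > cur:
            descents += 1
    return OrderProfile(len(data), ascents, descents)


def stat_sort(data: List[int]) -> List[int]:
    """Sort in place, using the profile to skip or cheapen the work."""
    p = profile(data)
    if p.descents == 0:
        return data       # already non-decreasing: keep the layout as is
    if p.ascents == 0:
        data.reverse()    # non-increasing: an O(n) reversal suffices
        return data
    data.sort()           # general case
    return data
```

A real profile would carry more than two counters; Python's own `list.sort` (Timsort) already exploits existing runs, so the point here is only the decision structure.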

So the algorithm, in generating a "sorting", has to decide whether the data is to be made ordered in its layout, by transforming the layout, or whether instead ancillary data structures are to be generated for it, and then how accesses to those data structures have side effects on, or interactions with, the data.
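To contrast the two options, here is a sketch, assuming simple records with hypothetical fields `name` and `age`, where the records keep their layout order and each sortable field gets an ancillary index (a permutation of positions) instead:

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    name: str
    age: int


def sort_index(records: List[Record], field: str) -> List[int]:
    """Positions of the records ordered by one field; records are not moved."""
    return sorted(range(len(records)), key=lambda i: getattr(records[i], field))


def build_indices(records: List[Record], fields: List[str]) -> Dict[str, List[int]]:
    """One ancillary sort index per field."""
    return {f: sort_index(records, f) for f in fields}


records = [Record("carol", 35), Record("alice", 41), Record("bob", 29)]
indices = build_indices(records, ["name", "age"])

# A sorted view is read through the index; the layout itself is unchanged.
by_age = [records[i] for i in indices["age"]]

# The other option would transform the layout itself, e.g.
# records.sort(key=lambda r: r.age), which invalidates any existing index.
```

The trade-off is the one the paragraph names: indices cost extra space and must be kept in step with the data, while transforming the layout serves only one order at a time.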

So for the statistical sort, the general notion is to determine those characteristics of the population that would affect sorting, that is, whether particular tests of the ordering relation may be improved or specialized.
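One familiar case of specializing on population characteristics, offered as an illustration rather than as the statsort itself: if a profile shows the integer keys span a small range, counting sort avoids comparisons altogether. The function name and the `max_range` threshold below are hypothetical:

```python
from typing import List


def choose_and_sort(keys: List[int], max_range: int = 1024) -> List[int]:
    """Pick a strategy from a profile (min, max) of the key population.

    Assumption: keys are ints. If the span max - min + 1 is at most
    max_range, use counting sort; otherwise fall back to a comparison sort.
    """
    if not keys:
        return []
    lo, hi = min(keys), max(keys)   # the "statistical profile"
    span = hi - lo + 1
    if span <= max_range:
        counts = [0] * span
        for k in keys:
            counts[k - lo] += 1
        out: List[int] = []
        for offset, c in enumerate(counts):
            out.extend([lo + offset] * c)
        return out
    return sorted(keys)              # general comparison sort
```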

Then there is the container and data organization framework: the algorithms affect the contents of the containers, and since the containers present various accesses to their contents, there are considerations of how to keep those accesses synchronized with mutations to the container.
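A small sketch of keeping an access path synchronized with mutations, assuming integer items and a hypothetical `SortedView` container that holds the layout (insertion) order plus a sorted copy maintained with the standard `bisect` module on every add and remove:

```python
import bisect
from typing import List


class SortedView:
    """Keeps items in insertion order and also maintains a sorted copy,
    updated on every mutation so the two never diverge."""

    def __init__(self) -> None:
        self._items: List[int] = []   # layout order (insertion order)
        self._sorted: List[int] = []  # ancillary sorted structure

    def add(self, x: int) -> None:
        self._items.append(x)
        bisect.insort(self._sorted, x)

    def remove(self, x: int) -> None:
        self._items.remove(x)         # raises ValueError if x is absent
        i = bisect.bisect_left(self._sorted, x)
        del self._sorted[i]           # x is known to be present here

    def items(self) -> List[int]:
        return list(self._items)

    def in_order(self) -> List[int]:
        return list(self._sorted)
```

`bisect.insort` is O(log n) to find the slot but O(n) to shift the list; a balanced tree or skip list would be the usual choice at scale.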

Then, insertions and removals are events, so those are to be considered as well, along with all of these other notions of reducing work by using knowledge about the data.
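As a sketch of treating insertions and removals as events and using knowledge to reduce work, assume a hypothetical `TrackedList` that notifies listeners on each mutation and maintains one fact, whether its contents are known to be sorted, so that a later `sort()` can be skipped:

```python
from typing import Callable, List


class TrackedList:
    """Appends and removals are events; the list also keeps one piece of
    knowledge current: whether its contents are known to be sorted."""

    def __init__(self) -> None:
        self._data: List[int] = []
        self._known_sorted = True  # an empty list is sorted
        self.listeners: List[Callable[[str, int], None]] = []

    def _emit(self, kind: str, x: int) -> None:
        for listener in self.listeners:
            listener(kind, x)

    def append(self, x: int) -> None:
        if self._data and x < self._data[-1]:
            self._known_sorted = False
        self._data.append(x)
        self._emit("insert", x)

    def remove(self, x: int) -> None:
        # Removing an element never breaks sortedness; if the list was
        # unsorted, the flag conservatively stays False.
        self._data.remove(x)
        self._emit("remove", x)

    def sort(self) -> bool:
        """Sort only if needed; return True if sorting work was done."""
        if self._known_sorted:
            return False
        self._data.sort()
        self._known_sorted = True
        return True

    def data(self) -> List[int]:
        return list(self._data)
```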

About the "statistical properties" of the contents, there are the various discrete data types, such as enumerated types, and whether those are ranged is another consideration. For things like strings, there is the consideration that lexicographical ordering proceeds by the first letter and then, level by level, by each following letter. Otherwise, there is the consideration that each string should carry a description of its own properties: for strings of sufficient size, a description of the contents of the string, e.g. its alphabet and so on.
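The level-by-level view of lexicographic order is what a most-significant-digit (MSD) radix sort does. A minimal recursive sketch follows, together with a hypothetical per-string descriptor `describe` that records a string's length and alphabet:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def describe(s: str) -> Tuple[int, str]:
    """Per-string profile: its length and its alphabet (distinct chars, sorted)."""
    return len(s), "".join(sorted(set(s)))


def msd_sort(strings: List[str], depth: int = 0) -> List[str]:
    """MSD radix sort: bucket by the character at `depth`, then recurse
    into each bucket one level deeper."""
    if len(strings) <= 1:
        return list(strings)
    # Strings that end at this depth are prefixes of the rest: they go first.
    result = [s for s in strings if len(s) == depth]
    buckets: Dict[str, List[str]] = defaultdict(list)
    for s in strings:
        if len(s) > depth:
            buckets[s[depth]].append(s)
    for ch in sorted(buckets):  # only characters actually present at this level
        result.extend(msd_sort(buckets[ch], depth + 1))
    return result
```

Buckets are ordered by Python's character comparison, so the result agrees with `sorted()`; with a small known alphabet from the profile, the dictionary could be replaced by a fixed array of buckets.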

Then one of the ideas is to have, for small alphabets, the completion arrays and so on, for efficient, rapid computation of satisfiability, perhaps parallelizable across completion banks.

Then there are time and space considerations in the layout of the data and how it is retrieved, and then, for each subset selected by a query or specification of the data, what it means to extract and lay out that subset.
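One way to read "extract and lay out" a queried subset is to select matching rows and store the requested fields column by column. A sketch, assuming records are plain dicts and a predicate function stands in for the query (the name `extract_columns` is hypothetical):

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def extract_columns(rows: List[Row], where: Callable[[Row], bool],
                    fields: List[str]) -> Dict[str, List[Any]]:
    """Select the rows matching a query and lay the chosen fields out
    column by column (one list per field)."""
    selected = [r for r in rows if where(r)]
    return {f: [r[f] for r in selected] for f in fields}
```

A columnar layout suits scans over one field; a row layout suits fetching whole records. Which is cheaper depends on the access pattern of the query.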

Then, where there is a lot of storage of collections of strings with associated metadata, there are considerations about how to build up these structures for a given corpus, and also how to make the information about the corpus separable, so that summary statistics can be added and removed, making the operations on the data that describe the contents memoryless in a sense.
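A sketch of separable summary statistics, assuming the summary is just a count, a total length and per-character counts (the class name `CorpusSummary` is hypothetical). Each string can be added or removed independently, and two summaries can be merged, so the state depends only on which strings are currently present:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CorpusSummary:
    """Summary statistics updatable one string at a time, without
    revisiting the rest of the corpus."""
    count: int = 0
    total_length: int = 0
    char_counts: Counter = field(default_factory=Counter)

    def add(self, s: str) -> None:
        self.count += 1
        self.total_length += len(s)
        self.char_counts.update(s)

    def remove(self, s: str) -> None:
        # Assumes s was previously added to this summary.
        self.count -= 1
        self.total_length -= len(s)
        self.char_counts.subtract(s)
        for ch in set(s):                  # drop characters now at zero
            if self.char_counts[ch] <= 0:
                del self.char_counts[ch]

    def merge(self, other: "CorpusSummary") -> "CorpusSummary":
        return CorpusSummary(self.count + other.count,
                             self.total_length + other.total_length,
                             self.char_counts + other.char_counts)
```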

The idea here is to develop programmatic containers for various data types, perhaps beginning with the sequence or string. Then, beyond the single datum of a scalar type, there are considerations for the storage of fixed-width and variable-width types.
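A common layout for variable-width values, shown here as a sketch rather than as the container the post has in mind: store all strings' bytes in one buffer and keep a fixed-width array of offsets. The class name `StringColumn` is hypothetical; `array` and `bytearray` are standard Python:

```python
from array import array


class StringColumn:
    """Variable-width storage: the UTF-8 bytes of all strings in one
    buffer, plus a fixed-width offsets array (item i spans
    offsets[i]..offsets[i+1])."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self._offsets = array("q", [0])  # 64-bit signed offsets

    def append(self, s: str) -> None:
        self._buf.extend(s.encode("utf-8"))
        self._offsets.append(len(self._buf))

    def __len__(self) -> int:
        return len(self._offsets) - 1

    def __getitem__(self, i: int) -> str:
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self._offsets[i], self._offsets[i + 1]
        return self._buf[start:end].decode("utf-8")
```

Fixed-width values need no offsets: an `array("i", ...)` of ints can be indexed directly, which is the essential difference between the two kinds of storage.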
