Vector: Query vector

A query vector is a tokenized representation of a query sequence for a given time window. This method is still a work in progress. So far the focus has been on obfuscating the sequences to avoid re-identification of the client, and this obfuscation limits the effectiveness of the analysis. The token space is deliberately small, and tokens are generated by a hash function for which collisions are to be expected. Any new tokens found are submitted together with the vectors.
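The tokenization described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the hash function (SHA-256 truncated to 32 bits), the size of the token space (`TOKEN_SPACE_BITS`), the contents of `DEFAULT_WORDLIST`, and all function names are hypothetical, since the text does not specify them.

```python
import hashlib

# Assumption: the token space is limited to 2**16 values so that
# collisions are common. The real size is not given in the text.
TOKEN_SPACE_BITS = 16

# Assumption: a placeholder for the default wordlist.
DEFAULT_WORDLIST = {"www", "com", "example"}


def token_for(word: str) -> bytes:
    """Map a word to a 32-bit token drawn from a deliberately small space."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    # Take the first 32 bits of the digest, then fold them into the
    # small token space. Different words may end up with the same token.
    value = int.from_bytes(digest[:4], "big") % (1 << TOKEN_SPACE_BITS)
    return value.to_bytes(4, "big")


def query_vector(words: list[str]) -> bytes:
    """Concatenate the tokens of a query sequence into one bytestring."""
    return b"".join(token_for(word) for word in words)


def new_words(words: list[str]) -> list[str]:
    """Return the words not on the default list, in first-seen order."""
    seen: list[str] = []
    for word in words:
        if word not in DEFAULT_WORDLIST and word not in seen:
            seen.append(word)
    return seen


sequence = ["www", "example", "com", "intranet", "intranet"]
vector = query_vector(sequence)   # 5 tokens of 4 bytes each
delta = new_words(sequence)       # only "intranet" is new
```

Reducing the hash modulo a small token space is one simple way to make collisions likely by design; a real implementation might use a keyed or salted hash instead.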

Data

#  Name            Type              Required  Comment
1  start_time      Timestamp         yes       Starting point for vector
2  duration        Integer           yes       Vector length in seconds
3  vectors         list<Bytestring>  yes       Vectors for all clients for the given time window. Each vector consists of tokens, which are 32-bit hashes of the words they represent
4  Wordlist delta  list<Bytestring>  yes       Wordlist for all tokens not on the default list, i.e. the list of new words
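
For illustration, the four fields listed above could be modelled as a record like the one below. This is a sketch, not a definitive schema: the class name `QueryVectorRecord`, the attribute name `wordlist_delta`, the assumption that each vector is a plain concatenation of 4-byte tokens, and the validation rules are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

TOKEN_BYTES = 4  # tokens are 32-bit hashes


@dataclass
class QueryVectorRecord:
    start_time: datetime         # field 1: starting point for the vector
    duration: int                # field 2: vector length in seconds
    vectors: List[bytes]         # field 3: one bytestring per client
    wordlist_delta: List[bytes]  # field 4: words not on the default list

    def validate(self) -> None:
        """Check the assumed invariants; raise ValueError if one fails."""
        if self.duration <= 0:
            raise ValueError("duration must be a positive number of seconds")
        for vector in self.vectors:
            # Assumption: a vector is a concatenation of 4-byte tokens.
            if len(vector) % TOKEN_BYTES != 0:
                raise ValueError("vector length must be a multiple of 4 bytes")

    def tokens(self, index: int) -> List[bytes]:
        """Split the vector at the given index into its 4-byte tokens."""
        vector = self.vectors[index]
        return [vector[i:i + TOKEN_BYTES]
                for i in range(0, len(vector), TOKEN_BYTES)]


record = QueryVectorRecord(
    start_time=datetime(2020, 1, 1, tzinfo=timezone.utc),
    duration=3600,
    vectors=[b"\x00\x00\x00\x01\x00\x00\x00\x02"],
    wordlist_delta=[b"intranet"],
)
record.validate()
```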