On the surface, the Metagrid-Plugin is just a simple browser extension. It doesn’t collect any personal data about its users. The only thing it does is display links between resources on the connected pages. These links are created using embeddings.
First, the resource is just text. This text is broken down into tokens, which are smaller chunks of text or even single words. Every Large Language Model (LLM) starts with this step. Every token is assigned a unique numerical identifier. These identifiers are then mapped to embeddings. Embeddings are vectors that carry semantic meaning.
Metagrid uses 1536-dimensional vector embeddings. Each dimension captures a learned feature of the original data, in our case text. Embeddings offer a mathematical representation of semantic meaning, largely independent of the language of the original text. Similar words, sentences, documents or articles produce similar vectors that point in a similar direction. The similarity can be measured, e.g. by Euclidean distance or cosine similarity. Either measure yields a single number that expresses how similar two embeddings are.
Here’s a simple example: in a 2-dimensional space, the embeddings of the words «dog» and «cat» are much closer to each other than the embeddings of «dog» and «car».
This makes it possible for a computer to compare meaning.
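The «dog», «cat» and «car» example can be reproduced with a few lines of Python. This is only a sketch: the coordinates below are invented for illustration and do not come from any real model, and real Metagrid embeddings have 1536 dimensions rather than two.

```python
import math

# Hypothetical 2-dimensional "embeddings"; the numbers are made up for this example.
toy_embeddings = {
    "dog": (0.90, 0.80),
    "cat": (0.85, 0.90),
    "car": (0.10, -0.70),
}

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

dog_cat = euclidean_distance(toy_embeddings["dog"], toy_embeddings["cat"])
dog_car = euclidean_distance(toy_embeddings["dog"], toy_embeddings["car"])

# A smaller distance means the two words are considered more similar.
print(f"dog-cat: {dog_cat:.3f}")
print(f"dog-car: {dog_car:.3f}")
```

With these toy values, the distance between «dog» and «cat» comes out far smaller than the distance between «dog» and «car».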
The text of the dictionary entry is broken down into tokens. Every token is assigned an embedding. In our case, this is a 1536-dimensional vector. All the embeddings are then combined into an average, also called a document embedding. This vector is then stored in the vector database. Metagrid uses the API of an LLM for these steps. Currently, this is OpenAI ADA-03, but we are able to change the model easily. In the future, a specifically trained model may be implemented.
These steps (tokenisation – vector-mapping – document-embedding) happen every time new data is pulled into the Metagrid database or when data is updated.
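The three steps can be sketched as follows. This is a simplified illustration, not Metagrid's actual code: `tokenise`, `embed_token` and the in-memory `vector_db` are hypothetical stand-ins, and the hash-based vectors carry no semantic meaning. In the real pipeline the vector mapping is done by the LLM API.

```python
import hashlib

DIMENSIONS = 8  # Metagrid uses 1536; kept small here for readability

def tokenise(text):
    """Rough stand-in for a tokeniser: lower-case and split on whitespace."""
    return text.lower().split()

def embed_token(token):
    """Hypothetical stand-in for the model's vector mapping.

    Derives a deterministic pseudo-vector from a hash, so it carries no
    semantic meaning; a real model returns learned vectors.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:DIMENSIONS]]

def document_embedding(text):
    """Average all token vectors into a single document vector."""
    vectors = [embed_token(token) for token in tokenise(text)]
    if not vectors:
        raise ValueError("cannot embed an empty text")
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical in-memory stand-in for the vector database.
vector_db = {}
vector_db["entry-1"] = document_embedding("A dictionary entry about dogs")
```

Whenever an entry changes, the same function would be called again and the stored vector overwritten.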
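The granularity of the text or document has a big impact on the document embedding. Because the document embedding is an average over all tokens, a text that covers many topics yields a less specific vector. This may result in fewer links for texts on broader topics, and in more and better links for more specific entries.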
Cosine similarity is a method to measure the similarity between two vectors in an n-dimensional space. It is used in text analysis, information retrieval and recommender systems, and it is especially useful for text represented as vectors.
Cosine similarity considers the angle between two vectors. If the angle is 0°, the vectors point in the same direction. This is the maximum similarity and results in a similarity value of 1. If the angle is 90°, the vectors are orthogonal to each other and the similarity value is 0.
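The two cases can be checked with a small, self-contained function. This is a plain-Python sketch for illustration; the function name and the sample vectors are our own and are not taken from the Metagrid code base.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Same direction, different length: angle 0°, similarity close to 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])

# Perpendicular vectors: angle 90°, similarity 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Note that only the direction of the vectors matters: `[1.0, 2.0]` and `[2.0, 4.0]` differ in length but are treated as maximally similar.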
The Metagrid Collector regularly crawls the pages of the connected sites, looking for changes to resources and for new publicly available resources. The collector only looks for published documents or articles. Our crawler is designed to be considerate and does not overload the data providers’ servers with requests. The collector then sends the data to the LLM API and requests the vector mappings. The resulting document embeddings are then stored in the vector database.
The Metagrid Plugin API server queries the vector database. When a user visits one of the connected pages, the plugin sends a request to the Metagrid Cluster. The plugin activates in the user’s browser and displays the links only if a result is found.
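The lookup can be pictured as a nearest-neighbour search over the stored document embeddings. The sketch below makes several assumptions that are not stated above: the function `find_similar`, the parameters `top_k` and `min_similarity`, and the sample vectors are all hypothetical, and a real vector database would use an index instead of comparing against every entry.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def find_similar(query_id, vector_db, top_k=3, min_similarity=0.8):
    """Return up to top_k (id, similarity) pairs, most similar first."""
    query = vector_db[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in vector_db.items()
        if other_id != query_id
    ]
    hits = [pair for pair in scored if pair[1] >= min_similarity]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:top_k]

# Invented 3-dimensional document embeddings, keyed by a made-up resource id.
vector_db = {
    "site-a/dog": [0.90, 0.80, 0.10],
    "site-b/cat": [0.85, 0.90, 0.15],
    "site-c/car": [0.10, -0.70, 0.90],
}

links = find_similar("site-a/dog", vector_db)
plugin_is_active = bool(links)  # no result, no plugin UI
```

With these toy vectors, the page `site-a/dog` gets a link to `site-b/cat`, while a lookup for `site-c/car` returns an empty list and the plugin would stay hidden.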
What happens when using the Metagrid Plugin?
The plugin is currently available in the Chrome Web Store for all Chromium-based browsers. It is a very lightweight browser extension. The plugin only displays its own contents on the websites of the network. It does not analyze the sites’ content and does not collect any personal data about its users.
The plugin only appears when at least one link is found. It is clearly recognisable as an element that does not belong to the website. This enables users to clearly distinguish between content provided by the sites they are visiting and the additional information offered by the Metagrid-Plugin.
The plugin sends only the following information to the Metagrid server: the current URL, user feedback («thumbs up»/«thumbs down», clicks on links), and an anonymous token of the plugin installation.
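To make the scope of that request concrete, here is a sketch of what such a payload could look like. The field names, the JSON format and the use of a random UUID as installation token are assumptions made for illustration; the actual wire format of the plugin is not documented here.

```python
import json
import uuid

# Hypothetical field names; the only three kinds of information listed above.
ALLOWED_FIELDS = {"url", "feedback", "installation_token"}

def build_request(url, installation_token, feedback=None):
    """Serialise the request body; feedback is only included when present."""
    payload = {"url": url, "installation_token": installation_token}
    if feedback is not None:
        payload["feedback"] = feedback
    return json.dumps(payload)

# Assumption: a random token generated once per installation,
# which carries no personal information.
token = str(uuid.uuid4())

body = build_request(
    "https://example.org/entry/123",
    token,
    feedback={"type": "thumbs_up"},
)
```

Nothing beyond these three fields is part of the payload in this sketch, which mirrors the list above.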
The plugin receives similar resources from the Metagrid server and provides those to the user. There is no contact from the plugin to any third-party services. Metagrid collects no personal user data.