Only a few technical prerequisites are strictly mandatory for Metagrid to work at a basic level. However, we strongly encourage all projects interested in joining Metagrid to adopt the following best practices in order to take full advantage of the system's features.
Adopting the best practices will not only make it easier to integrate Metagrid into your project, but will also benefit your project more generally.
Apart from these technical prerequisites and best practices, you should also follow the Code of conduct.
Unique identifiers (mandatory)
Every entity (person, organization, place, …) in your project should have its own unique identifier. This can be a number (numeric id), text (text id), or both. A unique identifier must be unique for a given type of resource within its own context. This means that a unique identifier for a person can be used for one and only one person within your system. The same unique identifier can, however, be used for entities of different kinds: for example, a person and a place may both carry the identifier 1047.
Permalinks / Entry points (mandatory)
Every entity shall be made accessible using a permalink. This is the URL of the page you want your visitors to land on when seeking information about a given entity. A permalink is therefore not just a unique identifier: it must also be a valid URL. It is this URL that will be displayed in the Metagrid widget. This is also the URL that will be used by the Metagrid crawler. The URL should be publicly accessible (see the Code of conduct).
Metagrid expects a permalink (e.g. http://dodis.ch/P1047) to be made of two parts: a fixed part or base part (e.g. http://dodis.ch/P) plus the unique identifier of the resource (e.g. 1047).
The base part can vary based on the entity type and optionally on the language (if you provide different URLs for the same entity in different languages).
The unique identifier portion of the permalink does not need to be at the end of the permalink string; it can appear anywhere in it.
You have to guarantee that a permalink will not change over time. That said, we know that URLs can change in the real world, for instance when you adopt new software for your site. Because we always consider a permalink to be composed of a base part plus a unique identifier, either part can still be changed later if necessary. Changing the permalink base part for an entire Metagrid partner with tens of thousands of entities is usually just a matter of seconds. Changing the unique identifiers requires much more work, but is not impossible if we know the equivalences between the old and the new unique identifiers.
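The base-plus-identifier model described above can be illustrated with a small Python sketch. It is not Metagrid's actual implementation: the `TEMPLATES` table, the `{id}` placeholder and the example.org URLs are assumptions made up for illustration; only the dodis.ch permalink comes from the text.

```python
from typing import Optional

# Hypothetical permalink templates, keyed by (entity type, language).
# "{id}" marks where the unique identifier goes; it may sit anywhere in the URL.
TEMPLATES = {
    ("person", None): "http://dodis.ch/P{id}",                 # example from the text
    ("place", "en"): "http://example.org/en/place/{id}/view",  # made-up URL
    ("place", "fr"): "http://example.org/fr/lieu/{id}/voir",   # made-up URL
}


def build_permalink(entity_type: str, identifier: str,
                    language: Optional[str] = None) -> str:
    """Combine the fixed base part with the unique identifier."""
    return TEMPLATES[(entity_type, language)].replace("{id}", identifier)


def extract_identifier(entity_type: str, url: str,
                       language: Optional[str] = None) -> Optional[str]:
    """Recover the identifier from a permalink, or None if it does not match."""
    prefix, _, suffix = TEMPLATES[(entity_type, language)].partition("{id}")
    end = len(url) - len(suffix)
    if url.startswith(prefix) and url.endswith(suffix) and end > len(prefix):
        return url[len(prefix):end]
    return None


print(build_permalink("person", "1047"))
print(extract_identifier("place", "http://example.org/en/place/42/view", "en"))
```

In a model like this, moving a whole project to a new base part means editing one template string, while the stored identifiers stay untouched.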
Sometimes, permalinks are implemented through HTTP redirections. This is supported by Metagrid.
Enrich your content
We strongly encourage you to semantically enrich your content. This will not only increase the visibility of your data on the Internet, but will also facilitate the collection and maintenance of data by the Metagrid crawler, and increase the chances of automatically generating match proposals with other projects in Metagrid.
For tagging metadata, Metagrid currently supports out of the box a small subset of the schema.org vocabulary for persons (schema.org/Person) encoded as Microdata. We currently support the following properties: familyName, givenName, additionalNames, birthDate, deathDate. This list will be extended in the future.
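As a rough illustration of what such Microdata markup could look like, the following Python sketch renders a person as an HTML fragment using four of the properties listed above. The function name `person_microdata`, the choice of `span` and `time` elements, the `https://schema.org/Person` form of the item type and the sample person are all assumptions for illustration; check with the Metagrid team which exact markup the crawler expects.

```python
from html import escape
from typing import Optional


def person_microdata(given_name: str, family_name: str, birth_date: str,
                     death_date: Optional[str] = None) -> str:
    """Render a schema.org/Person as an HTML fragment with Microdata attributes."""
    lines = [
        '<div itemscope itemtype="https://schema.org/Person">',
        f'  <span itemprop="givenName">{escape(given_name)}</span>',
        f'  <span itemprop="familyName">{escape(family_name)}</span>',
        f'  <time itemprop="birthDate" datetime="{escape(birth_date)}">'
        f'{escape(birth_date)}</time>',
    ]
    if death_date:
        # Only emit deathDate when it is known.
        lines.append(
            f'  <time itemprop="deathDate" datetime="{escape(death_date)}">'
            f'{escape(death_date)}</time>'
        )
    lines.append('</div>')
    return "\n".join(lines)


# Made-up sample person, not real data.
print(person_microdata("Jane", "Doe", "1900-01-01", "1980-12-31"))
```

The values are passed through `html.escape` so that names containing characters such as `&` or `<` do not break the surrounding page.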
Provide a sitemap
Expose your resources using an XML sitemap. The Metagrid crawler is capable of using it out of the box to navigate through your pages.
If you do not provide a sitemap, other strategies can be used for crawling your content, but they will require much more work.
Providing an XML sitemap for your site also increases your visibility on the Internet, because this technology is recognised by all major search engines.
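A minimal sitemap that lists one permalink per entity can be produced with Python's standard library, as sketched below. The function name `build_sitemap` and the example.org URL are assumptions for illustration; the namespace is the one defined by the sitemaps.org protocol. Optional elements such as `lastmod` are left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(permalinks: Iterable[str]) -> str:
    """Return an XML sitemap (as text) with one <url><loc> entry per permalink."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for permalink in permalinks:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = permalink
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML declaration by hand so the output is the same
    # regardless of Python version or locale.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap = build_sitemap([
    "http://dodis.ch/P1047",               # example from the text
    "http://example.org/en/place/42/view"  # made-up URL
])
print(sitemap)
```

The resulting text would typically be saved as UTF-8 and served from a stable URL on your site, such as a `sitemap.xml` file at the root.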
Exclude some entities from crawling
You can choose to exclude specific entities on your site from Metagrid. This is useful when you want to prevent a specific entity from being crawled, for instance because of privacy-related legal concerns.
To do this, insert the following code into the head section of the pages concerned:
<meta name="robots" content="noindex">
The Metagrid crawler will respect this meta tag and will not use the data of this page for suggesting automatic matches.
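Before relying on this exclusion, you may want to verify that the tag is really present on the pages concerned. The following Python sketch checks an HTML document for a robots meta tag containing the noindex directive. It is only an approximation of how a crawler might read the tag, not the Metagrid crawler's actual logic; the names `RobotsMetaParser` and `is_excluded` are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains the noindex directive."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        directives = [d.strip().lower()
                      for d in attributes.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def is_excluded(html_text: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.noindex


page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(is_excluded(page))
```

The check is case-insensitive and also accepts combined directives such as `noindex, nofollow`.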