Researchers, scientists, data journalists and others use Google Dataset Search to find datasets published online. Dataset discovery is made easier when publishers add the Dataset schema and other metadata standards that describe their data. The main aim of this markup is to make datasets from fields such as the sciences, social sciences, machine learning, and civic and government data easily discoverable. You can search for datasets using the Dataset Search tool.
What Type Of Data Qualifies As Datasets?
The following types of data qualify as datasets:
How Do I Add The Data Set Markup?
You can add the dataset markup in the following ways:
If the page looks alright, you can ask Google to recrawl your URLs.
PRO TIP: If you want to delete your dataset, or do not want it displayed in search results, use the robots meta tag to control how your dataset is indexed. It may take some time for the change to take effect.
Google’s Approach To Dataset Discovery:
Google understands structured data about datasets through either the schema.org Dataset markup or equivalent structures represented in the W3C's Data Catalog Vocabulary (DCAT) format. To improve the discovery of datasets, Google is also experimenting with support for structured data based on W3C CSVW.
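To make the schema.org route concrete, here is a minimal sketch that builds a Dataset description in Python and wraps it in the script element used for JSON-LD. The helper names, the example dataset and the example.com URL are assumptions made for illustration, not part of Google's guidance.

```python
import json


def build_dataset_jsonld(name, description, url, license_url=None):
    """Return a schema.org Dataset description as a plain dict."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    if license_url:
        # A URL that points at one specific version of the license.
        doc["license"] = license_url
    return doc


def to_script_tag(doc):
    """Wrap the JSON-LD in the script element that goes into the page HTML."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical dataset, used only to show the shape of the markup.
example = build_dataset_jsonld(
    name="Example rainfall measurements",
    description="Hypothetical daily rainfall readings used only for illustration.",
    url="https://example.com/datasets/rainfall",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
)
print(to_script_tag(example))
```

The printed block can be pasted into the page describing the dataset and then checked with a validation tool before asking for a recrawl.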
What Are The Guidelines To Follow?
In addition to the structured data guidelines, Google advises following these best practices:
A. Use sitemap files to help Google find your URLs. Using sitemap files together with sameAs markup helps document how dataset descriptions are published across your site.
B. A dataset repository usually has two types of pages: the landing page and the page listing multiple datasets.
In such cases, adding the Dataset structured data to the landing pages is recommended.
If structured data is added to multiple pages describing the same dataset, then use the sameAs property to link each of them to the landing page.
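As a rough illustration of the sameAs recommendation, the sketch below takes the markup used on a listing page and points it at the landing page. The function name and the URLs are hypothetical.

```python
def link_to_landing_page(dataset_doc, landing_page_url):
    """Return a copy of the markup with sameAs pointing at the landing page."""
    linked = dict(dataset_doc)  # shallow copy so the input stays untouched
    linked["sameAs"] = landing_page_url
    return linked


# Markup as it might appear on a hypothetical listing page.
listing_entry = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example rainfall measurements",
    "description": "Short summary shown on a hypothetical listing page.",
}

linked_entry = link_to_landing_page(
    listing_entry, "https://example.com/datasets/rainfall"
)
print(linked_entry["sameAs"])
```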
If a dataset is a copy of, or based on, another dataset, then follow the practices listed below:
All textual properties should contain no more than 5000 characters, as Google Dataset Search uses only the first 5000 characters of any textual property. All names and titles should be a few words or a short sentence.
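A quick way to catch overlong text before publishing is to measure each string property. The sketch below only inspects top-level string values; the constant and function names are assumptions for illustration.

```python
MAX_TEXT_LENGTH = 5000  # limit stated above for any textual property


def find_overlong_text(doc, limit=MAX_TEXT_LENGTH):
    """Return {property: length} for top-level strings longer than the limit."""
    return {
        key: len(value)
        for key, value in doc.items()
        if isinstance(value, str) and len(value) > limit
    }


def truncate_text(doc, limit=MAX_TEXT_LENGTH):
    """Return a copy of the markup with top-level strings cut to the limit."""
    return {
        key: (value[:limit] if isinstance(value, str) else value)
        for key, value in doc.items()
    }


sample = {"name": "Example dataset", "description": "a" * 5001}
print(find_overlong_text(sample))
```

Cutting text blindly can end a description mid-sentence, so the report from find_overlong_text is usually the better starting point for a manual rewrite.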
WHAT TO DO IF MY STRUCTURED DATASETS EXPERIENCE ERRORS AND WARNINGS?
You might experience warnings or errors in Google's Structured Data Testing Tool or other validation systems. In particular, these validation systems may suggest that every organization should have contact information properties such as contactType; useful values include customer service, emergency, journalist, newsroom and public engagement. You can also ignore errors reporting that csvw:Table is not the expected value for the mainEntity property.
WHAT ARE THE VARIOUS PROPERTIES OF THE DATASET MARKUP?
The various properties required for structuring datasets data are:
A. Dataset: this type describes a body of information about a particular topic. Example: scientific or civic datasets.
Properties such as identifier, license and sameAs carry provenance and license information.
Guidelines to be followed:
Always provide a URL unambiguously stating the specific version of the license used.
Spatial coverage includes specifying the shape, location and points of coverage.
12. temporalCoverage: The time interval that the dataset covers, specified in ISO 8601 format. Describe it according to the kind of interval the dataset covers. Example:
Single date: "temporalCoverage": "2008"
Time period: "temporalCoverage": "1950-01-01/2013-12-18"
Open-ended time period: "temporalCoverage": "2013-12-19/.."
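The coverage properties can be sketched with a few small helpers. This is an illustration under stated assumptions: the function names are hypothetical, the ".." marker for an open end follows ISO 8601 interval notation, and the spatial helpers use the schema.org Place, GeoCoordinates and GeoShape types with made-up coordinates.

```python
def temporal_coverage(start, end=None, open_ended=False):
    """Format a single date, a closed interval or an open-ended interval."""
    if open_ended:
        return f"{start}/.."
    if end is None:
        return start
    return f"{start}/{end}"


def point_coverage(latitude, longitude):
    """Describe a single point as a schema.org Place with GeoCoordinates."""
    return {
        "@type": "Place",
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": latitude,
            "longitude": longitude,
        },
    }


def box_coverage(lat1, lon1, lat2, lon2):
    """Describe a bounding box: lower corner first, then upper corner."""
    return {
        "@type": "Place",
        "geo": {"@type": "GeoShape", "box": f"{lat1} {lon1} {lat2} {lon2}"},
    }


print(temporal_coverage("2008"))
print(temporal_coverage("1950-01-01", "2013-12-18"))
print(temporal_coverage("2013-12-19", open_ended=True))
print(box_coverage(39.328, 120.1633, 40.445, 123.7878))
```

The returned values are meant to be assigned to the temporalCoverage and spatialCoverage keys of the Dataset markup.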
DataCatalog
Datasets are usually published in repositories that contain many other datasets. The same dataset can be included in more than one such repository.
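One way to record the repository a dataset belongs to is the schema.org includedInDataCatalog property. A minimal sketch, assuming a hypothetical catalog name and helper:

```python
def add_catalog(dataset_doc, catalog_name):
    """Return a copy of the markup that names the catalog holding the dataset."""
    doc = dict(dataset_doc)  # shallow copy so the input stays untouched
    doc["includedInDataCatalog"] = {
        "@type": "DataCatalog",
        "name": catalog_name,
    }
    return doc


base = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example rainfall measurements",
}
with_catalog = add_catalog(base, "Example Open Data Portal")
print(with_catalog["includedInDataCatalog"])
```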
DataDownload:
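In schema.org, download options are attached to a dataset through its distribution property, with each entry typed as DataDownload. The sketch below assumes hypothetical file URLs and helper names; encodingFormat and contentUrl are schema.org properties.

```python
def data_download(content_url, encoding_format):
    """Describe one downloadable file for a dataset."""
    return {
        "@type": "DataDownload",
        "encodingFormat": encoding_format,
        "contentUrl": content_url,
    }


def add_distributions(dataset_doc, downloads):
    """Return a copy of the markup with a distribution list attached."""
    doc = dict(dataset_doc)
    doc["distribution"] = list(downloads)
    return doc


dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example rainfall measurements",
}
with_files = add_distributions(
    dataset,
    [
        data_download("https://example.com/data/rainfall.csv", "CSV"),
        data_download("https://example.com/data/rainfall.xml", "XML"),
    ],
)
print(len(with_files["distribution"]))
```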
WHAT ARE TABULAR DATASETS?
A tabular dataset is a dataset containing information organised in a grid of rows and columns. Support for describing tabular content is currently in beta and is subject to change. Use the Dataset markup for structuring the data of tabular datasets. Currently, Google also understands a variation of CSVW provided on the HTML page in parallel with the user-oriented tabular content.
PRO TIP: Please refer to the previous posts in this series for details on monitoring search results and troubleshooting problems.
For analysing your Google Search Traffic, use the performance report.
WHAT DO I DO WHEN A SPECIFIC DATASET IS NOT SHOWING UP IN DATASET SEARCH RESULTS?
What has caused the issue?
Either the page describing the dataset does not use the stated structured data markup, or your website has not been crawled yet.
How can this issue be fixed?
Copy the page link and paste it into the Rich Results Test. If the message states that the page is not eligible for rich results, or that the markup is not eligible for the rich result, then either the dataset markup is incorrect or no markup is used. This can be fixed by following the guidelines on how to add the dataset markup.
If the page does have markup but still does not show up, it may not have been crawled yet. You can check the crawl status with Google Search Console.
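Before turning to the online tools, a local pre-check can confirm that a page actually carries Dataset markup. The sketch below uses only the Python standard library; the class and function names are hypothetical, it looks only at top-level JSON-LD objects (nested structures such as @graph are ignored), and it does not replace the Rich Results Test.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped rather than raised


def has_dataset_markup(html):
    """Return True if any top-level JSON-LD object has @type Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for doc in parser.documents:
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            if types == "Dataset" or (isinstance(types, list) and "Dataset" in types):
                return True
    return False


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org/", "@type": "Dataset", "name": "x"}'
    "</script></head><body></body></html>"
)
print(has_dataset_markup(page))
```

If the function returns False for a saved copy of your page, the markup is missing or malformed; if it returns True, the remaining suspect is crawl status.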
If the company logo is missing or does not appear correctly next to results
This problem usually occurs when your page is missing the schema.org markup used to specify the organisation's logo, or when your business is not established with Google.
How to fix this issue?
In the next post, I will throw light on how to structure Subscriptions and Paywalled Content onto your site.