Search guide
This guide explains how to write advanced search queries using easy-to-understand examples. Click on the examples to see the queries' results. You can also use these examples as a basis for your own queries: after running one, change the input parameters in the search box.
Example: temperature evaporation
Results will match records with the terms temperature or evaporation in any field. Note that stemming is applied, so e.g. index will also match indexes. Search results are ranked according to an algorithm that takes your query terms into account.
You can require the presence of both terms using either the + or AND operator:
Examples:
+temperature +evaporation
or
temperature AND evaporation
You can require the absence of one or more terms using either the - or NOT operator, for example if you want `temperature` data but want to exclude BARRA-related results:
Examples:
-BARRA +temperature
or
NOT BARRA AND temperature
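When queries are assembled by a script rather than typed by hand, the + and - operators can be applied mechanically. The following is a minimal Python sketch; the helper name `build_query` is hypothetical and not something the portal provides.

```python
def build_query(required=(), excluded=()):
    """Prefix required terms with '+' and excluded terms with '-'."""
    parts = [f"+{term}" for term in required]
    parts += [f"-{term}" for term in excluded]
    return " ".join(parts)


# Both terms must be present.
print(build_query(required=["temperature", "evaporation"]))
# temperature must be present, BARRA must be absent.
print(build_query(required=["temperature"], excluded=["BARRA"]))
```

The resulting strings can be pasted into the search box as they are.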
Phrase search
Example:
"land surface"
Results will match records with the phrase land surface in any field.
Field search
Example:
metadata.title:precipitation
Results will match records with the term precipitation in the field metadata.title. If you want to search for multiple terms in the title, you must group the terms using parentheses:
Example:
metadata.title:(precipitation rainfall)
Example:
metadata.creators.person_or_org.name: "Paola"
This will return all records that have authors whose name contains "Paola".
See the field reference below for the full list of fields you can search.
Combined simple, phrase or field search
Example:
+metadata.title:"land surface" -metadata.title:monthly
You can combine simple, phrase and field search to construct advanced search queries.
Dates and temporal range search
Example: metadata.publication_date:[2020 TO 2021-06]
(note that you must capitalize TO).
Results will match any record with a publication_date between 2020-01-01 and 2021-06-01 (both dates inclusive).
Note that partial dates are expanded to full dates, e.g.:
- 2020 is expanded to 2020-01-01
- 2021-06 is expanded to 2021-06-01
Use square brackets ([]) for inclusive ranges and curly brackets ({}) for exclusive ranges, e.g.:
- [1970 TO 1980} is equivalent to [1970-01-01 TO 1979-12-31] because of date expansion and the exclusive upper bound.
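The expansion and bracket rules above can be checked with a short Python sketch. It assumes that dates are compared at day granularity; the function names are hypothetical and only mirror the rules described in this section.

```python
from datetime import date, timedelta


def expand_partial_date(text):
    """Expand 'YYYY' or 'YYYY-MM' to the first day of that period."""
    parts = [int(p) for p in text.split("-")]
    while len(parts) < 3:
        parts.append(1)  # a missing month or day defaults to 1
    return date(*parts)


def inclusive_bounds(lower, upper, lower_inclusive=True, upper_inclusive=True):
    """Return the equivalent inclusive (start, end) dates for a range."""
    start = expand_partial_date(lower)
    end = expand_partial_date(upper)
    if not lower_inclusive:
        start += timedelta(days=1)  # exclusive lower bound: skip the first day
    if not upper_inclusive:
        end -= timedelta(days=1)  # exclusive upper bound: stop one day earlier
    return start, end


# [1970 TO 1980} has an inclusive lower and an exclusive upper bound.
start, end = inclusive_bounds("1970", "1980", upper_inclusive=False)
print(f"[{start.isoformat()} TO {end.isoformat()}]")
```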
Selecting records including data in a given temporal range
The portal uses a combination of the from-date and to-date date types to indicate the temporal range covered by a dataset.
To select the records that cover the entire period between two dates, for example from 1870 to 1900, coverage needs to satisfy 2 conditions:
- a date of type from-date containing a date ≤ 1870
- a date of type to-date containing a date ≥ 1900
Example:
- (metadata.dates.date:{* TO 1870] AND metadata.dates.type.id:from-date) AND (metadata.dates.date:[1900 TO *} AND metadata.dates.type.id:to-date)
All days until 1870 for from-date and all days from 1900 for to-date.
To select all records that have some data in the selected period, the interval boundaries are swapped:
- a date of type from-date containing a date ≤ 1900
- a date of type to-date containing a date ≥ 1870
The results of the first query include CMIP and derived datasets, as their temporal coverage starts in 1850.
- (metadata.dates.date:{* TO 1900] AND metadata.dates.type.id:from-date) AND (metadata.dates.date:[1870 TO *} AND metadata.dates.type.id:to-date)
All days until 1900 for from-date and all days from 1870 for to-date.
The results now include the GPCC datasets, which have data starting from 1891.
These expressions are quite long; we are working on a template that will run the same queries given just a few arguments.
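In the meantime, the two expressions can be generated with a few lines of Python. This is only a sketch and not the portal's template; the function name `coverage_query` and its `mode` argument are hypothetical.

```python
def coverage_query(start, end, mode="covers"):
    """Build a temporal coverage query string.

    mode="covers":   records whose coverage spans the whole period
    mode="overlaps": records with at least some data in the period
    """
    if mode == "covers":
        from_limit, to_limit = start, end
    elif mode == "overlaps":
        from_limit, to_limit = end, start  # the boundaries are swapped
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return (
        f"(metadata.dates.date:{{* TO {from_limit}] "
        "AND metadata.dates.type.id:from-date) AND "
        f"(metadata.dates.date:[{to_limit} TO *}} "
        "AND metadata.dates.type.id:to-date)"
    )


print(coverage_query(1870, 1900))                   # entire period covered
print(coverage_query(1870, 1900, mode="overlaps"))  # some data in the period
```

The two printed strings reproduce the 1870 to 1900 examples given in this section.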
Ranking/Sorting
By default all searches are sorted according to an internal ranking algorithm that scores each match against your query. In both the user interface and REST API, it's possible to sort the results by:
- Most recent
- Best match
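For the REST API, the sort order is typically passed as a URL parameter next to the query. The sketch below only builds the request URL and sends nothing. The base URL is a placeholder, and the parameter names `q` and `sort` with the values `bestmatch` and `newest` are assumptions based on InvenioRDM conventions; check the portal's API documentation for the actual names.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the portal's actual address.
BASE_URL = "https://portal.example.org/api/records"


def search_url(query, sort="bestmatch"):
    """Return a search URL with the query and sort order URL-encoded."""
    # Assumed parameter names: 'q' for the query, 'sort' for the ordering.
    params = urlencode({"q": query, "sort": sort})
    return f"{BASE_URL}?{params}"


print(search_url("+temperature -BARRA", sort="newest"))
```

Encoding matters here because characters such as + and spaces have a special meaning in URLs, so the query should not be pasted into a URL unencoded.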
Regular expressions
Regular expressions are a powerful pattern-matching language that allows you to search for specific patterns in a field. For instance, to find all records with the DOI prefix 10.5281/zenodo, we could use a regular expression search:
Example:
metadata.pids.doi.identifier:/10\.5281\/zenodo.*/
Careful, the regular expression must match the entire field value. See the regular expression syntax for further details.
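The whole-value requirement is easy to overlook, so here is a small Python illustration. Python's `re` module is not the search engine's regular expression dialect, and the DOI is made up; the sketch only shows the difference between matching part of a value and matching all of it.

```python
import re

# Pattern for the DOI prefix alone, with nothing to cover the rest.
prefix_only = re.compile(r"10\.5281/zenodo")
# Same pattern extended with ".*" so that it can cover the whole value.
whole_value = re.compile(r"10\.5281/zenodo.*")

doi = "10.5281/zenodo.1234567"  # made-up DOI, for illustration only

print(bool(prefix_only.search(doi)))     # True: the prefix occurs in the value
print(bool(prefix_only.fullmatch(doi)))  # False: the suffix is not covered
print(bool(whole_value.fullmatch(doi)))  # True: ".*" absorbs the suffix
```

A search pattern behaves like `fullmatch` here, which is why a trailing `.*` is needed when only the start of the value is known.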
Missing values
It is possible to search for records that either are missing a value or have a value in a specific field using the _exists_ and _missing_ field names.
Example:
_missing_:metadata.additional_titles
(all records without metadata.additional_titles)
Example:
_exists_:metadata.creators
(all records with metadata.creators)
Advanced concepts
Boosting
You can use the boost operator ^ when one term is more relevant than another. For instance, you can search for all records with the phrase temperature in either the title or the description field, but rank records with the phrase in the title field higher:
Example:
metadata.title:"temperature"^5 metadata.description:"temperature"
Fuzziness
You can search for terms similar to but not exactly like your search term using the fuzzy operator ~.
Example:
color~
Results will match records with terms similar to color, which would e.g. also match colour.
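"Similar" is usually measured as an edit distance, i.e. the number of single-character changes needed to turn one term into the other. The sketch below assumes that the portal's fuzzy operator works this way, as it does in Lucene-based search engines; it implements plain Levenshtein distance and is not the engine's own code.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits from a to b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


print(edit_distance("color", "colour"))  # 1: a single inserted "u"
```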
Proximity searches
A phrase search like "land surface" by default expects all terms in exactly the same order, and thus for instance would not match a record containing the phrase "surface of the land". A proximity search allows the terms to appear in a different order and to have other terms in between. The degree of flexibility is specified by an integer after the phrase:
Example:
"land surface"~5
Wildcards
You can use wildcards in search terms to replace a single character (using the ? operator) or zero or more characters (using the * operator).
Example:
observation?
Without the ? we will miss all the results with the term observations.
Example:
observation
Wildcard searches can be slow and should normally be avoided if possible.
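The difference between the two wildcard operators can be tried out locally with Python's `fnmatch` module, which gives ? and * the same meaning. The term list is made up for illustration, and the sketch ignores the stemming that the portal applies to ordinary terms.

```python
from fnmatch import fnmatchcase

# Made-up terms, for illustration only.
terms = ["observation", "observations", "observational"]

for pattern in ("observation?", "observation*"):
    # ? stands for exactly one character, * for zero or more characters.
    matches = [term for term in terms if fnmatchcase(term, pattern)]
    print(pattern, "->", matches)
```

Only observations matches the ? pattern, because exactly one extra character is required, while the * pattern matches all three terms.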
Query returns no matches even when expected!
This can happen because you used the wrong syntax. In that case the search reports that it couldn't find any matching records rather than pointing out the issue.
Example:
metadata.titles=!WRF
metadata.titles=CLEF
Here I'm searching a field `titles` that doesn't exist in the schema, and instead of using `:` to find titles that don't contain `WRF` I'm using `=!`, which is not valid syntax.
If you combine queries with booleans such as AND / OR, you will get some results as long as one of the queries has correct syntax and matches, even if the other side of the query is not evaluated because its syntax is wrong.