Example 2: Introduction to searching with the CSD Python API.
Aim
This example uses the CSD Python API to carry out a search across the CSD. We will create a search
query, add criteria to it, and then save the resulting hits as a refcode list (a .gcd file).
The CSD Python API provides several search modules, including text numeric searching, substructure
searching, similarity searching, and reduced cell searching. In this example, we will use the text
numeric search module, which searches the text and numeric data associated with individual entries
in the CSD.
Unlike the similarity and substructure search modules, the text numeric search module can only be
used to search the CSD because it searches fields that are specific to the database.
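
As a preview, the workflow described in the Aim (build a query, add a criterion, run the search, save the refcodes) might be sketched as follows. This is a minimal sketch and not the script built in the steps below. The helper names find_refcodes and write_refcode_list are hypothetical, and the sketch assumes that each hit exposes its refcode as hit.identifier and that a .gcd file is a plain text file with one refcode per line.

```python
from pathlib import Path


def find_refcodes(compound_name):
    """Return the identifiers of CSD entries whose name contains compound_name.

    Hypothetical helper: needs the CSD Python API and a CSD installation.
    """
    # Imported inside the function so that write_refcode_list below can
    # still be used on a machine without the CSD Python API installed.
    from ccdc.search import TextNumericSearch

    query = TextNumericSearch()
    query.add_compound_name(compound_name)
    # Assumption: each hit exposes its refcode as the 'identifier' attribute.
    return [hit.identifier for hit in query.search()]


def write_refcode_list(identifiers, path):
    """Write one identifier per line and return how many were written.

    Assumption: a .gcd refcode list is plain text, one refcode per line.
    """
    unique = list(dict.fromkeys(identifiers))  # drop duplicates, keep order
    Path(path).write_text(''.join(f'{refcode}\n' for refcode in unique))
    return len(unique)
```

With the CSD available, calling write_refcode_list(find_refcodes('ferrocene'), 'ferrocene_hits.gcd') would write the hits to a file whose name is chosen here only for illustration. Keeping the file writing separate from the search means it can be checked without access to the database.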
Note: If you have not tried Example 1, you will need to complete Steps 1-3 of that example to set up
the command prompt before continuing with this one.
Instructions
1. In the same folder as in Example 1, open your preferred text editor and create a new Python file
called ‘text_numeric_search.py’. The following steps show the code that you should write in your
Python file, along with explanations of what the code does.
2. First, we need to import the TextNumericSearch class from the ccdc.search module into our script.
from ccdc.search import TextNumericSearch
3. We then need to create our search query. This line of code creates an empty query called ‘query’.
query = TextNumericSearch()
4. We are going to use our query to find CSD entries that contain the word ‘ferrocene’ anywhere in
the chemical name and synonyms field. To do this, we add a compound name criterion to the
query.
query.add_compound_name('ferrocene')
5. To search the CSD we will use the .search() method, which returns a list of ‘hits’: the entries
that meet the defined criteria. We assign this list to the variable hit_list so that we can use
the search results later.
hit_list = query.search()
6. To see how many entries have been found in our search, we will add a line to print the length of
the hit list.
print(f'Number of hits : {len(hit_list)}')
7. We are now ready to search the CSD. Save the changes you have made to your script and then run
the Python script in your command prompt. To run your Python script, type the following in your
command prompt and then press ‘Enter’: