Introduction to MongoDB in Python

This article was translated using AI.

MongoDB excels at exploring data without rigid schemas. If you can model your data in JSON, you can likely store and analyze it in MongoDB.

JSON, Python, and MongoDB

JSON objects map neatly to Python dictionaries and MongoDB documents.
Arrays map to Python lists and MongoDB arrays.
Value types (strings, numbers, booleans, null) translate directly; MongoDB also supports extras like datetimes and regex.

Hierarchy overview:

MongoDB	JSON	Python
Databases	Object	dict
Collections	Array	list
Documents	Object	dict
Subdocuments	Object	dict
Values	Scalars	Scalars (plus datetime/regex)
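
The mapping above can be checked with Python's standard json module alone, before MongoDB is involved. The sketch below parses a small made-up JSON document (the field values are illustrative, not taken from the Nobel data) and prints the Python type each piece becomes.

```python
import json

# A made-up JSON document, used only for illustration.
raw = '{"name": "Ada", "prizes": [{"year": "1903", "shared": true}], "died": null}'

doc = json.loads(raw)

print(type(doc).__name__)               # JSON object  -> Python dict
print(type(doc["prizes"]).__name__)     # JSON array   -> Python list
print(type(doc["prizes"][0]).__name__)  # nested object -> dict (a subdocument)
print(doc["prizes"][0]["shared"])       # JSON true    -> Python True
print(doc["died"])                      # JSON null    -> Python None
```

A dictionary shaped like doc is what pymongo accepts on insert and hands back from queries.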

Loading Sample Data (Nobel Prize API)

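import requests
from pymongo import MongoClient

# Connect to a MongoDB server on localhost:27017 (the driver's default).
client = MongoClient()
db = client["nobel"]

# The endpoint names are singular ("prize", "laureate"), while the
# top-level key of each JSON response is plural ("prizes", "laureates").
for name in ["prizes", "laureates"]:
    url = f"http://api.nobelprize.org/v1/{name[:-1]}.json"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail on an HTTP error instead of parsing it
    documents = response.json()[name]
    db[name].insert_many(documents)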
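
The loop relies on a naming detail that is easy to miss: the collection name is plural, the endpoint is the same word without its final "s", and the response nests the documents under the plural key. A minimal sketch that spells this out follows. The helper names endpoint_for and extract_documents, and the tiny sample_payload, are hypothetical and exist only for illustration.

```python
BASE_URL = "http://api.nobelprize.org/v1"


def endpoint_for(collection_name):
    """Build the API URL for a plural collection name such as 'prizes'."""
    singular = collection_name[:-1]  # "prizes" -> "prize"
    return f"{BASE_URL}/{singular}.json"


def extract_documents(payload, collection_name):
    """Pull the list of documents out of a decoded JSON response."""
    return payload[collection_name]


# Made-up, heavily abbreviated stand-in for a decoded API response.
sample_payload = {"prizes": [{"year": "1901", "category": "physics"}]}

print(endpoint_for("prizes"))
print(endpoint_for("laureates"))
print(extract_documents(sample_payload, "prizes"))
```

Note that running the loading loop a second time inserts a second copy of every document, so the counts later in this article assume a single load.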

Accessing Collections

Use bracket or dot notation:

db = client["nobel"]
prizes = db["prizes"]
laureates = db["laureates"]
# or
db = client.nobel
prizes = db.prizes
laureates = db.laureates

Count documents:

prizes.count_documents({})
laureates.count_documents({})

Inspect a document:

laureates.find_one({})

Filtering

Simple match:

laureates.count_documents({"gender": "female"})

Compound filter:

criteria = {"gender": "female", "diedCountry": "France", "bornCity": "Warsaw"}
laureates.count_documents(criteria)
laureates.find_one(criteria)

Query operators ($in, $ne, $gt, $gte, $lt, $lte) refine searches:

laureates.count_documents({"diedCountry": {"$in": ["France", "USA"]}})
laureates.count_documents({"diedCountry": {"$ne": "France"}})
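
Comparison operators can be combined inside one field's sub-dictionary to express a range. The sketch below builds such a filter. It assumes the born field holds date strings of the form "YYYY-MM-DD", so the comparison is alphabetical rather than numeric; the helper name born_between is hypothetical.

```python
def born_between(start, end):
    """Filter for documents whose `born` string is >= start and < end.

    Assumes `born` is stored as a "YYYY-MM-DD" string, so string
    comparison orders the values chronologically.
    """
    return {"born": {"$gte": start, "$lt": end}}


twentieth_century = born_between("1900", "2000")
print(twentieth_century)

# With a live collection (not executed here):
#   laureates.count_documents(twentieth_century)
```

Both conditions must hold for a document to match, because they sit in the same operator dictionary for the same field.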

Dot notation drills into arrays/subdocuments:

laureates.count_documents({"prizes.affiliations.name": "University of California"})
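
To make the traversal concrete, here is a simplified pure-Python model of how a dotted path reaches through lists and subdocuments. It is a teaching sketch, not MongoDB's actual matching algorithm; values_at, matches, and the sample document are hypothetical.

```python
def values_at(doc, path):
    """Collect every value reachable at a dotted path, looking inside lists."""
    nodes = [doc]
    for key in path.split("."):
        next_nodes = []
        for node in nodes:
            if isinstance(node, dict) and key in node:
                next_nodes.append(node[key])
            elif isinstance(node, list):
                if key.isdigit():
                    # A numeric component selects one array position.
                    index = int(key)
                    if index < len(node):
                        next_nodes.append(node[index])
                else:
                    # Otherwise the key is applied to every subdocument in the list.
                    for item in node:
                        if isinstance(item, dict) and key in item:
                            next_nodes.append(item[key])
        nodes = next_nodes
    return nodes


def matches(doc, path, target):
    """True if any value at the path equals target (or is a list containing it)."""
    for value in values_at(doc, path):
        if value == target or (isinstance(value, list) and target in value):
            return True
    return False


# Made-up document shaped like a laureate record.
laureate = {
    "surname": "Example",
    "prizes": [
        {"year": "1950", "affiliations": [{"name": "University of California"}]},
        {"year": "1960", "affiliations": [{"name": "Some Institute"}]},
    ],
}

print(matches(laureate, "prizes.affiliations.name", "University of California"))
print(values_at(laureate, "prizes.1.year"))
```

The point of the model is that a single matching element anywhere along the path is enough for the document to count.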

Schema Flexibility

Documents in the same collection don’t need identical fields. Use $exists to test presence:

laureates.count_documents({"bornCountry": {"$exists": True}})
laureates.count_documents({"prizes.0": {"$exists": True}})  # at least one prize
laureates.count_documents({"prizes.1": {"$exists": True}})  # at least two prizes
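
The "prizes.N" trick generalizes: an array has at least n elements exactly when position n - 1 exists. A small sketch of a filter builder for that pattern follows; the helper name has_at_least is hypothetical.

```python
def has_at_least(array_field, n):
    """Filter matching documents whose array field has at least n elements."""
    if n < 1:
        raise ValueError("n must be 1 or greater")
    # Arrays are zero-indexed, so the n-th element lives at position n - 1.
    return {f"{array_field}.{n - 1}": {"$exists": True}}


print(has_at_least("prizes", 1))
print(has_at_least("prizes", 2))

# With a live collection (not executed here):
#   laureates.count_documents(has_at_least("prizes", 2))
```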

Retrieve unique values:

laureates.distinct("gender")
# ['male', 'female', 'org']
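
As a rough mental model, distinct gathers the field's value from every document that has the field and removes duplicates. The pure-Python sketch below imitates that on a made-up list of documents. It is a simplification, not the server's implementation, and the order in which a real server returns values should not be relied on.

```python
def distinct_values(docs, field):
    """Unique values of `field` across docs, in first-seen order."""
    seen = []
    for doc in docs:
        if field not in doc:
            continue  # documents without the field contribute nothing
        value = doc[field]
        # An array-valued field contributes each of its elements separately.
        candidates = value if isinstance(value, list) else [value]
        for candidate in candidates:
            if candidate not in seen:
                seen.append(candidate)
    return seen


# Made-up documents for illustration.
sample = [
    {"gender": "male"},
    {"gender": "female"},
    {"gender": "male"},
    {"gender": "org"},
    {"surname": "no gender field here"},
]

print(distinct_values(sample, "gender"))
```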

Indexes speed up queries and aggregations, but for small datasets (at most megabytes in size, around 1,000 documents or fewer) a full collection scan is usually fast enough.

MongoDB’s Python driver (pymongo) lets you fetch, filter, and explore semi-structured data with little setup, which makes it a good fit for APIs like the Nobel Prize API that return JSON out of the box.