
MongoDB excels at exploring data without rigid schemas. If you can model your data in JSON, you can likely store and analyze it in MongoDB.
JSON, Python, and MongoDB
- JSON objects map neatly to Python dictionaries and MongoDB documents.
- Arrays map to Python lists and MongoDB arrays.
- Value types (strings, numbers, booleans, null) translate directly; MongoDB also supports extras like datetimes and regex.
Hierarchy overview:
| MongoDB | JSON | Python |
|---|---|---|
| Databases | Object | dict |
| Collections | Array | list |
| Documents | Object | dict |
| Subdocuments | Object | dict |
| Values | Scalars | Scalars (plus datetime/regex) |
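To see the mapping in practice, here is a small sketch using only Python's standard `json` module. The record is a simplified, partly hypothetical laureate document: the `motto` and `isOrg` fields are invented for illustration. JSON has no date type, so a date arrives as a plain string. Converting it to a `datetime` first is one way to use MongoDB's datetime support, because pymongo stores `datetime` objects as BSON dates.

```python
import json
from datetime import datetime

# A simplified, partly hypothetical laureate record ("motto" and "isOrg" are invented).
raw = """
{
  "firstname": "Marie",
  "surname": "Curie",
  "prizes": [
    {"year": "1903", "category": "physics"},
    {"year": "1911", "category": "chemistry"}
  ],
  "died": "1934-07-04",
  "motto": null,
  "isOrg": false
}
"""

doc = json.loads(raw)

print(type(doc).__name__)               # dict  (JSON object)
print(type(doc["prizes"]).__name__)     # list  (JSON array)
print(type(doc["prizes"][0]).__name__)  # dict  (subdocument)
print(doc["motto"], doc["isOrg"])       # None False  (null, false)

# JSON has no date type: "died" stays a string until converted explicitly.
died = datetime.fromisoformat(doc["died"])
print(died)                             # 1934-07-04 00:00:00
```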
Loading Sample Data (Nobel Prize API)
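```python
import requests
from pymongo import MongoClient

# Connects to a MongoDB server on localhost:27017 by default
client = MongoClient()
db = client["nobel"]

for name in ["prizes", "laureates"]:
    # The API endpoints are singular: prize.json, laureate.json
    response = requests.get(f"http://api.nobelprize.org/v1/{name[:-1]}.json")
    response.raise_for_status()  # stop early on an HTTP error
    # The top-level key is plural and holds the list of documents
    documents = response.json()[name]
    db[name].insert_many(documents)
```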
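The loading loop relies on two conventions taken from the original code: each endpoint is the singular form of the collection name (`prizes` becomes `prize.json`), and the JSON response stores the documents under the plural key. A sketch that factors these into small helpers can be checked offline with a hand-written payload, without a network call or a running MongoDB server. `endpoint_for` and `extract_documents` are hypothetical names, not part of any library.

```python
import json

BASE_URL = "http://api.nobelprize.org/v1"  # taken from the loading loop above

def endpoint_for(collection_name):
    """Singular endpoint for a plural collection name, e.g. 'prizes' -> '.../prize.json'."""
    return f"{BASE_URL}/{collection_name[:-1]}.json"

def extract_documents(payload, collection_name):
    """Return the list stored under the plural top-level key, ready for insert_many."""
    documents = payload[collection_name]
    # pymongo's insert_many raises an error when given an empty list, so check up front
    if not isinstance(documents, list) or not documents:
        raise ValueError(f"expected a non-empty list under {collection_name!r}")
    return documents

# Hand-written payload for an offline check (not real API output).
fake_payload = json.loads('{"laureates": [{"id": "1"}, {"id": "2"}]}')

print(endpoint_for("prizes"))                             # http://api.nobelprize.org/v1/prize.json
print(len(extract_documents(fake_payload, "laureates")))  # 2
```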
Accessing Collections
Use bracket or dot notation:
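```python
db = client["nobel"]
prizes = db["prizes"]
laureates = db["laureates"]
# or, equivalently, with attribute access
db = client.nobel
prizes = db.prizes
laureates = db.laureates
```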
Count documents (the empty filter `{}` matches every document):
```python
prizes.count_documents({})
laureates.count_documents({})
```
Inspect a document:
```python
laureates.find_one({})  # one document as a dict, or None if nothing matches
```
Filtering
Simple match:
```python
laureates.count_documents({"gender": "female"})
```
Compound filter (every condition must hold, an implicit AND):
```python
criteria = {"gender": "female", "diedCountry": "France", "bornCity": "Warsaw"}
laureates.count_documents(criteria)
laureates.find_one(criteria)
```
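Because a compound filter is an implicit AND, a document matches only if every field/value pair matches. The plain-Python sketch below illustrates that rule on a few hand-made, abridged documents. `matches_equality` is a hypothetical helper, not MongoDB's matcher, and it covers only top-level equality (no arrays, no special handling of `null`). An empty filter matches everything, which is why `count_documents({})` counts the whole collection.

```python
def matches_equality(doc, criteria):
    """True if every field in criteria is present in doc with an equal value (implicit AND)."""
    return all(field in doc and doc[field] == value for field, value in criteria.items())

# Hand-made, abridged sample documents (not the full dataset).
sample = [
    {"firstname": "Marie", "gender": "female", "bornCity": "Warsaw", "diedCountry": "France"},
    {"firstname": "Pierre", "gender": "male", "bornCity": "Paris", "diedCountry": "France"},
    {"firstname": "Irène", "gender": "female", "bornCity": "Paris", "diedCountry": "France"},
]

criteria = {"gender": "female", "diedCountry": "France", "bornCity": "Warsaw"}
hits = [d["firstname"] for d in sample if matches_equality(d, criteria)]
print(hits)                                                     # ['Marie']
print(sum(matches_equality(d, {"gender": "female"}) for d in sample))  # 2
print(sum(matches_equality(d, {}) for d in sample))             # 3 (empty filter)
```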
Query operators ($in, $ne, $gt, $gte, $lt, $lte) refine searches:
```python
laureates.count_documents({"diedCountry": {"$in": ["France", "USA"]}})
laureates.count_documents({"diedCountry": {"$ne": "France"}})
```
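Operators on different fields combine with the same implicit AND, and several operators on one field are ANDed too. One detail worth knowing: `$ne` also matches documents that lack the field entirely, so a `$ne` count on `diedCountry` includes every document with no `diedCountry` at all. The sketch below reproduces that behavior in plain Python for same-type scalar values. `matches` and `count` are hypothetical helpers, the sample documents are hand-made, and MongoDB's full comparison rules (BSON type ordering, array elements) are not modeled. The `born` values are ISO-format date strings, which sort correctly as text.

```python
import operator

_MISSING = object()  # sentinel: the field is absent from the document

# Range operators mapped to Python comparisons (same-type scalars only).
_RANGE_OPS = {"$gt": operator.gt, "$gte": operator.ge,
              "$lt": operator.lt, "$lte": operator.le}

def matches_condition(value, condition):
    """Check one field's value against a plain value or an operator document."""
    if not isinstance(condition, dict):
        return value is not _MISSING and value == condition
    for op, arg in condition.items():  # several operators on one field are ANDed
        if op == "$in":
            ok = value is not _MISSING and value in arg
        elif op == "$ne":
            ok = value is _MISSING or value != arg  # absent fields satisfy $ne
        elif op in _RANGE_OPS:
            ok = value is not _MISSING and _RANGE_OPS[op](value, arg)
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
        if not ok:
            return False
    return True

def matches(doc, query):
    return all(matches_condition(doc.get(field, _MISSING), cond)
               for field, cond in query.items())

# Hand-made sample documents.
sample = [
    {"id": "a", "diedCountry": "France", "born": "1867-11-07"},
    {"id": "b", "diedCountry": "USA", "born": "1879-03-14"},
    {"id": "c", "diedCountry": "Germany", "born": "1858-04-23"},
    {"id": "d", "born": "1950-01-01"},  # no diedCountry field
]

def count(query):
    return sum(matches(doc, query) for doc in sample)

print(count({"diedCountry": {"$in": ["France", "USA"]}}))            # 2
print(count({"diedCountry": {"$ne": "France"}}))                     # 3 (includes "d")
print(count({"born": {"$gte": "1860-01-01", "$lt": "1900-01-01"}}))  # 2
```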
Dot notation drills into arrays/subdocuments:
```python
laureates.count_documents({"prizes.affiliations.name": "University of California"})
```
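Dot notation works through arrays: when a step of the path lands on an array of subdocuments, MongoDB applies the rest of the path to each element, and the query matches if any resulting value matches. A simplified plain-Python sketch of that traversal follows. `values_at_path` and `matches_dotted` are hypothetical helpers, the laureate is invented, and numeric segments such as `prizes.0` and whole-array equality are not handled here.

```python
def values_at_path(node, path):
    """All values reachable by a dotted path, fanning out over arrays along the way."""
    if not path:
        # End of the path: an array contributes its elements individually.
        return list(node) if isinstance(node, list) else [node]
    if isinstance(node, list):
        # Array mid-path: apply the same remaining path to every element.
        return [v for item in node for v in values_at_path(item, path)]
    head, _, rest = path.partition(".")
    if isinstance(node, dict) and head in node:
        return values_at_path(node[head], rest)
    return []  # the path does not exist in this branch

def matches_dotted(doc, path, value):
    return value in values_at_path(doc, path)

# Invented laureate with two prizes and nested affiliation arrays.
laureate = {
    "surname": "Example",
    "prizes": [
        {"year": "1990", "affiliations": [{"name": "University of California"}]},
        {"year": "1995", "affiliations": [{"name": "Stanford University"}, {"name": "MIT"}]},
    ],
}

print(values_at_path(laureate, "prizes.affiliations.name"))
# ['University of California', 'Stanford University', 'MIT']
print(matches_dotted(laureate, "prizes.affiliations.name", "University of California"))  # True
```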
Schema Flexibility
Documents in the same collection don’t need identical fields. Use $exists to test presence:
```python
laureates.count_documents({"bornCountry": {"$exists": True}})
laureates.count_documents({"prizes.0": {"$exists": True}})  # at least one prize
laureates.count_documents({"prizes.1": {"$exists": True}})  # at least two prizes
```
Retrieve unique values:
```python
laureates.distinct("gender")
# ['male', 'female', 'org']
```
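`distinct` skips documents that lack the field, and when the field holds an array it treats each element as a separate value. The plain-Python sketch below mirrors those two rules using a hypothetical `distinct` helper over invented documents (the `categories` field is made up for the array case). It keeps values in first-seen order.

```python
def distinct(docs, field):
    """Unique values of a top-level field, kept in first-seen order."""
    seen = []
    for doc in docs:
        if field not in doc:  # documents without the field contribute nothing
            continue
        value = doc[field]
        # An array value contributes each of its elements separately.
        for v in (value if isinstance(value, list) else [value]):
            if v not in seen:  # list membership also works for unhashable values
                seen.append(v)
    return seen

docs = [
    {"gender": "male"},
    {"gender": "female"},
    {"gender": "male"},
    {"gender": "org"},
    {"surname": "no gender field"},
]
print(distinct(docs, "gender"))  # ['male', 'female', 'org']

tagged = [{"categories": ["physics", "chemistry"]}, {"categories": "physics"}]
print(distinct(tagged, "categories"))  # ['physics', 'chemistry']
```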
Indexes speed up queries and aggregations, but for small collections (on the order of megabytes or less, roughly 1,000 documents or fewer) a full collection scan is usually fast enough.
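To make the trade-off concrete: a collection scan inspects every document, while an index maps each value straight to the matching documents. The sketch below is only an in-memory analogy in plain Python; `build_index` and `scan` are hypothetical helpers and the 1,000 documents are synthetic. In pymongo, a real single-field index is created with `create_index`, for example `laureates.create_index("gender")`. Building the index itself costs a full pass, and at this size the scan is already cheap, which is the point of the rule of thumb above.

```python
from collections import defaultdict

def build_index(docs, field):
    """Analogy for a single-field index: value -> positions of documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        if field in doc:
            index[doc[field]].append(position)
    return index

def scan(docs, field, value):
    """Analogy for a collection scan: every document is inspected."""
    return [i for i, doc in enumerate(docs) if field in doc and doc[field] == value]

# 1,000 synthetic documents; every tenth one is "female".
docs = [{"gender": "female" if i % 10 == 0 else "male"} for i in range(1000)]
index = build_index(docs, "gender")

hits_by_scan = scan(docs, "gender", "female")  # touches all 1,000 documents
hits_by_index = index.get("female", [])        # one dictionary lookup
print(len(hits_by_scan), hits_by_scan == hits_by_index)  # 100 True
```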
MongoDB’s Python driver, pymongo, lets you fetch, filter, and explore semi-structured data with ease, which makes it a good fit for APIs like the Nobel Prize dataset that serve JSON out of the box.