Chapter 3: Working with Files and Data Serialization
Modern Python applications frequently read, write, and serialize data. From logs and configuration files to API responses and datasets, the ability to handle different file formats is crucial. In this chapter, you’ll learn how to:
- Read and write files using different modes (r, w, a, x, b)
- Handle text, CSV, JSON, Pickle, and YAML formats
- Create and delete files and directories
- Implement a practical log parser that exports structured JSON
- Write testable file I/O logic using temp directories
3.1 File Modes, Creation, and Directory Operations
File Modes in Python
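Mode | Description |
---|---|
'r' | Read (default); file must exist |
'w' | Write; creates the file, or truncates it if it already exists |
'a' | Append; creates the file if it does not exist |
'x' | Exclusive create; fails with FileExistsError if the file exists |
'b' | Binary mode; combined with another mode, e.g. 'rb' or 'wb' |
't' | Text mode (default) |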
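The differences between 'x', 'a', and 'w' are easiest to see side by side. The following is a minimal sketch, not part of the original examples: it works in a throwaway directory created with tempfile, and the file name modes_demo.txt is a hypothetical placeholder.

```python
import os
import tempfile

# Work inside a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    # 'x' creates the file and fails if it already exists.
    with open(path, "x") as f:
        f.write("first\n")
    try:
        with open(path, "x") as f:
            f.write("never written\n")
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True

    # 'a' keeps the existing content and adds to the end.
    with open(path, "a") as f:
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()

    # 'w' truncates the file before writing.
    with open(path, "w") as f:
        f.write("replaced\n")
    with open(path, "r") as f:
        after_write = f.read()

print(second_create_failed)  # True
print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_write))     # 'replaced\n'
```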
Reading a Text File:
Let's say you have a text file with the following content:
This is a simple text file.
This is the second line of the text file.
This is the last line of the text file.
Here is a small Python program that reads the content of the file into a variable and prints it to the console.
file_path = 'sample.txt'
try:
    # File opened in read mode
    with open(file_path, 'r') as file:
        content = file.read()
    print(content)
except FileNotFoundError:
    print(f"File {file_path} not found.")
Writing to a Text File:
The following program demonstrates writing a string into a text file.
output_path = 'output.txt'
# File opened in write mode
with open(output_path, 'w') as file:
    file.write("This is a new line.\nAnother line.")
Tip: Use with open() to ensure the file is closed automatically, even if an error occurs inside the block.
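To make the tip concrete, the sketch below checks the closed attribute of a file object after a normal with block and after one that raises. The temporary directory and the file name closed_demo.txt are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "closed_demo.txt")  # hypothetical file name

    # Normal exit: the file is closed as soon as the block ends.
    with open(path, "w") as f:
        f.write("hello")
    closed_after_block = f.closed

    # Exit through an exception: the file is still closed.
    try:
        with open(path, "r") as g:
            raise ValueError("simulated failure while the file is open")
    except ValueError:
        closed_after_error = g.closed

print(closed_after_block, closed_after_error)  # True True
```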
Creating and Deleting Files
import os
# Create: 'x' raises FileExistsError if temp.txt already exists
with open("temp.txt", "x") as f:
    f.write("Sample text")
# Delete
if os.path.exists("temp.txt"):
    os.remove("temp.txt")
Working with Directories
import os
# Create directory
os.makedirs("logs", exist_ok=True)
# List directory
print(os.listdir("logs"))
# Remove empty directory
os.rmdir("logs")
Use os.path.join() for safe cross-platform paths.
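As a short illustration, os.path.join() inserts the separator of the platform the code runs on, so the same call works unchanged on Linux, macOS, and Windows. The directory and file names below are hypothetical.

```python
import os

# Build a path from its components instead of hard-coding "/" or "\\".
log_path = os.path.join("logs", "errors", "app.log")

# Prints logs/errors/app.log on POSIX and logs\errors\app.log on Windows.
print(log_path)

# Splitting on the platform separator recovers the components.
parts = log_path.split(os.sep)
print(parts)  # ['logs', 'errors', 'app.log']
```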
3.2 Working with Structured Formats: JSON, CSV, Pickle, YAML
JSON files
import json
data = {"name": "Alice", "skills": ["Python", "ML"]}
# Write
with open("user.json", "w") as f:
    json.dump(data, f, indent=2)
# Read
with open("user.json", "r") as f:
    print(json.load(f))
CSV files
import csv
rows = [["Name", "Age"], ["Alice", 30], ["Bob", 25]]
# Write CSV (newline="" lets the csv module control line endings)
with open("people.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)
# Read CSV (every value comes back as a string)
with open("people.csv", "r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
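CSV stores everything as text, so numbers written to a file come back as strings. One way to handle this, sketched here with csv.DictWriter and csv.DictReader, is to convert the fields explicitly while reading. The temporary directory is an assumption made so the example does not leave files behind.

```python
import csv
import os
import tempfile

people = [{"Name": "Alice", "Age": 30}, {"Name": "Bob", "Age": 25}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv")

    # DictWriter maps dictionary keys to columns.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Name", "Age"])
        writer.writeheader()
        writer.writerows(people)

    # DictReader uses the header row as keys; values are always strings.
    loaded = []
    with open(path, "r", newline="") as f:
        for row in csv.DictReader(f):
            row["Age"] = int(row["Age"])  # convert back to a number
            loaded.append(dict(row))

print(loaded)
```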
Pickle
Python's pickle module serializes Python objects to a binary format.
import pickle
data = {"x": 1, "y": 2}
# Save binary
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)
# Load binary
with open("data.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored)
⚠️ Warning: Never unpickle untrusted data. It may execute arbitrary code.
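The reason behind this warning is that a pickle can name any importable callable and the arguments to call it with. The deliberately harmless sketch below uses the built-in int as a stand-in; the Payload class is hypothetical and exists only for this illustration. An attacker would choose a far more damaging callable.

```python
import pickle


class Payload:
    """Hypothetical class whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call int('42')".
        return (int, ("42",))


blob = pickle.dumps(Payload())

# Loading runs the callable chosen by the data, not by the loading code.
result = pickle.loads(blob)
print(type(result).__name__, result)  # int 42
```

Note that the loaded value is not a Payload instance at all: pickle.loads simply returned whatever the embedded call produced.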
YAML
YAML is human-readable and common in configuration files. Working with it in Python requires the third-party PyYAML package:
pip install pyyaml
Reading and writing YAML files:
import yaml
data = {"server": {"port": 8000, "debug": True}}
# Write YAML
with open("config.yaml", "w") as f:
    yaml.dump(data, f)
# Read YAML
with open("config.yaml", "r") as f:
    loaded = yaml.safe_load(f)
print(loaded)
3.3 Example: Log Parser Saving Output as JSON
Input: server.log
[INFO] Service started
[WARNING] Disk space low
[ERROR] Failed to connect
Parser: log_parser.py
import os
import sys
import json

def parse_log_file(input_path, output_path):
    """Parse '[LEVEL] message' lines and save them as a JSON list."""
    if not os.path.exists(input_path):
        raise FileNotFoundError("Log file not found.")
    entries = []
    with open(input_path, "r") as f:
        for line in f:
            line = line.strip()
            # Skip lines that do not look like '[LEVEL] message'
            if line.startswith("[") and "]" in line:
                level_end = line.index("]")
                level = line[1:level_end]
                message = line[level_end+1:].strip()
                entries.append({"level": level, "message": message})
    with open(output_path, "w") as out:
        json.dump(entries, out, indent=2)
    return entries

if __name__ == "__main__":
    parse_log_file(sys.argv[1], sys.argv[2])
Run the Parser
python log_parser.py server.log server_log.json
Expected output (saved in server_log.json):
[
{ "level": "INFO", "message": "Service started" },
{ "level": "WARNING", "message": "Disk space low" },
{ "level": "ERROR", "message": "Failed to connect" }
]
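The string slicing in parse_log_file can also be expressed as a regular expression. The variant below is a sketch, not the parser used above: parse_line and LOG_LINE are hypothetical names, and the pattern assumes that levels consist of uppercase letters only, which is stricter than the original check.

```python
import re

# Assumption: a level is one or more uppercase letters inside brackets.
LOG_LINE = re.compile(r"^\[(?P<level>[A-Z]+)\]\s*(?P<message>.*)$")


def parse_line(line):
    """Return a dict for a '[LEVEL] message' line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return {"level": match.group("level"), "message": match.group("message")}


sample = ["[INFO] Service started", "not a log line", "[ERROR] Failed to connect"]
parsed = [entry for entry in map(parse_line, sample) if entry is not None]
print(parsed)
```

Returning None for lines that do not match keeps the decision about malformed input (skip, count, or raise) with the caller.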
3.4 Testing File I/O with Temp Directories
Test file I/O safely with pytest and its built-in tmp_path fixture, which gives each test its own temporary directory as a pathlib.Path.
File: test_log_parser.py
import json
from log_parser import parse_log_file

def test_log_parser(tmp_path):
    log_file = tmp_path / "server.log"
    output_file = tmp_path / "server.json"
    log_file.write_text("[INFO] Test log\n[ERROR] Crash")
    results = parse_log_file(log_file, output_file)
    assert results[0]["level"] == "INFO"
    assert results[1]["message"] == "Crash"
    saved = json.loads(output_file.read_text())
    assert saved == results
Run the test from the directory that contains both files:
pytest
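Outside pytest, the standard library offers the same isolation through tempfile.TemporaryDirectory. The sketch below is an assumption-laden illustration: count_lines is a hypothetical helper standing in for the code under test, and check_count_lines plays the role of a test function.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Hypothetical helper under test: count the non-empty lines in a file."""
    with open(path, "r") as f:
        return sum(1 for line in f if line.strip())


def check_count_lines():
    # The directory and everything in it is deleted when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        log_file = Path(tmp_dir) / "server.log"
        log_file.write_text("[INFO] Test log\n\n[ERROR] Crash\n")
        assert count_lines(log_file) == 2
    # Report whether the temporary directory was cleaned up.
    return not Path(tmp_dir).exists()


cleaned_up = check_count_lines()
print(cleaned_up)  # True
```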
3.5 Python File and Directory Manipulation Functions
File I/O – Text Files
Function / Method | Description | Example |
---|---|---|
open(file, mode='r') | Opens a file in a given mode | open("file.txt", "w") |
file.read() | Reads the entire content of a file | data = f.read() |
file.readline() | Reads a single line | line = f.readline() |
file.readlines() | Reads all lines into a list | lines = f.readlines() |
file.write(data) | Writes a string to the file | f.write("Hello\n") |
file.writelines(lines) | Writes a list of strings to a file | f.writelines(["a\n", "b\n"]) |
file.close() | Closes the file (not needed when using with open(...) as f:) | f.close() |
with open(...) | Context manager for safe file handling | with open(...) as f: |
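The three read methods differ in how much they consume and what they return. The following sketch compares them on a small file; the temporary directory and the three-line content are assumptions made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w") as f:
        # writelines() does not add newlines; they must be in the strings.
        f.writelines(["first\n", "second\n", "third\n"])

    with open(path) as f:
        whole = f.read()            # entire file as one string

    with open(path) as f:
        first_line = f.readline()   # one line, including its newline
        rest = f.readlines()        # remaining lines as a list

    with open(path) as f:
        # Iterating the file object reads lazily, one line at a time.
        stripped = [line.rstrip("\n") for line in f]

print(repr(whole))       # 'first\nsecond\nthird\n'
print(repr(first_line))  # 'first\n'
print(rest)              # ['second\n', 'third\n']
print(stripped)          # ['first', 'second', 'third']
```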
File I/O – Binary Files
Function / Method | Description | Example |
---|---|---|
open(file, 'rb') | Opens file for reading in binary mode | with open("img.jpg", "rb") as f: |
open(file, 'wb') | Opens file for writing in binary mode | with open("img.jpg", "wb") as f: |
file.read(size) | Reads binary data (optional size) | data = f.read(1024) |
file.write(bytes) | Writes binary data | f.write(b'\x00\x01') |
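Passing a size to read() makes it possible to process large binary files without loading them into memory at once. The helper below, copy_in_chunks, is a hypothetical name used for this sketch; the file names and the 1024-byte chunk size are likewise assumptions.

```python
import os
import tempfile


def copy_in_chunks(src, dst, chunk_size=1024):
    """Copy src to dst in fixed-size chunks and return the bytes copied."""
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied


with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "source.bin")
    dst = os.path.join(tmp_dir, "copy.bin")
    payload = bytes(range(256)) * 10  # 2560 bytes of sample data
    with open(src, "wb") as f:
        f.write(payload)

    copied = copy_in_chunks(src, dst)
    with open(dst, "rb") as f:
        round_trip = f.read()

print(copied)  # 2560
```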
File & Directory Utilities (os, shutil, pathlib)
Function / Method | Description | Example |
---|---|---|
os.path.exists(path) | Check if path exists | os.path.exists("file.txt") |
os.remove(path) | Delete a file | os.remove("file.txt") |
os.rename(src, dst) | Rename file or directory | os.rename("a.txt", "b.txt") |
os.listdir(path) | List directory contents | os.listdir(".") |
os.makedirs(path) | Create directories recursively | os.makedirs("logs/errors") |
os.rmdir(path) | Remove empty directory | os.rmdir("logs") |
shutil.rmtree(path) | Remove non-empty directory | shutil.rmtree("logs") |
os.getcwd() | Get current working directory | os.getcwd() |
os.chdir(path) | Change current working directory | os.chdir("/tmp") |
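The distinction between os.rmdir() and shutil.rmtree() matters as soon as a directory has content. This sketch builds a nested structure in a temporary location (an assumption, to keep the example side-effect free) and shows that only rmtree() removes it.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()  # scratch directory for the demonstration
logs_dir = os.path.join(base, "logs")
nested = os.path.join(logs_dir, "errors")

# makedirs() creates logs/ and logs/errors/ in one call.
os.makedirs(nested, exist_ok=True)
with open(os.path.join(nested, "app.log"), "w") as f:
    f.write("[ERROR] Failed to connect\n")

# rmdir() refuses to remove a directory that still has content.
try:
    os.rmdir(logs_dir)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# rmtree() deletes the directory and everything below it.
shutil.rmtree(logs_dir)
logs_exists_after = os.path.exists(logs_dir)

os.rmdir(base)  # base is empty now, so rmdir() succeeds
print(rmdir_failed, logs_exists_after)  # True False
```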
Path Handling (os.path, pathlib)
Function / Method | Description | Example |
---|---|---|
os.path.join(a, b) | Join paths safely | os.path.join("folder", "file.txt") |
os.path.basename(path) | Get file name | os.path.basename("/x/y/z.txt") |
os.path.dirname(path) | Get directory path | os.path.dirname("/x/y/z.txt") |
pathlib.Path(path).exists() | Check path exists (modern alternative) | Path("file.txt").exists() |
pathlib.Path(path).unlink() | Delete file (like os.remove) | Path("file.txt").unlink() |
pathlib.Path(path).mkdir() | Create directory | Path("dir").mkdir(exist_ok=True) |
pathlib.Path(path).rmdir() | Remove empty directory | Path("dir").rmdir() |
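The pathlib methods in the table combine naturally because every operation hangs off a Path object. The sketch below walks through create, inspect, and delete; the reports directory and summary.txt file are hypothetical names.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)

    # The / operator joins path components, like os.path.join().
    report_dir = root / "reports"
    report_dir.mkdir(exist_ok=True)

    report = report_dir / "summary.txt"
    report.write_text("ok")

    # Path attributes replace os.path.basename() and friends.
    name = report.name                # 'summary.txt'
    suffix = report.suffix            # '.txt'
    parent_name = report.parent.name  # 'reports'
    existed = report.exists()

    report.unlink()     # delete the file
    report_dir.rmdir()  # delete the now-empty directory
    gone = not report.exists() and not report_dir.exists()

print(name, suffix, parent_name, existed, gone)
```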
JSON Files (json module)
Function | Description | Example |
---|---|---|
json.load(file) | Parses JSON from a file object | data = json.load(open("file.json")) |
json.loads(string) | Parses JSON from a string | json.loads('{"a":1}') |
json.dump(data, file) | Writes JSON to file | json.dump(data, open("file.json", "w")) |
json.dumps(data) | Converts Python object to JSON string | json.dumps({"a":1}) |
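A round trip through json.dumps() and json.loads() is a quick way to see how Python types map to JSON. In this sketch the record is made-up sample data; note that the tuple comes back as a list because JSON has only one array type.

```python
import json

record = {
    "name": "Alice",
    "skills": ("Python", "ML"),  # tuple on the Python side
    "active": True,
    "manager": None,
}

# sort_keys=True makes the output order deterministic.
text = json.dumps(record, sort_keys=True)
print(text)
# {"active": true, "manager": null, "name": "Alice", "skills": ["Python", "ML"]}

restored = json.loads(text)
print(type(restored["skills"]).__name__)  # list
```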
Pickle Module (pickle) – Binary Object Serialization
Function / Method | Description | Example |
---|---|---|
pickle.dump(obj, file) | Serialize and write an object to a binary file | pickle.dump(data, open("data.pkl", "wb")) |
pickle.load(file) | Read and deserialize an object from a binary file | data = pickle.load(open("data.pkl", "rb")) |
pickle.dumps(obj) | Serialize object to a bytes object | b = pickle.dumps(data) |
pickle.loads(bytes_obj) | Deserialize object from bytes | data = pickle.loads(b) |
pickle.HIGHEST_PROTOCOL | Constant for the most efficient (and recent) pickle format | pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL) |
Use Case: Best for Python-only data persistence, such as storing trained ML models or temporary cache structures.
Warning: Pickle is not secure against untrusted sources.
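For in-memory use, pickle.dumps() and pickle.loads() work on bytes rather than files. The sketch below round-trips a small made-up dictionary and shows that Python-specific types such as tuples are preserved.

```python
import pickle

data = {"x": 1, "y": [2, 3], "z": (4, 5)}

# dumps() returns bytes; HIGHEST_PROTOCOL selects the newest format
# supported by the running interpreter.
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# loads() rebuilds an equal but independent copy of the object.
restored = pickle.loads(blob)

print(type(blob).__name__)             # bytes
print(restored == data)                # True
print(type(restored["z"]).__name__)    # tuple
```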
YAML Module (PyYAML) – Human-Readable Serialization
To use YAML in Python, install PyYAML first:
pip install pyyaml
Function / Method | Description | Example |
---|---|---|
yaml.dump(data, file) | Write Python object to YAML file | yaml.dump(data, open("file.yaml", "w")) |
yaml.dump(data) | Convert object to YAML string | yaml_string = yaml.dump(data) |
yaml.safe_dump(data) | Like dump, but limited to basic Python types | yaml.safe_dump(data, open("f.yaml", "w")) |
yaml.load(file, Loader) | Read YAML with an explicit Loader (unsafe for untrusted input with yaml.Loader or yaml.UnsafeLoader) | data = yaml.load(f, Loader=yaml.FullLoader) |
yaml.safe_load(file) | Safely read YAML content into Python object | data = yaml.safe_load(open("f.yaml")) |
yaml.safe_load_all(file) | Load multiple YAML documents from a single file (returns a generator) | docs = list(yaml.safe_load_all(open("f.yaml"))) |
Use Case: Ideal for configuration files (e.g., Docker, Kubernetes, CI/CD pipelines) and human-editable data.
Security Tip: Always prefer safe_load() over load() when parsing YAML.
Summary
You’ve now learned how to:
- Handle text, binary, and structured files (CSV, JSON, YAML, Pickle)
- Use proper file modes and directory operations
- Build and test a real-world file parser
What is Next?
In Chapter 4: Error Handling and Debugging, we’ll explore:
- Python’s built-in exception handling
- Logging strategies
- Using pdb and IDE debuggers
- Enhancing robustness in real-world code