
Chapter 4: Working with Files and Data Serialization
Modern Python applications frequently read, write, and serialize data. From logs and configuration files to API responses and datasets, the ability to handle different file formats is crucial. In this chapter, you’ll learn how to:
- Read and write files using different modes (r, w, a, x, b)
- Handle text, CSV, JSON, Pickle, and YAML formats
- Create and delete files and directories
- Implement a practical log parser that exports structured JSON
- Write testable file I/O logic using temp directories
4.1 File Modes, Creation, and Directory Operations
File Modes in Python
| Mode | Description |
|---|---|
| 'r' | Read (default); file must exist |
| 'w' | Write; overwrites or creates new file |
| 'a' | Append to file |
| 'x' | Create file; fails if it exists |
| 'b' | Binary mode |
| 't' | Text mode (default) |
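The difference between the modes in the table is easiest to see side by side. The following is a minimal sketch that works inside a temporary directory; the file name modes_demo.txt and the variable names are assumptions chosen for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    # 'x' creates the file and fails if it already exists.
    with open(path, "x") as f:
        f.write("first\n")

    try:
        with open(path, "x"):
            pass
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True

    # 'a' keeps the existing content and adds to the end.
    with open(path, "a") as f:
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()

    # 'w' truncates the file before writing.
    with open(path, "w") as f:
        f.write("replaced\n")
    with open(path, "r") as f:
        after_write = f.read()

print(second_create_failed)
print(repr(after_append))
print(repr(after_write))
```

Note that 'w' silently discards existing content, so 'x' is the safer choice when overwriting would be a mistake.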
Reading a Text File:
Let's say you have a text file with the following content:
```text
This is a simple text file.
This is the second line of the text file.
This is the last line of the text file.
```
Here is a small Python program that reads the file's content into a variable and prints it to the console.
```python
file_path = 'sample.txt'

try:
    # File opened in read mode
    with open(file_path, 'r') as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print(f"File {file_path} not found.")
```
Writing to a Text File:
The following program demonstrates writing a string into a text file.
```python
output_path = 'output.txt'

# File opened in write mode
with open(output_path, 'w') as file:
    file.write("This is a new line.\nAnother line.")
```
Use with open() to ensure the file closes automatically.
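To make the automatic closing concrete, the sketch below checks the file object's closed attribute before and after the with block, including when the block exits through an exception. The file name and variable names are illustrative assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "closing_demo.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("hello")
        open_inside = not f.closed  # still open inside the block
    closed_after = f.closed         # closed once the block ends

    # The file is also closed when the block exits via an exception.
    try:
        with open(path, "r") as g:
            raise ValueError("simulated failure")
    except ValueError:
        closed_after_error = g.closed

print(open_inside, closed_after, closed_after_error)
```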
Creating and Deleting Files
```python
import os

# Create (mode "x" raises FileExistsError if temp.txt already exists)
with open("temp.txt", "x") as f:
    f.write("Sample text")

# Delete
if os.path.exists("temp.txt"):
    os.remove("temp.txt")
```
Working with Directories
```python
import os

# Create directory
os.makedirs("logs", exist_ok=True)

# List directory
print(os.listdir("logs"))

# Remove empty directory
os.rmdir("logs")
```
Use os.path.join() for safe cross-platform paths.
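As a brief illustration, the sketch below builds the same path with os.path.join() and with pathlib. The directory and file names are made up for the example.

```python
import os
from pathlib import Path

# os.path.join inserts the separator used by the current platform.
log_path = os.path.join("logs", "errors", "app.log")

# pathlib builds the same path with the / operator.
same_path = Path("logs") / "errors" / "app.log"

print(log_path)
print(str(same_path) == log_path)
```

Both forms avoid hard-coding "/" or "\\" in path strings, which is what makes them portable.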
4.2 Working with Structured Formats: JSON, CSV, Pickle, YAML
JSON files
```python
import json

data = {"name": "Alice", "skills": ["Python", "ML"]}

# Write
with open("user.json", "w") as f:
    json.dump(data, f, indent=2)

# Read
with open("user.json", "r") as f:
    print(json.load(f))
```
CSV files
```python
import csv

rows = [["Name", "Age"], ["Alice", 30], ["Bob", 25]]

# Write CSV (newline="" lets the csv module control line endings)
with open("people.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Read CSV (every value comes back as a string)
with open("people.csv", "r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
```
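When the first row is a header, the standard library's csv.DictWriter and csv.DictReader let you work with rows as dictionaries instead of lists. This is a minimal sketch using the same sample data; it writes to a temporary directory, and the variable names are assumptions.

```python
import csv
import os
import tempfile

people = [{"Name": "Alice", "Age": 30}, {"Name": "Bob", "Age": 25}]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "people.csv")

    # Write a header row followed by one row per dictionary.
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Name", "Age"])
        writer.writeheader()
        writer.writerows(people)

    # Each row is read back as a dict keyed by the header names.
    with open(csv_path, "r", newline="") as f:
        loaded = [dict(row) for row in csv.DictReader(f)]

# CSV stores text only, so numeric columns must be converted explicitly.
ages = [int(row["Age"]) for row in loaded]
print(loaded)
print(ages)
```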
Pickle
Python's pickle module serializes Python objects to a binary format.
```python
import pickle

data = {"x": 1, "y": 2}

# Save binary
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

# Load binary
with open("data.pkl", "rb") as f:
    restored = pickle.load(f)
    print(restored)
```
Never unpickle untrusted data. It may execute arbitrary code.
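The warning is easier to appreciate with a harmless demonstration. In the sketch below, a hypothetical Payload class uses __reduce__ to tell pickle which callable to invoke on load. Here the callable is print, but an attacker could substitute something destructive.

```python
import pickle


class Payload:
    """Hypothetical class showing that unpickling can call any callable."""

    def __reduce__(self):
        # On load, pickle calls print("this ran during unpickling").
        return (print, ("this ran during unpickling",))


blob = pickle.dumps(Payload())

# Loading the bytes runs the callable instead of rebuilding a Payload.
result = pickle.loads(blob)
print(type(result).__name__)
```

The loaded value is whatever the callable returned (None for print), not a Payload instance, and the call happened simply because the bytes were loaded.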
YAML
YAML is human-readable and common in configuration files. Working with it in Python requires the third-party PyYAML package:
```bash
pip install pyyaml
```
Reading and writing YAML files:

```python
import yaml

data = {"server": {"port": 8000, "debug": True}}

# Write YAML
with open("config.yaml", "w") as f:
    yaml.dump(data, f)

# Read YAML
with open("config.yaml", "r") as f:
    loaded = yaml.safe_load(f)
    print(loaded)
```
4.3 Example: Log Parser Saving Output as JSON
Input: server.log
```text
[INFO] Service started
[WARNING] Disk space low
[ERROR] Failed to connect
```
Parser: log_parser.py
```python
import json
import os
import sys


def parse_log_file(input_path, output_path):
    if not os.path.exists(input_path):
        raise FileNotFoundError("Log file not found.")

    entries = []
    with open(input_path, "r") as f:
        for line in f:
            line = line.strip()
            # Keep only lines shaped like "[LEVEL] message"
            if line.startswith("[") and "]" in line:
                level_end = line.index("]")
                level = line[1:level_end]
                message = line[level_end + 1:].strip()
                entries.append({"level": level, "message": message})

    with open(output_path, "w") as out:
        json.dump(entries, out, indent=2)

    return entries


if __name__ == "__main__":
    parse_log_file(sys.argv[1], sys.argv[2])
```
Run the Parser
```bash
python log_parser.py server.log server_log.json
```
Expected output (saved in server_log.json):
```json
[
  {
    "level": "INFO",
    "message": "Service started"
  },
  {
    "level": "WARNING",
    "message": "Disk space low"
  },
  {
    "level": "ERROR",
    "message": "Failed to connect"
  }
]
```
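Once the entries are exported as JSON, other code can load and aggregate them. The sketch below is one possible follow-up step, not part of the parser itself: it parses the same three entries from a string and counts them by level with collections.Counter. The variable names are assumptions.

```python
import json
from collections import Counter

# Same entries as the parser's output for the sample server.log.
exported = """[
  {"level": "INFO", "message": "Service started"},
  {"level": "WARNING", "message": "Disk space low"},
  {"level": "ERROR", "message": "Failed to connect"}
]"""

entries = json.loads(exported)

# Count how many entries were logged at each level.
level_counts = Counter(entry["level"] for entry in entries)

# Collect only the error messages.
errors = [entry["message"] for entry in entries if entry["level"] == "ERROR"]

print(dict(level_counts))
print(errors)
```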
4.4 Testing File I/O with Temp Directories
Test file I/O safely with pytest and its built-in tmp_path fixture, which gives each test its own temporary directory as a pathlib.Path.
File: test_log_parser.py
```python
import json
from log_parser import parse_log_file


def test_log_parser(tmp_path):
    log_file = tmp_path / "server.log"
    output_file = tmp_path / "server.json"

    log_file.write_text("[INFO] Test log\n[ERROR] Crash")

    results = parse_log_file(log_file, output_file)

    assert results[0]["level"] == "INFO"
    assert results[1]["message"] == "Crash"

    saved = json.loads(output_file.read_text())
    assert saved == results
```
```bash
pytest
```
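Outside pytest, the standard library's tempfile module offers a similar pattern. The following sketch assumes a hypothetical save_settings function as the code under test and checks it inside a tempfile.TemporaryDirectory, which is removed automatically when the with block ends.

```python
import json
import tempfile
from pathlib import Path


def save_settings(path, settings):
    """Hypothetical function under test: writes a dict as JSON."""
    with open(path, "w") as f:
        json.dump(settings, f)


def test_save_settings_roundtrip():
    with tempfile.TemporaryDirectory() as tmp_dir:
        target = Path(tmp_dir) / "settings.json"
        save_settings(target, {"debug": True})
        assert json.loads(target.read_text()) == {"debug": True}
    # The directory and everything in it are deleted on exit.
    assert not target.exists()


test_save_settings_roundtrip()
```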
4.5 Python File and Directory Manipulation Functions
File I/O – Text Files
| Function / Method | Description | Example |
|---|---|---|
| open(file, mode='r') | Opens a file in a given mode | open("file.txt", "w") |
| file.read() | Reads the entire content of a file | data = f.read() |
| file.readline() | Reads a single line | line = f.readline() |
| file.readlines() | Reads all lines into a list | lines = f.readlines() |
| file.write(data) | Writes a string to the file | f.write("Hello\n") |
| file.writelines(lines) | Writes a list of strings to a file | f.writelines(["a\n", "b\n"]) |
| file.close() | Closes the file (not needed with with open(...) as f:) | f.close() |
| with open(...) | Context manager for safe file handling | with open(...) as f: |
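The methods in the table can be combined as in the short sketch below, which writes three lines and reads them back in different ways. The file name and variable names are illustrative assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines.txt")  # hypothetical file name

    # writelines() does not add newlines; include them yourself.
    with open(path, "w") as f:
        f.writelines(["alpha\n", "beta\n", "gamma\n"])

    with open(path, "r") as f:
        first = f.readline()   # reads up to and including the first newline
        rest = f.readlines()   # reads the remaining lines into a list

    # Iterating over the file object reads one line at a time.
    with open(path, "r") as f:
        stripped = [line.rstrip("\n") for line in f]

print(repr(first))
print(rest)
print(stripped)
```

For large files, iterating over the file object is usually preferable to read() or readlines(), since it avoids loading everything into memory at once.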
File I/O – Binary Files
| Function / Method | Description | Example |
|---|---|---|
| open(file, 'rb') | Opens file for reading in binary mode | with open("img.jpg", "rb") as f: |
| open(file, 'wb') | Opens file for writing in binary mode | with open("img.jpg", "wb") as f: |
| file.read(size) | Reads binary data (optional size) | data = f.read(1024) |
| file.write(bytes) | Writes binary data | f.write(b'\x00\x01') |
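A common use of read(size) is processing a binary file in fixed-size chunks. The sketch below copies a file 1024 bytes at a time; copy_in_chunks is a hypothetical helper written for this example, not a standard library function.

```python
import os
import tempfile

CHUNK_SIZE = 1024


def copy_in_chunks(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy a file chunk by chunk and return the number of bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


payload = bytes(range(256)) * 10  # 2560 bytes of sample data

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "source.bin")
    dst = os.path.join(tmp_dir, "copy.bin")
    with open(src, "wb") as f:
        f.write(payload)

    total = copy_in_chunks(src, dst)
    with open(dst, "rb") as f:
        identical = f.read() == payload

print(total, identical)
```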
File & Directory Utilities (os, shutil, pathlib)
| Function / Method | Description | Example |
|---|---|---|
| os.path.exists(path) | Check if path exists | os.path.exists("file.txt") |
| os.remove(path) | Delete a file | os.remove("file.txt") |
| os.rename(src, dst) | Rename file or directory | os.rename("a.txt", "b.txt") |
| os.listdir(path) | List directory contents | os.listdir(".") |
| os.makedirs(path) | Create directories recursively | os.makedirs("logs/errors") |
| os.rmdir(path) | Remove empty directory | os.rmdir("logs") |
| shutil.rmtree(path) | Remove non-empty directory | shutil.rmtree("logs") |
| os.getcwd() | Get current working directory | os.getcwd() |
| os.chdir(path) | Change current working directory | os.chdir("/tmp") |
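The difference between os.rmdir() and shutil.rmtree() matters once a directory has content. This sketch builds a small nested tree in a temporary location, shows that os.rmdir() refuses to remove a non-empty directory, and then removes the whole tree. The directory and file names are assumptions.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()  # scratch directory for the demonstration
nested = os.path.join(base, "logs", "errors")
os.makedirs(nested)
with open(os.path.join(nested, "app.log"), "w") as f:
    f.write("sample\n")

# os.rmdir only works on empty directories.
logs_dir = os.path.join(base, "logs")
try:
    os.rmdir(logs_dir)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# shutil.rmtree deletes the directory and everything inside it.
shutil.rmtree(base)
removed = not os.path.exists(base)

print(rmdir_failed, removed)
```

Because shutil.rmtree() deletes recursively without confirmation, it is worth double-checking the path passed to it.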
Path Handling (os.path, pathlib)
| Function / Method | Description | Example |
|---|---|---|
| os.path.join(a, b) | Join paths safely | os.path.join("folder", "file.txt") |
| os.path.basename(path) | Get file name | os.path.basename("/x/y/z.txt") |
| os.path.dirname(path) | Get directory path | os.path.dirname("/x/y/z.txt") |
| pathlib.Path(path).exists() | Check path exists (modern alternative) | Path("file.txt").exists() |
| pathlib.Path(path).unlink() | Delete file (like os.remove) | Path("file.txt").unlink() |
| pathlib.Path(path).mkdir() | Create directory | Path("dir").mkdir(exist_ok=True) |
| pathlib.Path(path).rmdir() | Remove directory | Path("dir").rmdir() |
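The pathlib calls from the table fit together as in the sketch below, which creates a directory and a file, inspects them, and cleans up. Everything happens inside a temporary directory, and the names are illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = Path(tmp_dir) / "dir"
    data_dir.mkdir(exist_ok=True)
    data_dir.mkdir(exist_ok=True)  # no error: the directory already exists

    file_path = data_dir / "file.txt"
    file_path.write_text("hello")
    existed = file_path.exists()

    # .name and .parent play the role of os.path.basename and os.path.dirname.
    name = file_path.name
    parent_name = file_path.parent.name

    file_path.unlink()  # delete the file
    data_dir.rmdir()    # remove the now-empty directory
    gone = not data_dir.exists()

print(existed, name, parent_name, gone)
```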
JSON Files (json module)
| Function | Description | Example |
|---|---|---|
| json.load(file) | Parses JSON from a file object | data = json.load(open("file.json")) |
| json.loads(string) | Parses JSON from a string | json.loads('{"a":1}') |
| json.dump(data, file) | Writes JSON to file | json.dump(data, open("file.json", "w")) |
| json.dumps(data) | Converts Python object to JSON string | json.dumps({"a":1}) |
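One detail worth knowing about the string variants is that a round trip through JSON does not always return the original Python types. The sketch below uses a made-up dictionary to show that tuples come back as lists and non-string keys come back as strings.

```python
import json

original = {"name": "Alice", "point": (1, 2), 1: "one"}

text = json.dumps(original)   # Python object -> JSON string
restored = json.loads(text)   # JSON string -> Python object

print(text)
print(restored)
print(restored == original)
```

JSON has no tuple type and only allows string keys, so the restored dictionary differs from the original.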
Pickle Module (pickle) – Binary Object Serialization
| Function / Method | Description | Example |
|---|---|---|
| pickle.dump(obj, file) | Serialize and write an object to a binary file | pickle.dump(data, open("data.pkl", "wb")) |
| pickle.load(file) | Read and deserialize an object from a binary file | data = pickle.load(open("data.pkl", "rb")) |
| pickle.dumps(obj) | Serialize object to a bytes object | b = pickle.dumps(data) |
| pickle.loads(bytes_obj) | Deserialize object from bytes | data = pickle.loads(b) |
| pickle.HIGHEST_PROTOCOL | Constant for the most efficient (and recent) pickle format | pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL) |
Best for Python-only data persistence, such as storing trained ML models or temporary cache structures.
Pickle is not secure against untrusted sources.
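As a small illustration of the in-memory functions, the sketch below round-trips a made-up dictionary through pickle.dumps() and pickle.loads(). Unlike text formats such as JSON, pickle keeps Python-specific types like tuples, sets, and integer keys intact.

```python
import pickle

original = {"point": (1, 2), 1: "one", "tags": {"a", "b"}}

# Serialize to bytes using the newest protocol this Python supports.
blob = pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize into a new, equal object.
restored = pickle.loads(blob)

print(type(blob).__name__)
print(restored == original)
print(restored is original)
```

Data written with HIGHEST_PROTOCOL may not be readable by older Python versions, so a lower protocol can be the better choice when files are shared across versions.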
YAML Module (PyYAML) – Human-Readable Serialization
To use YAML in Python, install PyYAML first:
```bash
pip install pyyaml
```
| Function / Method | Description | Example |
|---|---|---|
| yaml.dump(data, file) | Write Python object to YAML file | yaml.dump(data, open("file.yaml", "w")) |
| yaml.dump(data) | Convert object to YAML string | yaml_string = yaml.dump(data) |
| yaml.safe_dump(data) | Safer version for basic Python objects | yaml.safe_dump(data, open("f.yaml", "w")) |
| yaml.load(file, Loader) | Read YAML file with an explicit Loader (some loaders can execute arbitrary code on untrusted input) | data = yaml.load(f, Loader=yaml.FullLoader) |
| yaml.safe_load(file) | Safely read YAML content into Python object | data = yaml.safe_load(open("f.yaml")) |
| yaml.safe_load_all(file) | Load multiple YAML documents from a single file (returns a generator) | docs = yaml.safe_load_all(open("f.yaml")) |
Ideal for configuration files (e.g., Docker, Kubernetes, CI/CD pipelines) and human-editable data.
Always prefer safe_load() over load() when parsing YAML.
Summary
You’ve now learned how to:
- Handle text, binary, and structured files (CSV, JSON, YAML, Pickle)
- Use proper file modes and directory operations
- Build and test a real-world file parser
What is Next?
In Chapter 5: Error Handling and Debugging, we’ll explore:
- Python’s built-in exception handling
- Logging strategies
- Using pdb and IDE debuggers
- Enhancing robustness in real-world code