Files and Data Serialization

Chapter 4: Working with Files and Data Serialization

Modern Python applications frequently read, write, and serialize data. From logs and configuration files to API responses and datasets, the ability to handle different file formats is crucial. In this chapter, you’ll learn how to:

  • Read and write files using different modes (r, w, a, x, b)
  • Handle text, CSV, JSON, Pickle, and YAML formats
  • Create and delete files and directories
  • Implement a practical log parser that exports structured JSON
  • Write testable file I/O logic using temp directories

4.1 File Modes, Creation, and Directory Operations

File Modes in Python

| Mode | Description |
|------|-------------|
| 'r' | Read (default); file must exist |
| 'w' | Write; overwrites or creates new file |
| 'a' | Append to file |
| 'x' | Create file; fails if it exists |
| 'b' | Binary mode |
| 't' | Text mode (default) |
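
The differences between 'x', 'a', and 'w' are easiest to see side by side. The following is a minimal sketch that runs inside a temporary directory; the file name modes_demo.txt and the variable names are illustrative assumptions, not part of the chapter's examples.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    # 'x' creates the file and only succeeds if it does not exist yet.
    with open(path, "x") as f:
        f.write("first\n")

    # A second 'x' on the same path raises FileExistsError.
    try:
        with open(path, "x") as f:
            f.write("never written\n")
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True

    # 'a' keeps the existing content and writes at the end.
    with open(path, "a") as f:
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()

    # 'w' truncates the file before writing.
    with open(path, "w") as f:
        f.write("replaced\n")
    with open(path, "r") as f:
        after_write = f.read()

print(second_create_failed)  # True
print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_write))     # 'replaced\n'
```

Mode 'x' is a reasonable choice when silently overwriting an existing file would be a bug.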

Reading a Text File:

Let's say you have a text file with the following content:

text
This is a simple text file.
This is the second line of the text file.
This is the last line of the text file.

Here is a small Python program that reads the content of the file into a variable and prints it to the console.

python
file_path = 'sample.txt'

try:
    # File opened in read mode
    with open(file_path, 'r') as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print(f"File {file_path} not found.")

Writing to a Text File:

The following program demonstrates writing a string into a text file.

python
output_path = 'output.txt'

# File opened in write mode
with open(output_path, 'w') as file:
    file.write("This is a new line.\nAnother line.")

Use with open() to ensure the file closes automatically, even if an exception is raised inside the block.
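
To make the benefit concrete, the sketch below contrasts a manual try/finally with the context-manager form. It is an illustration only; the file name demo.txt and the simulated ValueError are assumptions made for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # Manual version: close() has to be called explicitly in a finally block.
    f = open(path, "w")
    try:
        f.write("manual\n")
    finally:
        f.close()
    closed_manually = f.closed

    # Context-manager version: the file is closed even though the block raises.
    try:
        with open(path, "a") as g:
            g.write("managed\n")
            raise ValueError("simulated failure")
    except ValueError:
        pass
    closed_by_with = g.closed

print(closed_manually, closed_by_with)  # True True
```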

Creating and Deleting Files

python
import os

# Create (raises FileExistsError if temp.txt already exists)
with open("temp.txt", "x") as f:
    f.write("Sample text")

# Delete
if os.path.exists("temp.txt"):
    os.remove("temp.txt")

Working with Directories

python
import os

# Create directory
os.makedirs("logs", exist_ok=True)

# List directory
print(os.listdir("logs"))

# Remove empty directory
os.rmdir("logs")

Use os.path.join() for safe cross-platform paths.
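
As a brief illustration, the sketch below builds the same path with os.path.join() and with pathlib's / operator. The directory and file names ("logs", "2024", "server.log") are made up for the example.

```python
import os
from pathlib import Path

# Both forms insert the correct separator for the current platform.
joined = os.path.join("logs", "2024", "server.log")
as_path = Path("logs") / "2024" / "server.log"

print(joined)
print(os.path.basename(joined))       # server.log
print(as_path.name, as_path.suffix)   # server.log .log
print(Path(joined) == as_path)        # True
```

Hard-coding "/" or "\\" into path strings tends to break when the code moves between operating systems, which is why joining components is usually the safer habit.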

4.2 Working with Structured Formats: JSON, CSV, Pickle, YAML

JSON files

python
import json

data = {"name": "Alice", "skills": ["Python", "ML"]}

# Write
with open("user.json", "w") as f:
    json.dump(data, f, indent=2)

# Read
with open("user.json", "r") as f:
    print(json.load(f))
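
The same round trip works on strings with json.dumps() and json.loads(), and malformed input raises json.JSONDecodeError. The sketch below shows one way to handle that; the helper name parse_json_or_none and the extra keys in the record are assumptions added for illustration.

```python
import json

record = {"name": "Alice", "skills": ["Python", "ML"], "active": True, "manager": None}

# Python True/None become JSON true/null, and back again on load.
text = json.dumps(record, indent=2)
restored = json.loads(text)


def parse_json_or_none(raw):
    """Return the parsed value, or None if raw is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# A trailing comma is valid in a Python dict literal but not in JSON.
bad = parse_json_or_none('{"name": "Alice",}')

print(restored == record)  # True
print(bad)                 # None
```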

CSV files

python
import csv

rows = [["Name", "Age"], ["Alice", 30], ["Bob", 25]]

# Write CSV (newline="" lets the csv module control line endings)
with open("people.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Read CSV (every value comes back as a string)
with open("people.csv", "r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
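
When a CSV file has a header row, csv.DictWriter and csv.DictReader let you work with column names instead of positions. The following is a small sketch using the same sample data; it writes to a temporary directory, and the variable names are illustrative.

```python
import csv
import os
import tempfile

people = [{"Name": "Alice", "Age": 30}, {"Name": "Bob", "Age": 25}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv")

    # Write a header row followed by one row per dictionary.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Name", "Age"])
        writer.writeheader()
        writer.writerows(people)

    # Each row is read back as a dictionary keyed by the header names.
    with open(path, "r", newline="") as f:
        loaded = list(csv.DictReader(f))

# CSV stores text only, so numeric columns need explicit conversion.
ages = [int(row["Age"]) for row in loaded]

print(loaded[0]["Name"])  # Alice
print(ages)               # [30, 25]
```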

Pickle

Python's pickle module serializes Python objects to a binary format.

python
import pickle

data = {"x": 1, "y": 2}

# Save binary
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

# Load binary
with open("data.pkl", "rb") as f:
    restored = pickle.load(f)
    print(restored)

Never unpickle untrusted data. It may execute arbitrary code.
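
A harmless demonstration shows why. A pickle can name any importable callable to be invoked at load time through the __reduce__ hook. In the sketch below, the class Surprise is hypothetical and calls the built-in sorted(), but an attacker could just as easily name a function that runs shell commands.

```python
import pickle


class Surprise:
    """Hypothetical class whose pickled form is 'call this function on load'."""

    def __reduce__(self):
        # pickle records the callable and its arguments, not the object's state.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Surprise())

# Loading the bytes calls sorted([3, 1, 2]); no Surprise instance comes back.
result = pickle.loads(payload)

print(type(result).__name__, result)  # list [1, 2, 3]
```

For data that crosses a trust boundary, a data-only format such as JSON is usually the safer choice.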

YAML

YAML is human-readable and common in configuration files. Working with it in Python requires the third-party PyYAML package:

bash
pip install pyyaml

Reading and writing YAML files.

python
import yaml

data = {"server": {"port": 8000, "debug": True}}

# Write YAML
with open("config.yaml", "w") as f:
    yaml.dump(data, f)

# Read YAML
with open("config.yaml", "r") as f:
    loaded = yaml.safe_load(f)
    print(loaded)

4.3 Example: Log Parser Saving Output as JSON

Input: server.log

server.log
[INFO] Service started
[WARNING] Disk space low
[ERROR] Failed to connect

Parser: log_parser.py

log_parser.py
import os
import sys
import json

def parse_log_file(input_path, output_path):
    if not os.path.exists(input_path):
        raise FileNotFoundError("Log file not found.")

    entries = []
    with open(input_path, "r") as f:
        for line in f:
            line = line.strip()
            # Only lines of the form "[LEVEL] message" are parsed
            if line.startswith("[") and "]" in line:
                level_end = line.index("]")
                level = line[1:level_end]
                message = line[level_end+1:].strip()
                entries.append({"level": level, "message": message})

    with open(output_path, "w") as out:
        json.dump(entries, out, indent=2)

    return entries

if __name__ == "__main__":
    # Usage: python log_parser.py <input_log> <output_json>
    parse_log_file(sys.argv[1], sys.argv[2])

Run the Parser

bash
python log_parser.py server.log server_log.json

Expected output (saved in server_log.json):

server_log.json
[
  { "level": "INFO", "message": "Service started" },
  { "level": "WARNING", "message": "Disk space low" },
  { "level": "ERROR", "message": "Failed to connect" }
]
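
Once the entries are stored as JSON, downstream code can load and summarize them without re-parsing the raw log. The sketch below is one possible follow-up step, not part of log_parser.py. It embeds the JSON as a string so it runs on its own, and the names level_counts and error_messages are illustrative.

```python
import json
from collections import Counter

# Same structure as server_log.json, inlined so the example is self-contained.
saved_json = """
[
  { "level": "INFO", "message": "Service started" },
  { "level": "WARNING", "message": "Disk space low" },
  { "level": "ERROR", "message": "Failed to connect" }
]
"""

entries = json.loads(saved_json)

# Count how many entries were logged at each level.
level_counts = Counter(entry["level"] for entry in entries)

# Pull out only the error messages.
error_messages = [e["message"] for e in entries if e["level"] == "ERROR"]

print(level_counts["ERROR"])  # 1
print(error_messages)         # ['Failed to connect']
```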

4.4 Testing File I/O with Temp Directories

Test file I/O safely with pytest and its built-in tmp_path fixture, which gives each test its own temporary directory as a pathlib.Path object.

File: test_log_parser.py

test_log_parser.py
import json
from log_parser import parse_log_file

def test_log_parser(tmp_path):
    log_file = tmp_path / "server.log"
    output_file = tmp_path / "server.json"

    log_file.write_text("[INFO] Test log\n[ERROR] Crash")

    results = parse_log_file(log_file, output_file)

    assert results[0]["level"] == "INFO"
    assert results[1]["message"] == "Crash"

    saved = json.loads(output_file.read_text())
    assert saved == results

Run the tests:

bash
pytest
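
Outside pytest, the standard library's tempfile.TemporaryDirectory offers similar isolation. The following is a minimal sketch under that assumption; save_report is a hypothetical function under test, not something defined earlier in the chapter.

```python
import json
import tempfile
from pathlib import Path


def save_report(report, output_path):
    """Hypothetical function under test: write a dict to a JSON file."""
    with open(output_path, "w") as f:
        json.dump(report, f, indent=2)


def test_save_report():
    # The directory and everything in it are deleted when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        output_file = Path(tmp_dir) / "report.json"
        save_report({"errors": 1}, output_file)
        assert json.loads(output_file.read_text()) == {"errors": 1}
    assert not output_file.exists()


test_save_report()
print("test passed")
```

Because the function takes its output path as a parameter instead of hard-coding it, the test can point it at a temporary location without touching real files.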

4.5 Python File and Directory Manipulation Functions

File I/O – Text Files

| Function / Method | Description | Example |
|-------------------|-------------|---------|
| open(file, mode='r') | Opens a file in a given mode | open("file.txt", "w") |
| file.read() | Reads the entire content of a file | data = f.read() |
| file.readline() | Reads a single line | line = f.readline() |
| file.readlines() | Reads all lines into a list | lines = f.readlines() |
| file.write(data) | Writes a string to the file | f.write("Hello\n") |
| file.writelines(lines) | Writes a list of strings to a file | f.writelines(["a\n", "b\n"]) |
| file.close() | Closes the file (not needed with with open(...) as f:) | f.close() |
| with open(...) | Context manager for safe file handling | with open(...) as f: |

File I/O – Binary Files

| Function / Method | Description | Example |
|-------------------|-------------|---------|
| open(file, 'rb') | Opens file for reading in binary mode | with open("img.jpg", "rb") as f: |
| open(file, 'wb') | Opens file for writing in binary mode | with open("img.jpg", "wb") as f: |
| file.read(size) | Reads binary data (optional size) | data = f.read(1024) |
| file.write(bytes) | Writes binary data | f.write(b'\x00\x01') |
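
Passing a size to read() makes it possible to process large binary files without loading them fully into memory. The sketch below copies a file in fixed-size chunks; the function name copy_in_chunks, the 1024-byte default, and the file names are assumptions chosen for illustration.

```python
import os
import tempfile


def copy_in_chunks(src_path, dst_path, chunk_size=1024):
    """Copy src_path to dst_path chunk by chunk; return the bytes copied."""
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # read() returns b"" at end of file
                break
            dst.write(chunk)
            total += len(chunk)
    return total


with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "source.bin")
    dst = os.path.join(tmp_dir, "copy.bin")

    payload = bytes(range(256)) * 10  # 2560 bytes of sample data
    with open(src, "wb") as f:
        f.write(payload)

    copied = copy_in_chunks(src, dst)
    with open(dst, "rb") as f:
        copy_matches = f.read() == payload

print(copied, copy_matches)  # 2560 True
```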

File & Directory Utilities (os, shutil, pathlib)

| Function / Method | Description | Example |
|-------------------|-------------|---------|
| os.path.exists(path) | Check if path exists | os.path.exists("file.txt") |
| os.remove(path) | Delete a file | os.remove("file.txt") |
| os.rename(src, dst) | Rename file or directory | os.rename("a.txt", "b.txt") |
| os.listdir(path) | List directory contents | os.listdir(".") |
| os.makedirs(path) | Create directories recursively | os.makedirs("logs/errors") |
| os.rmdir(path) | Remove empty directory | os.rmdir("logs") |
| shutil.rmtree(path) | Remove non-empty directory | shutil.rmtree("logs") |
| os.getcwd() | Get current working directory | os.getcwd() |
| os.chdir(path) | Change current working directory | os.chdir("/tmp") |
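
The distinction between os.rmdir() and shutil.rmtree() matters in practice: the former refuses to delete a directory that still has contents. Here is a short sketch of that behavior, run inside a temporary directory with made-up names.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    logs = os.path.join(tmp_dir, "logs")

    # Build logs/errors/app.log so that "logs" is not empty.
    os.makedirs(os.path.join(logs, "errors"))
    with open(os.path.join(logs, "errors", "app.log"), "w") as f:
        f.write("entry\n")

    # os.rmdir() only removes empty directories.
    try:
        os.rmdir(logs)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # shutil.rmtree() deletes the directory and everything below it.
    shutil.rmtree(logs)
    still_there = os.path.exists(logs)

print(rmdir_failed, still_there)  # True False
```

Since shutil.rmtree() deletes recursively without confirmation, it is worth double-checking the path before calling it.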

Path Handling (os.path, pathlib)

| Function / Method | Description | Example |
|-------------------|-------------|---------|
| os.path.join(a, b) | Join paths safely | os.path.join("folder", "file.txt") |
| os.path.basename(path) | Get file name | os.path.basename("/x/y/z.txt") |
| os.path.dirname(path) | Get directory path | os.path.dirname("/x/y/z.txt") |
| pathlib.Path(path).exists() | Check path exists (modern alternative) | Path("file.txt").exists() |
| pathlib.Path(path).unlink() | Delete file (like os.remove) | Path("file.txt").unlink() |
| pathlib.Path(path).mkdir() | Create directory | Path("dir").mkdir(exist_ok=True) |
| pathlib.Path(path).rmdir() | Remove directory | Path("dir").rmdir() |
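
The pathlib methods in the table combine naturally into a create, inspect, and clean-up sequence. The sketch below walks through one such sequence in a temporary directory; the directory layout and file name are illustrative assumptions.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)

    # parents=True creates intermediate directories, like os.makedirs().
    log_dir = base / "logs" / "errors"
    log_dir.mkdir(parents=True, exist_ok=True)

    # write_text() opens, writes, and closes the file in one call.
    log_file = log_dir / "app.log"
    log_file.write_text("[ERROR] Failed to connect\n")

    names = sorted(p.name for p in log_dir.iterdir())
    existed = log_file.exists()

    # Remove the file first, then the now-empty directory.
    log_file.unlink()
    log_dir.rmdir()
    removed = not log_dir.exists()

print(names, existed, removed)  # ['app.log'] True True
```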

JSON Files (json module)

| Function | Description | Example |
|----------|-------------|---------|
| json.load(file) | Parses JSON from a file object | data = json.load(open("file.json")) |
| json.loads(string) | Parses JSON from a string | json.loads('{"a":1}') |
| json.dump(data, file) | Writes JSON to file | json.dump(data, open("file.json", "w")) |
| json.dumps(data) | Converts Python object to JSON string | json.dumps({"a":1}) |

Pickle Module (pickle) – Binary Object Serialization

| Function / Method | Description | Example |
|-------------------|-------------|---------|
| pickle.dump(obj, file) | Serialize and write an object to a binary file | pickle.dump(data, open("data.pkl", "wb")) |
| pickle.load(file) | Read and deserialize an object from a binary file | data = pickle.load(open("data.pkl", "rb")) |
| pickle.dumps(obj) | Serialize object to a bytes object | b = pickle.dumps(data) |
| pickle.loads(bytes_obj) | Deserialize object from bytes | data = pickle.loads(b) |
| pickle.HIGHEST_PROTOCOL | Constant for the most efficient (and recent) pickle format | pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL) |

Best for Python-only data persistence, such as storing trained ML models or temporary cache structures.

Pickle is not secure against untrusted sources.

YAML Module (PyYAML) – Human-Readable Serialization

To use YAML in Python, install PyYAML first:

bash
pip install pyyaml

| Function / Method | Description | Example |
|-------------------|-------------|---------|
| yaml.dump(data, file) | Write Python object to YAML file | yaml.dump(data, open("file.yaml", "w")) |
| yaml.dump(data) | Convert object to YAML string | yaml_string = yaml.dump(data) |
| yaml.safe_dump(data) | Safer version for basic Python objects | yaml.safe_dump(data, open("f.yaml", "w")) |
| yaml.load(file, Loader) | Read YAML file with an explicit Loader (unsafe loaders can execute arbitrary code) | data = yaml.load(f, Loader=yaml.FullLoader) |
| yaml.safe_load(file) | Safely read YAML content into Python object | data = yaml.safe_load(open("f.yaml")) |
| yaml.safe_load_all(file) | Load multiple YAML documents from a single file | docs = yaml.safe_load_all(open("f.yaml")) |

Ideal for configuration files (e.g., Docker, Kubernetes, CI/CD pipelines) and human-editable data.

Always prefer safe_load() over load() when parsing YAML.

Summary

You’ve now mastered how to:

  • Handle text, binary, and structured files (CSV, JSON, YAML, Pickle)
  • Use proper file modes and directory operations
  • Build and test a real-world file parser

What is Next?

In Chapter 5: Error Handling and Debugging, we’ll explore:

  • Python’s built-in exception handling
  • Logging strategies
  • Using pdb and IDE debuggers
  • Enhancing robustness in real-world code
