Rules for naming files and folders that are cross platform and helpful in datascience

How To
Files and Folders
Rules for naming files and folders that are cross platform and helpful in datascience.
Author

Rohit Farmer

Published

November 3, 2022

2022-11-03 Corrections made as pointed by https://fosstodon.org/@rudolf read this thread https://fosstodon.org/@swatantra/109281773145979327

Introduction

This post aims to lay down some rules that help name files and folders that are cross-platform, meaning they are acceptable across Linux, MacOS, and MS Windows, the three major operating systems. I have compiled these rules following international standards and recommendations from the US National Archives. Some rules mentioned here are exceptions to OS compatibility and are mentioned here from a useability point of view.

Having suitable files and folder naming can be helpful in data science, especially while working on the command line or programmatically querying large sets of files. For example, not having spaces in file names makes it easier to refer to them on the command line; having fixed delimiters in the file names can help iterate over files and fetch critical information programmatically.

Tip

These rules are equally applicable outside datascience settings - even for your documents, photos, videos, etc.

Rules

Use styles that are internationally recognized, easy to read by a human eye, and require fewer finger movements on the keyboard, for example, using lowercase letters that eliminate the shift key.

  1. Use all lowercase letters (this doesn’t apply to OS compatibility).
  2. Use hyphens (not underscores) to separate words. Avoid using spaces as they are difficult to work with on the command line.
  3. Use YYYY-MM-DD format for dates.
  4. While using the date at the beginning of a file name, keep the rest of the file name the same as the one created on a previous date (if there is more than one file for the same purpose, but it is sectioned according to the date.)
  5. Use “.” only for file extensions.
  6. Use v1, v2, …, vn to denote file versions. Ideally, one should use a version control system (VCS) such as Git.
  7. For a stack of fewer than 100 files, number them as 00, 01, 02, …, 0n for easier sorting. Accordingly, for a stack of fewer than 1000 files, number them as 000, 001, 002, …, 00n.
  8. File names should NOT contain punctuation, symbols, or special characters. ” /  [ ] : ; | = _ , < ? > & $ # ! ’ { } ( ).

Some Examples

2022-08-31-labnotebook-for-hdstim.docx

figure-01.png
figure-02.png
figure-03.png

/path/to/folder/exploring-flow

References

Back to top

Citation

BibTeX citation:
@online{farmer2022,
  author = {Farmer, Rohit},
  title = {Rules for Naming Files and Folders That Are Cross Platform
    and Helpful in Datascience},
  date = {2022-11-03},
  url = {https://dataalltheway.com/posts/009-rules-for-naming-files},
  langid = {en}
}
For attribution, please cite this work as:
Farmer, Rohit. 2022. “Rules for Naming Files and Folders That Are Cross Platform and Helpful in Datascience.” November 3, 2022. https://dataalltheway.com/posts/009-rules-for-naming-files.