date-normalizer

from dkyazzentwatwa/chatgpt-skills

My comprehensive, tested, and audited library of skills for ChatGPT.

7 stars, 0 forks, updated Dec 17, 2025
npx skills add https://github.com/dkyazzentwatwa/chatgpt-skills --skill date-normalizer

SKILL.md

Date Normalizer

Parse dates written in many different formats and normalize them to a single, consistent format for data cleaning and ETL pipelines.

Purpose

Date standardization for:

  • Data cleaning and ETL pipelines
  • Database imports with mixed date formats
  • Log file parsing and analysis
  • International data harmonization
  • Report generation with consistent dating

Features

  • Smart Parsing: Automatically detect and parse 100+ date formats
  • Format Conversion: Convert to ISO 8601, US, EU, or custom formats
  • Batch Processing: Normalize entire CSV columns
  • Ambiguity Detection: Flag dates that could be interpreted multiple ways
  • Timezone Handling: Convert and normalize timezones
  • Relative Dates: Parse "today", "yesterday", "next week"
  • Validation: Detect and report invalid dates

Quick Start

from date_normalizer import DateNormalizer

# Normalize single date
normalizer = DateNormalizer()
result = normalizer.normalize("03/14/2024")
print(result)  # {'normalized': '2024-03-14', 'format': 'iso8601'}

# Normalize to specific format
result = normalizer.normalize("March 14, 2024", output_format="us")
print(result)  # {'normalized': '03/14/2024', 'format': 'us'}

# Batch normalize CSV column
normalizer.normalize_csv(
    'data.csv',
    date_column='created_at',
    output='normalized.csv',
    output_format='iso8601'
)

CLI Usage

# Normalize single date
python date_normalizer.py --date "March 14, 2024"

# Convert to specific format
python date_normalizer.py --date "14/03/2024" --format us

# Normalize CSV column
python date_normalizer.py --csv data.csv --column date --format iso8601 --output normalized.csv

# Detect ambiguous dates
python date_normalizer.py --date "01/02/03" --detect-ambiguous

API Reference

DateNormalizer

from datetime import datetime
from typing import Dict, List, Optional

class DateNormalizer:
    # Signatures only; method bodies are omitted in this reference.
    def normalize(self, date_string: str, output_format: str = 'iso8601',
                  dayfirst: bool = False, yearfirst: bool = False) -> Dict: ...
    def normalize_batch(self, dates: List[str], **kwargs) -> List[Dict]: ...
    def normalize_csv(self, csv_path: str, date_column: str,
                      output: Optional[str] = None, **kwargs) -> str: ...
    def detect_format(self, date_string: str) -> str: ...
    def is_valid(self, date_string: str) -> bool: ...
    def is_ambiguous(self, date_string: str) -> bool: ...
    def parse_relative(self, relative_string: str) -> datetime: ...

Output Formats

ISO 8601 (default):

'2024-03-14'  # Date only
'2024-03-14T15:30:00'  # With time
'2024-03-14T15:30:00+00:00'  # With timezone

US Format:

'03/14/2024'  # MM/DD/YYYY

EU Format:

'14/03/2024'  # DD/MM/YYYY

Long Format:

'March 14, 2024'

Custom Format:

normalizer.normalize(date, output_format='%Y%m%d')  # '20240314'
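
The named formats above line up with standard strftime patterns. The following is a minimal sketch of that mapping using only the standard library; it is not the skill's actual implementation. `OUTPUT_PATTERNS` and `format_date` are hypothetical names, and the `"eu"` and `"long"` keys are assumed from the section titles rather than confirmed by the source.

```python
from datetime import datetime

# Hypothetical mapping from format names to strftime patterns.
# Only 'iso8601' and 'us' appear as keys in the examples; 'eu' and 'long' are assumed.
OUTPUT_PATTERNS = {
    "iso8601": "%Y-%m-%d",
    "us": "%m/%d/%Y",
    "eu": "%d/%m/%Y",
    "long": "%B %d, %Y",  # %d is zero-padded, so the 5th renders as "05"
}

def format_date(value: datetime, output_format: str = "iso8601") -> str:
    """Render with a named format, or treat an unknown name as a raw strftime pattern."""
    pattern = OUTPUT_PATTERNS.get(output_format, output_format)
    return value.strftime(pattern)

d = datetime(2024, 3, 14)
for name in ("iso8601", "us", "eu", "long", "%Y%m%d"):
    print(name, "->", format_date(d, name))
```

Falling back to the raw string when the name is not in the table is one simple way to support custom patterns such as '%Y%m%d' without a separate parameter.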

Supported Input Formats

Numeric:

  • 2024-03-14 (ISO)
  • 03/14/2024 (US)
  • 14/03/2024 (EU)
  • 14.03.2024 (German)
  • 2024/03/14 (Japanese)
  • 20240314 (Compact)

Textual:

  • March 14, 2024
  • 14 March 2024
  • Mar 14, 2024
  • 14-Mar-2024

Relative:

  • today, yesterday, tomorrow
  • next week, last month
  • 2 days ago, in 3 weeks
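
Relative expressions like these can be resolved against a reference date with plain `timedelta` arithmetic. The sketch below is an illustration only: `resolve_relative` is a hypothetical helper, not the skill's `parse_relative`, and it takes the reference date explicitly so the result is reproducible. Month-based phrases such as "last month" are left out because they need calendar-aware arithmetic.

```python
import re
from datetime import date, timedelta

# Days per supported unit; months are deliberately not handled in this sketch.
_UNIT_DAYS = {"day": 1, "week": 7}
_FIXED = {"today": 0, "yesterday": -1, "tomorrow": 1, "next week": 7, "last week": -7}

def resolve_relative(text: str, today: date) -> date:
    """Resolve a small subset of relative expressions against a reference date."""
    text = text.strip().lower()
    if text in _FIXED:
        return today + timedelta(days=_FIXED[text])
    # "2 days ago" counts backwards from the reference date.
    ago = re.fullmatch(r"(\d+) (day|week)s? ago", text)
    if ago:
        return today - timedelta(days=int(ago.group(1)) * _UNIT_DAYS[ago.group(2)])
    # "in 3 weeks" counts forwards from the reference date.
    ahead = re.fullmatch(r"in (\d+) (day|week)s?", text)
    if ahead:
        return today + timedelta(days=int(ahead.group(1)) * _UNIT_DAYS[ahead.group(2)])
    raise ValueError(f"unsupported relative date: {text!r}")

reference = date(2024, 3, 14)
print(resolve_relative("2 days ago", reference))
print(resolve_relative("in 3 weeks", reference))
```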

With Time:

  • 2024-03-14 15:30:00
  • 03/14/2024 3:30 PM
  • 2024-03-14T15:30:00Z
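
One common way to accept several of the absolute formats listed above is to try a list of candidate strptime patterns in order. This is a sketch of that general technique, not a description of how the skill detects formats; `CANDIDATE_PATTERNS` and `parse_any` are hypothetical names. Note that pattern order decides how ambiguous strings are read (US before EU here).

```python
from datetime import datetime

# Candidate patterns for some of the listed formats; the first match wins.
CANDIDATE_PATTERNS = [
    "%Y-%m-%d",              # 2024-03-14
    "%m/%d/%Y",              # 03/14/2024 (tried before the EU reading)
    "%d/%m/%Y",              # 14/03/2024
    "%d.%m.%Y",              # 14.03.2024
    "%Y/%m/%d",              # 2024/03/14
    "%Y%m%d",                # 20240314
    "%B %d, %Y",             # March 14, 2024
    "%d %B %Y",              # 14 March 2024
    "%b %d, %Y",             # Mar 14, 2024
    "%d-%b-%Y",              # 14-Mar-2024
    "%Y-%m-%d %H:%M:%S",     # 2024-03-14 15:30:00
    "%m/%d/%Y %I:%M %p",     # 03/14/2024 3:30 PM
    "%Y-%m-%dT%H:%M:%S%z",   # 2024-03-14T15:30:00Z
]

def parse_any(text: str) -> datetime:
    """Return the first successful strptime parse, or raise ValueError."""
    cleaned = text.strip()
    for pattern in CANDIDATE_PATTERNS:
        try:
            return datetime.strptime(cleaned, pattern)
        except ValueError:
            continue
    raise ValueError(f"no candidate pattern matches {text!r}")

print(parse_any("14-Mar-2024").date().isoformat())
print(parse_any("03/14/2024 3:30 PM").isoformat())
```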

Ambiguity Handling

Dates like 01/02/03 are ambiguous: the same string can be read day-first, month-first, or year-first. Specify the interpretation explicitly:

# Day first (EU)
normalizer.normalize("01/02/03", dayfirst=True)
# Result: 2003-02-01

# Month first (US)
normalizer.normalize("01/02/03", dayfirst=False)
# Result: 2003-01-02

# Year first
normalizer.normalize("01/02/03", yearfirst=True)
# Result: 2001-02-03
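
A simple way to flag such inputs is to try every plausible field order and check whether more than one distinct date comes out. The sketch below illustrates the idea with the standard library; `readings` and `looks_ambiguous` are hypothetical helpers, and the skill's own `is_ambiguous` may use different rules.

```python
from datetime import date, datetime
from typing import Set

# Field orders to try for slash-separated numeric dates.
_ORDERINGS = ("%m/%d/%Y", "%d/%m/%Y", "%m/%d/%y", "%d/%m/%y", "%y/%m/%d")

def readings(text: str) -> Set[date]:
    """Collect every distinct calendar date the string can denote."""
    found = set()
    for pattern in _ORDERINGS:
        try:
            found.add(datetime.strptime(text.strip(), pattern).date())
        except ValueError:
            pass  # this field order does not produce a valid date
    return found

def looks_ambiguous(text: str) -> bool:
    """True when at least two field orders give different valid dates."""
    return len(readings(text)) > 1

print(sorted(readings("01/02/03")))
print(looks_ambiguous("03/14/2024"))
```

For "01/02/03" this yields the same three dates as the dayfirst, month-first, and yearfirst examples above, while "03/14/2024" has only one valid reading because 14 cannot be a month.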

Use Cases

Clean Messy Data:

messy_dates = [
    "March 14, 2024",
    "2024-03-15",
    "03/16/2024",
    "17-Mar-2024"
]

normalized = normalizer.normalize_batch(messy_dates)
# normalize_batch returns one dict per input; the 'normalized' values are:
# ['2024-03-14', '2024-03-15', '2024-03-16', '2024-03-17']

CSV Normalization:

# Input CSV with mixed date formats
# Convert all to ISO 8601
normalizer.normalize_csv(
    'orders.csv',
    date_column='order_date',
    output='orders_normalized.csv',
    output_format='iso8601'
)
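
For readers who want to see what a column rewrite involves, here is a minimal standard-library sketch. It is not the skill's `normalize_csv`: `normalize_column` is a hypothetical function that works on open text streams, supports only the three patterns shown, and leaves unparseable values untouched while counting them.

```python
import csv
import io
from datetime import datetime

def normalize_column(src, dst, column, patterns=("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")):
    """Copy CSV rows from src to dst, rewriting one column as ISO 8601.

    Returns the number of rows whose value matched none of the patterns.
    """
    failures = 0
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for pattern in patterns:
            try:
                parsed = datetime.strptime(row[column].strip(), pattern)
                row[column] = parsed.date().isoformat()
                break
            except ValueError:
                continue
        else:
            failures += 1  # no pattern matched; keep the original value
        writer.writerow(row)
    return failures

source = io.StringIO("id,order_date\n1,03/14/2024\n2,15-Mar-2024\n3,not a date\n")
target = io.StringIO()
failed = normalize_column(source, target, "order_date")
print(target.getvalue())
print("unparseable rows:", failed)
```

Keeping the original value on failure, rather than writing an empty cell, makes it easier to review the rows that need manual attention.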

Validation:

if not normalizer.is_valid("invalid date"):
    print("Invalid date detected")

Timezone Conversion:

normalizer.normalize(
    "2024-03-14 15:30:00+00:00",
    output_timezone='America/New_York'
)
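
The conversion itself can be illustrated with the standard library's `zoneinfo` module (Python 3.9+, which relies on a time zone database being available on the system). This is a sketch under those assumptions; `convert_timezone` is a hypothetical helper and does not show how the skill handles `output_timezone` internally.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def convert_timezone(text: str, output_timezone: str) -> str:
    """Parse an ISO 8601 timestamp with an offset and re-express it in another zone."""
    moment = datetime.fromisoformat(text)
    if moment.tzinfo is None:
        # Without an offset the source zone is unknown, so conversion would be a guess.
        raise ValueError("input has no UTC offset")
    return moment.astimezone(ZoneInfo(output_timezone)).isoformat()

# 15:30 UTC on 14 March 2024 falls in US daylight saving time (UTC-4).
print(convert_timezone("2024-03-14 15:30:00+00:00", "America/New_York"))
```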

Limitations

  • Cannot parse dates from images or PDFs (use OCR first)
  • Ambiguous dates must be disambiguated manually (for example with dayfirst or yearfirst)
  • Dates before 1900 may have limited support
  • Non-Gregorian calendars not supported
  • Some regional formats may need explicit configuration
