Pattern Matching in Python

Search for specific patterns in strings

Pattern matching in Python usually means searching strings for text that fits a given pattern, which makes it easier to extract or validate information. Python offers several ways to do this; the most widely used is regular expressions (regex) through the standard-library re module.

Regular Expressions (regex)

What Are Regular Expressions?

Regular expressions are sequences of characters that form search patterns. Python's re module allows you to work with regular expressions to match, search, or manipulate strings based on specific patterns.

Basic Functions of the re Module:

  • re.match(): Checks for a match only at the beginning of the string.

  • re.search(): Searches the entire string for the first match.

  • re.findall(): Returns all non-overlapping matches in the string as a list.

  • re.finditer(): Returns an iterator yielding match objects for all matches in the string.

  • re.sub(): Replaces occurrences of a pattern with a replacement string.
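
The differences between these functions are easiest to see when they are run against the same input. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

text = "cat bat cat"

# re.match() only looks at the start of the string.
at_start = re.match(r"bat", text)      # None: the string starts with "cat"

# re.search() scans forward and stops at the first match.
first = re.search(r"bat", text)        # match object spanning (4, 7)

# re.findall() collects every non-overlapping match as a list of strings.
all_cats = re.findall(r"cat", text)    # ['cat', 'cat']

# re.finditer() yields match objects, so positions are available too.
positions = [m.start() for m in re.finditer(r"cat", text)]   # [0, 8]

# re.sub() returns a new string with every match replaced.
replaced = re.sub(r"cat", "dog", text)  # 'dog bat dog'
```

Note that re.match() and re.search() return None when nothing matches, so check the result before calling .group() on it.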

Pattern Matching Example Using the re Module:

import re

# Example string
text = "The quick brown fox jumps over the lazy dog."

# Search for the word "fox"
result = re.search(r"fox", text)
if result:
    print("Pattern found!")
else:
    print("Pattern not found!")

Commonly Used Regular Expressions

  1. Literal Characters: Matches exact characters.
result = re.search(r"dog", "The quick brown fox jumps over the lazy dog")
print(result.group())  # Output: dog
  2. Special Characters:
  • ^: Matches the beginning of a string.
  • $: Matches the end of a string.
  • .: Matches any single character except a newline (unless the re.DOTALL flag is set).
  • []: Matches any single character listed inside the brackets.
  • |: Acts as an OR operator.
  • (): Groups patterns.
text = "abc123def"
# Match 'abc' at the start, followed later by 'def' (re.match does not require the string to end there)
result = re.match(r"abc.*def", text)
if result:
    print("Pattern matches!")
else:
    print("Pattern doesn't match!")
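
The bracket, OR, and grouping characters from the list above are not covered by that example. The sketch below shows one possible use of each; the sample strings are invented for illustration.

```python
import re

# [] matches one character from the set: 'b', 'c', or 'r', followed by "at".
set_matches = re.findall(r"[bcr]at", "bat cat hat rat")   # ['bat', 'cat', 'rat']

# | accepts either the alternative on its left or the one on its right.
pet = re.search(r"cat|dog", "I have a dog")               # matches 'dog'

# () groups parts of the pattern; each group is captured separately.
m = re.match(r"([a-z]+)-([0-9]+)", "item-42")
name, number = m.group(1), m.group(2)                      # 'item', '42'
```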
  3. Quantifiers:
  • *: Matches 0 or more occurrences.
  • +: Matches 1 or more occurrences.
  • ?: Matches 0 or 1 occurrence.
  • {n}: Matches exactly n occurrences.
  • {n,}: Matches n or more occurrences.
  • {n,m}: Matches between n and m occurrences.
text = "aaaabbbbcccc"
# Matching 4 consecutive 'a'
result = re.match(r"a{4}", text)
print(result.group())  # Output: aaaa
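
The remaining quantifiers behave as follows. This is a small sketch with made-up input strings, meant only to show how each quantifier changes what is matched.

```python
import re

# ? makes the preceding character optional, so both spellings match.
spellings = re.findall(r"colou?r", "color colour")    # ['color', 'colour']

# + needs at least one 'b' after the 'a'; * also accepts none.
plus = re.findall(r"ab+", "a ab abbb")                # ['ab', 'abbb']
star = re.findall(r"ab*", "a ab abbb")                # ['a', 'ab', 'abbb']

# {n,m} bounds the repetition count: here 2 to 3 digits per match.
bounded = re.findall(r"[0-9]{2,3}", "7 42 12345")     # ['42', '123', '45']
```

Quantifiers are greedy by default, which is why "12345" is split into '123' and '45' rather than '12', '34'.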
  4. Character Classes:
  • \d: Matches any digit (0-9).
  • \D: Matches any non-digit.
  • \w: Matches any word character (alphanumeric + underscore).
  • \W: Matches any non-word character.
  • \s: Matches any whitespace character (space, tab, newline).
  • \S: Matches any non-whitespace character.
text = "123abc"
# Matching digits
result = re.findall(r"\d", text)
print(result)  # Output: ['1', '2', '3']
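
Character classes are usually combined with quantifiers to match whole tokens rather than single characters. A brief sketch, using invented sample strings:

```python
import re

text = "Order 66: 3 apples, 12 pears"

# \d+ matches runs of digits instead of one digit at a time.
numbers = re.findall(r"\d+", text)     # ['66', '3', '12']

# \w+ matches runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", text)       # ['Order', '66', '3', 'apples', '12', 'pears']

# \S+ matches runs of non-whitespace, whatever kind of whitespace separates them.
fields = re.findall(r"\S+", "a  b\tc\nd")   # ['a', 'b', 'c', 'd']

# \D matches every non-digit, so removing those leaves only the digits.
digits_only = re.sub(r"\D", "", "tel: 555-0100")   # '5550100'
```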
  5. Anchors:
  • ^: Matches the start of the string.
  • $: Matches the end of the string.
text = "Hello, World!"
# Matches only if 'Hello' is at the beginning of the string
result = re.match(r"^Hello", text)
print(result.group())  # Output: Hello
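
The $ anchor works the same way at the other end of the string. The sketch below, with illustrative file names and inputs, also shows ^ and $ combined to check an entire string.

```python
import re

# $ anchors the pattern to the end of the string.
ends_with_txt = re.search(r"txt$", "report.txt") is not None       # True
txt_in_middle = re.search(r"txt$", "report.txt.bak") is not None   # False

# ^ and $ together force the whole string to fit the pattern.
all_digits = re.search(r"^[0-9]+$", "12345") is not None           # True
some_digits = re.search(r"^[0-9]+$", "123abc") is not None         # False

# Caveat: $ also matches just before a trailing newline.
trailing_newline = re.search(r"^[0-9]+$", "12345\n") is not None   # True
```

When a trailing newline must be rejected, re.fullmatch() is one alternative, since it succeeds only if the pattern covers the entire string.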

Examples of Pattern Matching

  1. Finding All Occurrences:
text = "The rain in Spain stays mainly in the plain"
# Find all words starting with 'S' (case-insensitive)
matches = re.findall(r"\bS\w+", text, flags=re.IGNORECASE)
print(matches)  # Output: ['Spain', 'stays']
  2. Substituting Patterns:
text = "123 abc 456 def"
# Replace all digits with a dash '-'
new_text = re.sub(r"\d", "-", text)
print(new_text)  # Output: "--- abc --- def"
  3. Extracting an Email Address:
text = "Please contact us at info@example.com for more details."
# Match a simple email address pattern (good for extraction, not a full validator)
email = re.search(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", text)
if email:
    print(email.group())  # Output: info@example.com

Advanced Regular Expressions

  1. Lookahead and Lookbehind Assertions:
  • Positive Lookahead ((?=...)): Ensures that a given pattern is followed by another pattern.
  • Negative Lookahead ((?!...)): Ensures that a given pattern is not followed by another pattern.
  • Positive Lookbehind ((?<=...)): Ensures that a given pattern is preceded by another pattern.
  • Negative Lookbehind ((?<!...)): Ensures that a given pattern is not preceded by another pattern.
# Example of positive lookahead
text = "apple pie"
result = re.search(r"apple(?=\s)", text)  # "apple" followed by a space
print(result.group())  # Output: apple

# Example of negative lookahead
text = "apple pie"
result = re.search(r"apple(?!\s)", text)  # "apple" not followed by a space
print(result)  # Output: None
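
The lookbehind assertions listed above can be sketched in the same way. The price string here is invented for illustration; the point is that the lookbehind checks what comes before the match without including it in the result.

```python
import re

text = "Price: $100, discount: 20, tax: $8"

# Positive lookbehind: digits that come right after a dollar sign.
# The "$" itself is not part of the match.
dollars = re.findall(r"(?<=\$)\d+", text)      # ['100', '8']

# Negative lookbehind: digits NOT preceded by a dollar sign or another digit.
# Excluding digits stops the match from starting in the middle of "100".
plain = re.findall(r"(?<![\$\d])\d+", text)    # ['20']
```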

Pattern Matching with the match Statement (Python 3.10+)

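Python 3.10 introduced the match statement for structural pattern matching. It is unrelated to regular expressions: instead of searching text, it compares a value against patterns that describe the shape of data structures such as dictionaries, tuples, or lists, and binds the matching parts to names.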

def greet(person):
    match person:
        case {"name": name, "age": age}:
            print(f"Hello, {name}! You are {age} years old.")
        case _:
            print("I don't know you.")

person = {"name": "Jasmeet", "age": 30}
greet(person)  # Output: Hello, Jasmeet! You are 30 years old.

Key Functions in re Module

  • re.match(pattern, string): Matches a pattern at the start of the string.
  • re.search(pattern, string): Searches for the first match of the pattern.
  • re.findall(pattern, string): Finds all non-overlapping matches of the pattern.
  • re.sub(pattern, repl, string): Replaces occurrences of a pattern with a replacement string.
  • re.split(pattern, string): Splits the string by occurrences of the pattern.
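
Of the functions in this list, re.split() is the only one not demonstrated earlier. A short sketch with made-up input:

```python
import re

# Split on a comma or semicolon, plus any whitespace that follows it.
colors = re.split(r"[,;]\s*", "red, green;blue,  yellow")
# ['red', 'green', 'blue', 'yellow']

# maxsplit limits the number of splits; the remainder stays in one piece.
head_and_rest = re.split(r"\s+", "a b c d", maxsplit=1)
# ['a', 'b c d']
```

Unlike str.split(), the separator here is a pattern, so one call can handle several different delimiters.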