April 2025

Introduction to Regular Expressions

Regular expressions (regex) are patterns used to match character combinations in strings. These patterns can be simple characters or a combination of simple and special characters.

Examples:

  • /abc/ - A simple pattern matching the literal string "abc"
  • /ab*c/ - A pattern using the special character * to match "ac", "abc", "abbc", and so on

The Regular Expression Syntax

Regular expressions follow a specific syntax:

  • In standard notation: /pattern/
  • In Python: r"pattern"

When a pattern is applied to text, it returns the range where the pattern exists in the string.
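As a minimal sketch of what "range" means here, Python's standard re module reports a match as a pair of start and end positions. The sample text below is made up for illustration.

```python
import re

text = "xxabcxx"                   # illustrative sample text
match = re.search(r"abc", text)    # first occurrence, or None if absent
start, end = match.span()          # end position is exclusive
print(start, end)                  # 2 5
print(text[start:end])             # abc
```

Slicing the text with the reported range returns exactly the matched substring.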

Literals or Simple Characters

Simple literals match the exact characters in the pattern.

Example:

  • /cat/ matches the string "cat" within text

Meta Characters

Meta characters give regular expressions their power, allowing for more sophisticated matching.

Special Characters

Character  Description                    Example    Match
|          Either or                      cat|hello  Matches "cat" or "hello"
.          Any character (except newline) he..lo     Matches "he", then any two characters, then "lo"
{ }        Exact number of occurrences    he.{2}o    Matches "hello", where there are exactly 2 characters between "he" and "o"
*          Zero or more occurrences       he.*o      Matches "heo", "hello", "hero", "helico", etc.
+          One or more occurrences        he.+o      Same as above but requires at least one character between "he" and "o"
?          Zero or one occurrence         he.?o      Matches "heo" or "hero"
( )        Capture and group patterns     (ab)+      Matches "ab", "abab", etc.
[ ]        A set of characters            [a-z]      Matches any lowercase letter
^          Starts with                    ^hello     Matches strings starting with "hello"
$          Ends with                      world$     Matches strings ending with "world"
\          Signals a special sequence     \d         Matches any digit

Character Classes

Character classes allow you to define a set of characters where any single character from that set will match.

Expression   Description
[cs]         Matches either "c" or "s" (e.g., /licen[cs]e/ matches "licence" or "license")
[0-9]        Matches any digit from 0 to 9
[a-z]        Matches any lowercase letter
[A-Z]        Matches any uppercase letter
[0-9a-zA-Z]  Matches any alphanumeric character
[^0-9]       Matches any character that is NOT a digit
[^a-z]       Matches any character that is NOT a lowercase letter
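A short sketch of these character classes using Python's re module; the word list and sample strings are made up for illustration.

```python
import re

words = ["licence", "license", "licenze"]   # illustrative inputs
accepted = [w for w in words if re.fullmatch(r"licen[cs]e", w)]
print(accepted)                             # ['licence', 'license']

non_digits = re.findall(r"[^0-9]", "a1b2")        # negated set
alnum = re.findall(r"[0-9a-zA-Z]", "x-9!")        # combined ranges
print(non_digits, alnum)                          # ['a', 'b'] ['x', '9']
```

Each class consumes exactly one character, which is why "licenze" is rejected rather than partially matched by fullmatch.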

Quantifiers

Quantifiers define how many times a character, metacharacter, or character set can be repeated.

Symbol  Name           Meaning
?       Question mark  Zero or one time
*       Asterisk       Zero or more times
+       Plus sign      One or more times
{n,m}   Curly braces   Between n and m times (inclusive)

Examples:

  • /hello*/ matches "hell", "hello", "helloo", "hellooo", etc.
  • /hello+/ matches "hello", "helloo", "hellooo", etc. (but not "hell")
  • /hello?/ matches only "hell" or "hello"
  • /hello{2,5}/ matches "helloo", "hellooo", "helloooo", "hellooooo"
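The examples above can be checked with a small sketch. It uses re.fullmatch so that a candidate counts only if the whole string fits the pattern; the candidate list is chosen for illustration.

```python
import re

candidates = ["hell", "hello", "helloo", "hellooooo", "helloooooo"]
patterns = [r"hello*", r"hello+", r"hello?", r"hello{2,5}"]

# The quantifier binds only to the final "o", not to the whole word.
results = {p: [s for s in candidates if re.fullmatch(p, s)] for p in patterns}
for pattern, matched in results.items():
    print(pattern, matched)
```

The last candidate has six "o" characters, so it fits * and + but falls outside {2,5}.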

Special Sequence Characters

Pre-defined Characters

Character  Description                                    Equivalent
\w         Word characters (letters, digits, underscore)  [a-zA-Z0-9_]
\W         NOT word characters                            [^a-zA-Z0-9_]
\d         Digits                                         [0-9]
\D         NOT digits                                     [^0-9]
\s         Whitespace characters                          [ \t\n\r\f\v]
\S         NOT whitespace characters                      [^ \t\n\r\f\v]
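A brief sketch of the predefined classes in Python. One caveat: for str patterns in Python 3, \w, \d, and \s are Unicode-aware by default, so the ASCII equivalents in the table hold exactly only with the re.ASCII flag. The sample string is made up.

```python
import re

sample = "Room 42, Floor_3\tEast"   # illustrative text

digits = re.findall(r"\d", sample)
words = re.findall(r"\w+", sample)
fields = re.split(r"\s+", sample)
non_word = re.findall(r"\W", sample)

print(digits)     # ['4', '2', '3']
print(words)      # ['Room', '42', 'Floor_3', 'East']
print(fields)     # ['Room', '42,', 'Floor_3', 'East']
print(non_word)   # [' ', ',', ' ', '\t']
```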

Boundary Characters

Boundary matchers identify specific positions within the input text.

Matcher  Description
^        Matches at the beginning of the input (and of each line in multiline mode)
$        Matches at the end of the input (and of each line in multiline mode)
\b       Matches at a word boundary (the beginning or end of a word)
\B       Matches at any position that is NOT a word boundary
\A       Matches only at the beginning of the input
\Z       Matches only at the end of the input
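The following sketch shows how these matchers behave in Python's re module, where ^ and $ apply to every line only when the re.MULTILINE flag is set. The two-line sample text is made up.

```python
import re

text = "hello world\nworld of hello"   # two lines, illustrative

starts_default = re.findall(r"^\w+", text)                   # start of input only
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)   # start of each line
ends_multiline = re.findall(r"\w+$", text, re.MULTILINE)     # end of each line
start_of_input = re.findall(r"\A\w+", text, re.MULTILINE)    # \A ignores the flag

# \b needs a boundary next to "is"; \B requires that there is none.
whole_word = [m.start() for m in re.finditer(r"\bis\b", "This is it")]
inside_word = [m.start() for m in re.finditer(r"\Bis", "This is it")]

print(starts_default, starts_multiline, ends_multiline, start_of_input)
print(whole_word, inside_word)   # [5] [2]
```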

Practice Exercises

Exercise 1: Basic Pattern Matching

Consider this example string:

The quick brown fox jumps over the lazy dog. This is outside (this is inside)

Question                                                    Answer
Match the string "fox" and provide its range                16-19
How many times does "is" appear in the string?              4
Match the pattern "(this is inside)" and provide its range  61-77
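One way to check these answers is with Python's re module, sketched below. Note that the parentheses must be escaped as \( and \) to be matched literally, and that ranges are zero-based with an exclusive end.

```python
import re

text = "The quick brown fox jumps over the lazy dog. This is outside (this is inside)"

fox_span = re.search(r"fox", text).span()
is_count = len(re.findall(r"is", text))
paren_span = re.search(r"\(this is inside\)", text).span()

print(fox_span, is_count, paren_span)   # (16, 19) 4 (61, 77)
```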

Exercise 2: Using the OR Operator (pipe)

For the string:

The sun rises in the east and sets in the west. Birds sing in the morning or evening.

Question                                 Pattern
Match either "sun" or "moon"             sun|moon
Match either "east" or "west"            east|west
Match either "morning" or "evening"      morning|evening
Match either "rises", "sets", or "sing"  rises|sets|sing
Match either "The" or "Birds"            The|Birds

Exercise 3: Character Sets and the Dot (.)

For this data:

Contact Information:
John Doe - john.doe@example.com - (555) 123-4567
Mary Smith - mary_smith@email.net - 555.987.6543
Tom Johnson - tom-johnson@company.org - (555)246-8910
Sarah Brown - sarah@brown.co.uk - +1-555-369-7412
Mike Wilson - mike.wilson@subdomain.example.edu - 555 741 0258

Question                                            Pattern
Match any single vowel                              [aeiou]
Match either "John" or "Tom"                        John|Tom
Match any character that is NOT a digit             [^0-9]
Match either "com" or "net" in email domains        com|net
Match any single digit in phone numbers             [0-9]
Match any character between 'T' and 'm' in "Tom"    T.m
Match any single uppercase letter                   [A-Z]
Match any character that is not a letter or number  [^0-9a-zA-Z]

Exercise 4: Quantifiers

Use the text below, "The Curious Case of the Missing Code", to answer the following questions.

The Curious Case of the Missing Code

John_Smith123 was panicking. It was 9:30 AM on April 15, 2025, and he had just realized that the crucial code files for Project-X2021 were missing from his laptop. Yesterday at 17:45, everything had been fine when he left the office at 42 Maple Street, Suite #301.

He quickly sent an email to his boss (anna.director@techcorp.com) and his team members (dev.team@techcorp.com):

Subject: URGENT - Missing Project Files
Body: Team, I can't locate the following files:
- main_v3.2.py
- config_prod.json
- api_keys.txt (IP: 192.168.1.100)

I've checked my backups from 2023-12-01 through 2025-03-15 but found nothing. Has anyone committed changes to the repository at http://git.techcorp.com/projects/x2021? My phone number is (555) 123-4567 if you need to reach me urgently. The project deadline is in 72 hours!

Lisa responded first at 9:42 AM: "I saved a copy at C:\Projects\Backup\X2021-backup.zip. The password is XB21-9$f5. You can also check with Mark who was working late yesterday."

John sighed with relief. Crisis averted! Now he needed to update the project documentation with proper file paths like /usr/local/bin/project-x/ for Linux users and C:\Program Files\Project-X\ for Windows users.

He made a note to call Lisa later at +1-555-987-6543 to thank her properly.
Create regular expressions that match exactly what's requested (nothing more, nothing less).

Basic Character Sets

  1. Create a pattern that matches all instances of dates in the format YYYY-MM-DD.
  2. Write a regex that finds all alphanumeric identifiers that contain both letters and numbers (like "John_Smith123" or "Project-X2021").
  3. Match all times in the HH:MM AM/PM format.

Predefined Character Classes

  1. Create a pattern using \d and \w to extract all phone numbers in the format (555) 123-4567 or +1-555-987-6543.
  2. Write a regex using \s and \S to find all file paths (both Windows and Linux style).
  3. Develop a pattern using \w, \d, and \s to match all file names with version numbers (like "main_v3.2.py").

Metacharacters and Alternation

  1. Use the pipe operator (|) to match either email addresses or web URLs.
  2. Create a pattern with the dot (.) metacharacter to find all text within parentheses.
  3. Write a regex that matches IP addresses like 192.168.1.100.

Combined Challenge

  1. Create a comprehensive pattern that extracts all forms of contact information (emails and phone numbers) from the text.

Solution

  1. Pattern to match dates in YYYY-MM-DD format:

    \d{4}-\d{2}-\d{2}
    
    Matches: "2023-12-01", "2025-03-15"

  2. Pattern for alphanumeric identifiers with both letters and numbers:

    [A-Za-z][A-Za-z0-9_]*\d+[A-Za-z0-9_]*
    
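    Matches: "John_Smith123" in full. The hyphen is not in the character class, so only the "X2021" part of "Project-X2021" and fragments of "XB21-9$f5" (such as "XB21") match.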

  3. Pattern for times in HH:MM AM/PM format:

    \d{1,2}:\d{2}\sAM|\d{1,2}:\d{2}\sPM
    
    Matches: "9:30 AM", "9:42 AM"

  4. Pattern for phone numbers using \d and \w:

    \(\d{3}\)\s\d{3}-\d{4}|\+\d-\d{3}-\d{3}-\d{4}
    
    Matches: "(555) 123-4567", "+1-555-987-6543"

  5. Pattern for file paths using \s and \S:

    [A-Z]:\\[^\s]+|/\S+/
    
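    Matches: "C:\Projects\Backup\X2021-backup.zip." (including the sentence's trailing period) and "/usr/local/bin/project-x/". The first alternative stops at whitespace, so only "C:\Program" is matched from "C:\Program Files\Project-X\", and the second alternative also picks up "//git.techcorp.com/projects/" inside the URL.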

  6. Pattern for filenames with version numbers using \w, \d, and \s:

    \w+_v\d+\.\d+\.\w+
    
    Matches: "main_v3.2.py"

  7. Pattern for email addresses or web URLs using pipe operator:

    [a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.[a-z]+|http://[^\s]+
    
    Matches: "anna.director@techcorp.com", "dev.team@techcorp.com", "http://git.techcorp.com/projects/x2021?" (the trailing "?" is included because [^\s]+ runs up to the next whitespace)

  8. Pattern with dot metacharacter to find text in parentheses:

    \(.*?\)
    
    Matches: "(anna.director@techcorp.com)", "(dev.team@techcorp.com)", "(IP: 192.168.1.100)", "(555)"

  9. Pattern for IP addresses:

    \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
    
    Matches: "192.168.1.100"

  10. Comprehensive pattern for contact information:

    [a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.[a-z]+|\(\d{3}\)\s\d{3}-\d{4}|\+\d-\d{3}-\d{3}-\d{4}
    
    Matches all email addresses and phone numbers in the text
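As a rough check, the date and contact patterns can be run with re.findall. The excerpt below is a condensed, hypothetical stand-in for the story text, kept short for illustration; running the patterns against the full text works the same way.

```python
import re

# Condensed, hypothetical excerpt (not the full story text).
excerpt = (
    "I've checked my backups from 2023-12-01 through 2025-03-15. "
    "Email anna.director@techcorp.com or call (555) 123-4567, "
    "or reach Lisa at +1-555-987-6543."
)

date_re = r"\d{4}-\d{2}-\d{2}"
contact_re = (
    r"[a-zA-Z0-9_.]+@[a-zA-Z0-9_.]+\.[a-z]+"   # email address
    r"|\(\d{3}\)\s\d{3}-\d{4}"                  # (555) 123-4567
    r"|\+\d-\d{3}-\d{3}-\d{4}"                  # +1-555-987-6543
)

dates = re.findall(date_re, excerpt)
contacts = re.findall(contact_re, excerpt)
print(dates)
print(contacts)
```

Because the patterns contain no capturing groups, findall returns the full text of each match.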

Exercise 6 - Boundary Matchers

These questions are based on the following example string:

Hello world! This is line one.
World, hello! This is line two.
HelloWorld is a single word.
The word "hello" appears in quotes.
This line ends with hello
hello starts this line and world ends it with world
com.example.domain is a domain name
user@example.com is an email address.
2023-05-15 is a date format.
The final line ends the entire text.

Questions on ^ (Caret) Boundary

  1. Write a regex pattern that matches any line beginning with the word "Hello".
  2. Write a regex pattern that matches any line beginning with either "Hello" or "hello".
  3. How many lines in the example string start with a capital letter?

Questions on $ (Dollar) Boundary

  1. Write a regex pattern that matches any line ending with the word "hello".
  2. How many lines in the example string end with a period (dot)?
  3. Write a regex pattern that matches any line ending with the exact word "world".

Questions on \b (Word Boundary)

  1. Write a regex pattern that matches the standalone word "hello" (case-insensitive) in the example text.
  2. How many times does the standalone word "world" (lowercase only) appear in the example text?
  3. Write a regex pattern that matches the word "is" only when it appears as a complete word.

Questions on \B (Non-word Boundary)

  1. Write a regex pattern that matches "World" only when it's part of another word without word boundaries.
  2. In the example text, what word contains "World" without word boundaries on either side?
  3. Write a regex that matches "example" when it's part of a larger word or token.

Questions on \A (Start of String)

  1. What single word would a regex pattern \AHello match in our example text?
  2. Write a regex that matches the first 5 characters of the entire example text.
  3. How does the pattern \AThe perform on our example text?

Questions on \Z (End of String)

  1. Write a regex pattern that matches the last sentence of the entire example text.
  2. What's the last word in the entire example text that would be matched by \w+.\Z?
  3. Write a regex that matches the last 10 characters of the entire example text.

Solution

The ^ and $ patterns below assume multiline mode, so that the anchors apply to every line, and the \Z patterns assume the text has no trailing newline.

  1. Write a regex pattern that matches any line beginning with the word "Hello".

    ^Hello\b.+

    The \b keeps the line starting with "HelloWorld" from matching.

  2. Write a regex pattern that matches any line beginning with either "Hello" or "hello".

    ^[Hh]ello\b.+

  3. How many lines in the example string start with a capital letter?

    ^[A-Z].+

    6 lines match.

  4. Write a regex pattern that matches any line ending with the word "hello".

    .+hello$

  5. How many lines in the example string end with a period (dot)?

    .+\.$

    7 lines match.

  6. Write a regex pattern that matches any line ending with the exact word "world".

    .+\bworld\b$

  7. Write a regex pattern that matches the standalone word "hello" (case-insensitive) in the example text.

    \b[Hh]ello\b

  8. How many times does the standalone word "world" (lowercase only) appear in the example text?

    \bworld\b

    3 matches.

  9. Write a regex pattern that matches the word "is" only when it appears as a complete word.

    \bis\b

  10. Write a regex pattern that matches "World" only when it's part of another word without word boundaries.

    \BWorld

    A trailing \B would prevent a match here, because "World" sits at the end of "HelloWorld" and is followed by a word boundary.

  11. In the example text, what word contains "World" without word boundaries on either side?

    \w+World

    This matches "HelloWorld". Strictly, only the left side of "World" lacks a boundary; its right side is the end of the word.

  12. Write a regex that matches "example" when it's part of a larger word or token.

    \S+example\S+

    This matches "com.example.domain" and "user@example.com". The pattern \Bexample\B finds nothing in this text, because "." and "@" are not word characters, so "example" has a word boundary on both sides.

  13. What single word would a regex pattern \AHello match in our example text?

    It matches "Hello", the first word of the entire text.

  14. Write a regex that matches the first 5 characters of the entire example text.

    \A.{5}

  15. How does the pattern \AThe perform on our example text?

    It finds no match, because the text begins with "Hello", not "The".

  16. Write a regex pattern that matches the last sentence of the entire example text.

    .+\Z

  17. What's the last word in the entire example text that would be matched by \w+.\Z?

    text.

  18. Write a regex that matches the last 10 characters of the entire example text.

    .{10}\Z

Warning

The dot (.) is a very powerful metacharacter that can create problems if not used properly, as it matches almost any character.
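Two common pitfalls are sketched below: an unescaped dot that accepts characters it should not, and a greedy .* that swallows more than intended. The sample strings are made up for illustration.

```python
import re

versions = "v3.2 v3x2 v3-2"
loose = re.findall(r"v\d.\d", versions)     # dot matches ".", "x" and "-"
strict = re.findall(r"v\d\.\d", versions)   # escaped dot matches only "."
print(loose)    # ['v3.2', 'v3x2', 'v3-2']
print(strict)   # ['v3.2']

markup = "<b>one</b> and <b>two</b>"
greedy = re.findall(r"<b>.*</b>", markup)   # runs to the last </b>
lazy = re.findall(r"<b>.*?</b>", markup)    # stops at the first </b>
print(greedy)   # ['<b>one</b> and <b>two</b>']
print(lazy)     # ['<b>one</b>', '<b>two</b>']
```

Escaping the dot or switching to a lazy quantifier is usually enough to keep the match as narrow as intended.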


Source: Data Science Anywhere

  • YouTube: https://www.youtube.com/@datascienceanywhere/
  • Udemy: https://www.udemy.com/user/freeai-space/
  • GitHub: https://github.com/marslearnings

Python print()

Python's print() function is one of the first commands most beginners learn, yet it offers surprising depth and versatility. This guide explores the various capabilities of the print() function that can enhance your code's output formatting and readability.

Basic Print Usage

The simplest use of the print() function is to display text or values:

print("Hello, world")
# Output: Hello, world

print(2)
# Output: 2

print(2+3)
# Output: 5

You can combine text and expressions using commas:

print("2+3 =", 2+3)
# Output: 2+3 = 5

Customizing Output with Parameters

The sep Parameter

By default, print() separates multiple arguments with spaces. You can customize this separator using the sep parameter:

# Default separator (space)
print("a", "b", "c", "d")
# Output: a b c d

# Empty separator
print("a", "b", "c", "d", sep='')
# Output: abcd

# Hyphen separator
print("a", "b", "c", "d", sep='-')
# Output: a-b-c-d

# Custom word separator
print("a", "b", "c", "d", sep='data')
# Output: adatabdatacdatad

The end Parameter

By default, print() adds a newline character at the end of output. You can change this behavior with the end parameter:

print("a", "b", "c", "d", end=' ')
print("Hello world", end=' ')
print('Hi, how are you')
# Output: a b c d Hello world Hi, how are you

Combining sep and end

These parameters can be used together for powerful formatting:

print("a", "b", "c", "d", sep="\n", end='\t')
print("Hello world", end='\t')
print('Hi, how are you')
# Output: 
# a
# b
# c
# d Hello world Hi, how are you
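Tabs and newlines are hard to see in displayed output. As a sketch, the same three calls can be captured with the standard library's io.StringIO and contextlib.redirect_stdout, then inspected with repr() to show exactly which characters sep and end inserted.

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):   # print() writes into the buffer instead of the screen
    print("a", "b", "c", "d", sep="\n", end="\t")
    print("Hello world", end="\t")
    print("Hi, how are you")

captured = buffer.getvalue()
print(repr(captured))
# 'a\nb\nc\nd\tHello world\tHi, how are you\n'
```

Only the last call ends with a newline, because it is the only one that keeps the default end value.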

Escape Sequence Characters

Escape sequences are a feature of Python string literals, so they work in any string passed to print():

Escape Sequence Description
\n Newline
\t Horizontal tab
\\ Backslash
\' Single quote
\" Double quote
\r Carriage return
\b Backspace

Newline Example

print("Hello\nworld")
# Output:
# Hello
# world

Tab Example

print("Hello\t\t\tworld")
# Output: Hello         world

Escaping Special Characters

To print literal escape sequences, use the backslash as an escape character:

print('\\n')  # Escapes the sequence
# Output: \n

print("c:\\notebook\\tables")
# Output: c:\notebook\tables
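An alternative to doubling every backslash is a raw string literal (prefix r), in which backslashes are kept as written. A minimal sketch:

```python
path = r"c:\notebook\tables"    # raw string: \n and \t are not interpreted
print(path)                      # c:\notebook\tables

same = path == "c:\\notebook\\tables"
lengths = (len(r"\n"), len("\n"))
print(same, lengths)             # True (2, 1)
```

Raw strings are the same r"pattern" notation commonly used for regular expressions in Python.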

Quote Characters

You can use different quotation marks strategically:

print("It is a beautiful day. But it's raining today")
# Output: It is a beautiful day. But it's raining today

print('He said, "Hi"')
# Output: He said, "Hi"

Or escape quotes when needed:

print('It is a beautiful day. But it\'s raining today')
# Output: It is a beautiful day. But it's raining today

print("He said, \"Hi\"")
# Output: He said, "Hi"

Carriage Return

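The \r escape sequence returns the cursor to the beginning of the line, so the text that follows overwrites what was printed there (the exact rendering depends on the terminal or notebook):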

print('123456\rhello world')
# Output: hello world

Backspace

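The \b escape sequence moves the cursor back one position, so the next character overwrites the one before it (the exact rendering depends on the terminal or notebook):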

print('hello\bworld')
# Output: hellworld
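Note that \r and \b are ordinary characters stored in the string; the overwriting happens only when a terminal displays them. A small sketch using len() and repr() makes this visible:

```python
backspaced = 'hello\bworld'
returned = '123456\rhello world'

print(len(backspaced))    # 11: the backspace is still one character
print(repr(backspaced))   # 'hello\x08world'
print(len(returned))      # 18
print(repr(returned))     # '123456\rhello world'
```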

Saving Output to Files

The print() function can send output directly to files:

f = open("print_file.txt", mode='a')  # 'w' for write, 'a' for append
print(2+3, file=f)
f.close()  # Always close files after use
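A common alternative to calling f.close() by hand is a with block, which closes the file automatically even if an error occurs. The sketch below writes to a temporary directory (an assumption made so the example does not touch existing files) and reads the result back.

```python
import os
import tempfile

# Hypothetical location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "print_file.txt")

with open(path, mode="a") as f:        # file is closed when the block ends
    print(2 + 3, file=f)
    print("a", "b", sep="-", file=f)   # sep and end work with file= too

with open(path) as f:
    contents = f.read()
print(repr(contents))   # '5\na-b\n'
```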

Best Practices

  1. Use appropriate quote types (' or ") to minimize escaping
  2. Close file handles after printing to files
  3. Consider readability when combining multiple print parameters
  4. Use f-strings for complex string formatting in modern Python
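As a brief illustration of the last point, f-strings embed expressions and format specifications directly in the string. The variable names and values are made up, and the "=" specifier requires Python 3.8 or later.

```python
name, total = "Ada", 3.14159    # illustrative values

rounded = f"{name} scored {total:.2f}"   # two decimal places
debug = f"{2 + 3 = }"                    # expression and its value (3.8+)
aligned = f"|{name:>6}|{name:<6}|"       # right- and left-aligned in 6 columns

print(rounded)   # Ada scored 3.14
print(debug)     # 2 + 3 = 5
print(aligned)   # |   Ada|Ada   |
```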

Summary

The print() function is much more than a simple output tool. With its various parameters and support for escape sequences, you can create well-formatted, readable output for debugging, user interaction, or data presentation.

By mastering these features, you'll be able to communicate more effectively through your Python programs, whether you're creating simple scripts or complex applications.

Python Variables and Type Casting

Variables are fundamental building blocks in Python programming. They act as containers for storing data values that can be manipulated and referenced throughout your code. This guide covers the basics of variable declaration, assignment, and type conversion in Python.

Variable Declaration and Basic Types

Python supports several basic variable types:

  1. int – Integer type (whole numbers)
  2. float – Floating point numbers (decimal numbers)
  3. bool – Boolean values (True or False)
  4. str – String type (textual data)

Unlike some other programming languages, Python uses dynamic typing, which means you don't need to declare the variable type explicitly.

var1 = 2        # integer
var2 = 5.0      # float
v3 = True       # boolean

You can display variables using the print() function:

print("variable - 1= ", var1)
print("variable - 2= ", var2)
print("variable - 3= ", v3)

Output:

variable - 1=  2
variable - 2=  5.0
variable - 3=  True

Checking Variable Types

The type() function allows you to check the data type of any variable:

type(var1)  # returns <class 'int'>
type(var2)  # returns <class 'float'>
type(v3)    # returns <class 'bool'>
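When a program needs to branch on a type rather than display it, isinstance() is the usual tool. A small sketch, reusing the variable names from above:

```python
var1, var2, v3 = 2, 5.0, True

type_name = type(var1).__name__       # just the name, without <class '...'>
is_float = isinstance(var2, float)
bool_is_int = isinstance(v3, int)     # bool is a subclass of int
exact_int = type(v3) is int           # exact type comparison ignores subclasses

print(type_name, is_float, bool_is_int, exact_int)   # int True True False
```

The third result explains why True and False behave like 1 and 0 in arithmetic.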

Multiple Assignment

Python allows you to assign values to multiple variables in a single line:

x, y = 1, 2.5

print('x =', x)
print('y =', y)

Output:

x = 1
y = 2.5

Type Casting

Type casting is the process of converting one data type into another. Python provides several built-in functions for type conversion:

Integer Conversions

Converting Integer to Float

int_number = 5
float_number = float(int_number)
print(float_number)               # Output: 5.0
print('data type =', type(float_number))  # Output: data type = <class 'float'>

Converting Integer to String

int_number = 5
str_number = str(int_number)
print(str_number)                 # Output: 5
print('data type =', type(str_number))  # Output: data type = <class 'str'>

Float Conversions

Converting Float to Integer

val1 = 5.1
val2 = 5.51
# int() truncates the decimal part (does not round)
int_val1 = int(val1)
int_val2 = int(val2)
print('int_val1 =', int_val1)  # Output: int_val1 = 5
print('int_val2 =', int_val2)  # Output: int_val2 = 5

Special Rounding Functions

Python provides several functions for rounding floating-point numbers:

1. The round() Function

val1 = 5.1
val2 = 5.51
int_val1 = round(val1)
int_val2 = round(val2)
print('int_val1 =', int_val1)  # Output: int_val1 = 5
print('int_val2 =', int_val2)  # Output: int_val2 = 6

2. The ceil() Function

The ceil() function from the math module rounds up to the nearest integer that is greater than or equal to the value:

from math import ceil

val1 = 5.1
val2 = 5.51
int_val1 = ceil(val1)
int_val2 = ceil(val2)
print('int_val1 =', int_val1)  # Output: int_val1 = 6
print('int_val2 =', int_val2)  # Output: int_val2 = 6

3. The floor() Function

The floor() function from the math module rounds down to the nearest integer that is less than or equal to the value:

from math import floor

val1 = 5.1
val2 = 5.51
int_val1 = floor(val1)
int_val2 = floor(val2)
print('int_val1 =', int_val1)  # Output: int_val1 = 5
print('int_val2 =', int_val2)  # Output: int_val2 = 5

Converting Float to String

float_number = 5.0
str_float_number = str(float_number)
print(str_float_number)  # Output: 5.0
print('data type =', type(str_float_number))  # Output: data type = <class 'str'>

Boolean Conversions

Boolean values can be converted to other types:

# Boolean to Integer
print(int(True))   # Output: 1
print(int(False))  # Output: 0

# Boolean to String
print(str(True))   # Output: True
print(str(False))  # Output: False

It's worth noting that in Python, any non-zero number or non-empty string evaluates to True when converted to a boolean:

bool_val = bool(-1)
print(bool_val)  # Output: True

print(bool(0))   # Output: False
print(bool(1))   # Output: True
print(bool(""))  # Output: False
print(bool("hello"))  # Output: True
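One consequence worth seeing in code: bool() checks only whether a string is empty, not what it says, so "False" and "0" are both truthy. A minimal sketch with illustrative values:

```python
values = ["False", "0", " ", "", 0, 0.0, -1]
truthiness = [bool(v) for v in values]
print(truthiness)   # [True, True, True, False, False, False, True]
```

To interpret text such as "False" as a boolean, compare the string explicitly instead of passing it to bool().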

String Conversions

Converting strings to numeric types is conditional on the string content:

Converting String to Integer

# Works only if the string contains a valid integer
var8 = '5'
var8_int = int(var8)
print(var8_int)  # Output: 5
print(type(var8_int))  # Output: <class 'int'>

# This will raise a ValueError
# var7 = "datascience"
# int(var7)  # Error: invalid literal for int() with base 10: 'datascience'

Converting String to Float

# Works only if the string contains a valid number
var8 = '5.1'
var8_float = float(var8)
print(var8_float)  # Output: 5.1
print(type(var8_float))  # Output: <class 'float'>

# This will raise a ValueError
# var7 = "datascience"
# float(var7)  # Error: could not convert string to float: 'datascience'

Multi-step Conversions

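Sometimes you might need to perform multiple conversions. For example, int('5.1') raises a ValueError, so a string holding a decimal number must first be converted to a float and then to an int: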

var8 = '5.1'
var8_float = float(var8)  # First convert string to float
var8_int = int(var8_float)  # Then convert float to int
print(var8_int)  # Output: 5
print(type(var8_int))  # Output: <class 'int'>
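The two-step conversion can be wrapped in a small helper that also copes with invalid input. This is only a sketch: to_int and its default parameter are hypothetical names, and going through float can lose precision for very large integers.

```python
def to_int(text, default=None):
    """Hypothetical helper: parse text as a number, then truncate to int."""
    try:
        return int(float(text))
    except (ValueError, OverflowError):   # not a number, or "inf"/"nan"
        return default

print(to_int("5"))             # 5
print(to_int("5.1"))           # 5
print(to_int("-2.9"))          # -2 (truncated toward zero)
print(to_int("datascience"))   # None
```

Returning a default keeps the calling code free of try/except blocks, at the cost of hiding which inputs were invalid.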

Best Practices

  1. Choose descriptive variable names that indicate what the variable represents
  2. Use type casting carefully, particularly when converting between numeric types and strings
  3. Handle potential errors when converting strings to numbers
  4. Be aware of the difference between truncation and rounding when converting floats to integers
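The difference between truncation and rounding is easiest to see with negative numbers and exact halves. The sketch below compares int(), round(), floor(), and ceil() on a few illustrative values; note that Python's round() sends exact halves to the nearest even integer.

```python
from math import ceil, floor

values = (5.51, -5.51, 2.5, 3.5)   # illustrative inputs
rows = [(v, int(v), round(v), floor(v), ceil(v)) for v in values]

print("value  int  round  floor  ceil")
for row in rows:
    print(*row, sep="  ")
```

For -5.51, int() gives -5 (toward zero) while floor() gives -6; for 2.5 and 3.5, round() gives 2 and 4.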

Summary

Python's dynamic typing system and built-in type conversion functions provide flexibility in handling different data types. Understanding how to properly declare variables and convert between types is essential for writing effective Python code.

By mastering these concepts, you'll be better prepared to handle data transformation and manipulation tasks in your Python programs.