Learn Regular Expressions in a Few Hours

Regular expressions

In PHP, regular expressions are very useful for extracting information from strings, files, documents, and so on. We have divided the material into separate lessons so that you can learn without any pressure.

Regular expressions lessons:

Lesson 1: Letters
In this lesson we treat a regular expression as a plain sequence of characters and write patterns that match a specific sequence of letters.

$string = "abcdefgh \n abcdef \n abc";
$pattern = "/abcd(.*)/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => abcdefgh [1] => abcdef )

Lesson 2: Digits
Here we add the digits 0 to 9 to the pattern. Digits are matched literally, just like letters, which matters because the text you search often mixes the two.

$string = "abc123xyz \n var g = 456 \n hello 123";
$pattern = "/(.*)123(.*)/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => abc123xyz [1] => hello 123 )

Lesson 3: Character Period
The dot (.) matches any single character (a letter, digit, whitespace, and so on), except a newline by default. To match a literal period, escape the dot as \.

$string = "title. \n 123. \n +-. \n chess";
$pattern = "/(.*)\.(.*)/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => title. [1] => 123. [2] => +-. )
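
The difference between an unescaped and an escaped dot is easy to miss, so here is a minimal sketch of it. The sample strings and variable names are made up for illustration; preg_grep is the standard PHP function that keeps only the array entries matching a pattern.

```php
<?php
// Hypothetical sample data: one value with a real period, two without.
$samples = ["1.5", "1x5", "15"];

$anyChar = '/^1.5$/';   // unescaped dot: any single character between 1 and 5
$literal = '/^1\.5$/';  // escaped dot: only a real period between 1 and 5

// preg_grep keeps the original keys, so re-index with array_values.
$anyHits     = array_values(preg_grep($anyChar, $samples));
$literalHits = array_values(preg_grep($literal, $samples));

print_r($anyHits);     // contains 1.5 and 1x5
print_r($literalHits); // contains 1.5 only
```

The unescaped pattern also accepts "1x5", which is usually not what you want when checking for a decimal point.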

Lesson 4: Only a, b, or c
Square brackets let you match one character out of a specific set. For example, [abc] matches a single a, b, or c.

$string = "can \n man \n ran \n fan";
$pattern = "/[cmf]an/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => can [1] => man [2] => fan )

Lesson 5: Not a, b, nor c
As in the previous lesson, you can exclude specific characters: [^abc] matches any single character except a, b, or c.

$string = "can \n man \n ran \n fan";
$pattern = "/[^rm]an/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => can [1] => fan )

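Lesson 6: Characters a to z / Numbers 0 to 9
You can match or exclude a range of characters by using a hyphen (-) inside square brackets. For example, [1-5] matches any digit from 1 to 5, and [^a-c] matches any character except a, b, and c. Ranges follow character-code order, so the [A-c] in the example below covers every uppercase letter, a few symbols, and a to c.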

$string = "Anc \n Fob \n Ppc \n bax \n byy \n bcz";
$pattern = "/[A-c][n-p][a-c]/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => Anc [1] => Fob [2] => Ppc )
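
A range such as [A-c] is wider than it looks, because it runs through the ASCII codes from 65 ('A') to 99 ('c'). The following sketch, with made-up candidate characters, shows which single characters fall inside it and how two separate ranges keep the set to letters only.

```php
<?php
// Hypothetical single-character candidates to test against the class.
$candidates = ["A", "Z", "_", "a", "c", "d", "5"];

// [A-c] spans ASCII 65..99, which includes the underscore (95).
$inRange = array_values(preg_grep('/^[A-c]$/', $candidates));
print_r($inRange); // A, Z, _, a, c

// Two explicit ranges in one class: uppercase letters plus a to c.
$lettersOnly = array_values(preg_grep('/^[A-Za-c]$/', $candidates));
print_r($lettersOnly); // A, Z, a, c
```

If you only mean letters, writing the ranges separately avoids accidentally matching symbols.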

Lesson 7: Repeat characters
How do you match a character that repeats? Use curly braces. For example, a{3} matches exactly three a's, a{1,4} matches at least one and at most four, and a{2,} matches two or more.

$string = "helllllo \n helllo \n hello";
$pattern = "/hel{2,4}o/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => helllo [1] => hello )
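
To see the three forms of the curly-brace quantifier side by side, here is a small sketch. The word list is invented for the example, and the patterns are anchored with ^ and $ so that the whole string has to satisfy the count.

```php
<?php
// Hypothetical inputs: runs of one to five a's.
$words = ["a", "aa", "aaa", "aaaa", "aaaaa"];

// Anchors force the whole string to satisfy the repetition count.
$exactlyThree = array_values(preg_grep('/^a{3}$/', $words));
$oneToFour    = array_values(preg_grep('/^a{1,4}$/', $words));
$twoOrMore    = array_values(preg_grep('/^a{2,}$/', $words));

print_r($exactlyThree); // aaa
print_r($oneToFour);    // a, aa, aaa, aaaa
print_r($twoOrMore);    // aa, aaa, aaaa, aaaaa
```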

Lesson 8: Repeat zero or more
The * quantifier matches zero or more of the preceding character, and + matches one or more. They help when you do not know the length of the input in advance: one user may write a price as $10,000 and another as $10.

$string = "aaaabcc \n aabbbbc \n aacc \n defff";
$pattern = "/aa+b*c+/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => aaaabcc [1] => aabbbbc [2] => aacc )
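
The sketch below contrasts * with + on the same text and then applies + to prices of different lengths. The sample sentences are assumptions made for illustration, and the price pattern is deliberately loose (any run of digits and commas after a dollar sign), not a full currency validator.

```php
<?php
$text = "ac abc abbc";

preg_match_all('/ab*c/', $text, $zeroOrMore); // the b may be absent
preg_match_all('/ab+c/', $text, $oneOrMore);  // at least one b is required

print_r($zeroOrMore[0]); // ac, abc, abbc
print_r($oneOrMore[0]);  // abc, abbc

// Single quotes keep PHP from treating $ as the start of a variable.
$prices = 'Paid $10,000 in May and $10 in June';
preg_match_all('/\$[0-9,]+/', $prices, $found);

print_r($found[0]); // $10,000 and $10
```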

Lesson 9: Optional
The ? quantifier makes the preceding character optional, so it matches either zero or one occurrence of it. For example, xy?z matches either xyz or xz because the y is optional.

$string = "1 hand player \n 2 hand player \n 3 hand player";
$pattern = "/\w hand? player/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => 1 hand player [1] => 2 hand player [2] => 3 hand player )
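
An optional character is easier to see when the input actually varies. This sketch uses made-up text in which the plural s is sometimes present, sometimes absent, and once doubled.

```php
<?php
// Hypothetical input: singular, plural, and a misspelled double s.
$text = "1 hand player \n 2 hands player \n 3 handss player";

// s? allows zero or one s after "hand", but not two.
preg_match_all('/\d hands? player/', $text, $matches);

print_r($matches[0]); // 1 hand player, 2 hands player
```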

Lesson 10: Whitespace
Whitespace includes the space, tab (\t), carriage return (\r), and newline (\n) characters. The \s class matches any one of them.

$string = "1. xyz \n 2. xyz \n 3. xyz";
$pattern = "/\d\.\s+xyz/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => 1. xyz [1] => 2. xyz [2] => 3. xyz )

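Lesson 11: Starts and ends
To tie a match to the start and the end of the text, use the anchors ^ (start) and $ (end). By default they refer to the whole string; with the m modifier they also match at the start and end of each line.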

$string = "Mission: successful\nLast Mission: unsuccessful\nNext Mission: successful upon capture of target";
$pattern = "/^Mission: successful$/m";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => Mission: successful )
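
Whether ^ and $ apply to the whole string or to each line depends on the m modifier. The following sketch runs the same anchored pattern with and without it; the three-line log text is invented for the example.

```php
<?php
// Hypothetical multi-line input with no spaces around the line breaks.
$log = "Mission: successful\nLast Mission: unsuccessful\nMission: successful";

// Without m, ^ and $ anchor to the start and end of the whole string.
$withoutM = preg_match_all('/^Mission: successful$/', $log, $a);

// With m, ^ and $ also match at the start and end of every line.
$withM = preg_match_all('/^Mission: successful$/m', $log, $b);

echo $withoutM, "\n"; // 0
echo $withM, "\n";    // 2
```

preg_match_all returns the number of full matches, which makes it a quick way to compare the two modes.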

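Lesson 12: Group capture
You can group characters with parentheses, ( and ). Whatever a group matches is captured separately: preg_match_all stores the full matches in $matches[0] and the text of the first group in $matches[1]. For example, to capture an image file name, write the expression ^(IMG(\d+))\.png$.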

$string = "file_a_registry_file.pdf \n file_today.pdf \n testfile.pdf.tmp";
$pattern = "/(\w+)\.pdf/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => file_a_registry_file.pdf [1] => file_today.pdf [2] => testfile.pdf )
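
The point of a group is the captured text, which the output above does not show. As a sketch using the same sample string, the following prints $matches[1], the part captured by the parentheses, next to the full matches. The dot before pdf is escaped here so that it only matches a literal period.

```php
<?php
$string = "file_a_registry_file.pdf \n file_today.pdf \n testfile.pdf.tmp";

preg_match_all('/(\w+)\.pdf/', $string, $matches);

print_r($matches[0]); // full matches, including the .pdf extension
print_r($matches[1]); // group 1 only: the names without the extension
```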

Lesson 13: Sub Group capture
Groups can be nested to extract several layers of information at once. Groups are numbered by the position of their opening parenthesis, from left to right.

$string = "Hello 123 \n Hey 456 \n Hi 2015";
$pattern = "/(\w+ (\d+))/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => Hello 123 [1] => Hey 456 [2] => Hi 2015 )
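
With nested groups, each level ends up in its own slot of $matches. A minimal sketch on the same sample string, where group 1 is the outer pair of parentheses and group 2 the inner pair:

```php
<?php
$string = "Hello 123 \n Hey 456 \n Hi 2015";

preg_match_all('/(\w+ (\d+))/', $string, $matches);

print_r($matches[1]); // outer group: the word and the number
print_r($matches[2]); // inner group: the number only
```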

Lesson 14: More Group capture
To capture more than one group, add more pairs of parentheses, as in the code below.

$string = "1024X768 \n 800X600 \n 480X320";
$pattern = "/(\d+)X(\d+)/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => 1024X768 [1] => 800X600 [2] => 480X320 )
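
When a pattern has several groups, it is often handier to receive one array per match. The sketch below uses the standard PREG_SET_ORDER flag of preg_match_all for that and multiplies the two captured numbers; the $pixels array is just an illustrative name.

```php
<?php
$string = "1024X768 \n 800X600 \n 480X320";

// PREG_SET_ORDER groups the results per match instead of per group.
preg_match_all('/(\d+)X(\d+)/', $string, $sets, PREG_SET_ORDER);

$pixels = [];
foreach ($sets as $set) {
    // $set[0] is the full match, $set[1] the width, $set[2] the height.
    $pixels[$set[0]] = (int) $set[1] * (int) $set[2];
}

print_r($pixels); // 1024X768 => 786432, 800X600 => 480000, 480X320 => 153600
```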

Lesson 15: Match x or z
When you use groups, the | (OR) operator lets you list alternative sets of characters, any one of which may match.

$string = "I love toy \n I love boy \n I love joy";
$pattern = "/I love (toy|joy)/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

Output:
Array ( [0] => I love toy [1] => I love joy )
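
The parentheses around the alternatives matter, because without them | splits the whole pattern in two. A short sketch on an invented sentence shows the difference:

```php
<?php
// Hypothetical input containing "joy" outside of an "I love ..." phrase.
$text = "I love toy, I love boy, pure joy";

// Grouped: "I love " followed by either toy or joy.
preg_match_all('/I love (toy|joy)/', $text, $grouped);

// Ungrouped: either the whole phrase "I love toy" or just "joy".
preg_match_all('/I love toy|joy/', $text, $ungrouped);

print_r($grouped[0]);   // I love toy
print_r($ungrouped[0]); // I love toy, joy
```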

Lesson 16: Other characters
\w matches any alphanumeric character or underscore, \D any non-digit character, \S any non-whitespace character, and \W any character that \w does not match.
The pattern .* matches any run of characters, up to the end of the line by default.
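
To get a feel for these classes, the following sketch splits one made-up sentence in four different ways. Each pattern uses + so that neighbouring characters of the same class are returned as one piece.

```php
<?php
// Hypothetical sample text mixing letters, digits, punctuation and spaces.
$text = "Room 42, floor_3!";

preg_match_all('/\w+/', $text, $word);     // letters, digits, underscore
preg_match_all('/\W+/', $text, $nonWord);  // everything \w does not match
preg_match_all('/\S+/', $text, $nonSpace); // runs without whitespace
preg_match_all('/\D+/', $text, $nonDigit); // runs without digits

print_r($word[0]);     // Room, 42, floor_3
print_r($nonWord[0]);  // " ", ", ", "!"
print_r($nonSpace[0]); // Room, "42,", floor_3!
print_r($nonDigit[0]); // "Room ", ", floor_", "!"
```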

Sample regular expression patterns:
foo : The string “foo”
^foo : “foo” at the start of a string
foo$ : “foo” at the end of a string
^foo$ : “foo” when it is the entire string
[abc] : a, b, or c
[a-z] : Any lowercase letter
[^A-Z] : Any character that is not an uppercase letter
(gif|jpg) : Matches either “gif” or “jpg”
[a-z]+ : One or more lowercase letters
[0-9\.\-] : Any digit, dot, or minus sign
^[a-zA-Z0-9_]{1,}$ : Any word of at least one letter, number or _
([wx])([yz]) : wy, wz, xy, or xz
[^A-Za-z0-9] : Any symbol (not a number or a letter)
([A-Z]{3}|[0-9]{4}) : Matches three uppercase letters or four digits
