
Regular Expression and Applications

A regular expression is an expression used to define a language. The definition looks very simple, but a great deal is built on it. Some of the core functionality that regular expression implementations provide is as follows:

  1. String matching
  2. Search
  3. Replace
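
As a minimal sketch of these three operations, here is how they might look with Python's standard `re` module. The date pattern and the sample text are illustrative assumptions, not taken from any real application.

```python
import re

# Illustrative pattern: a date written as YYYY-MM-DD (digits only, no range checks).
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

text = "Released 2017-04-20, patched 2017-05-02."

# 1. String matching: does the whole string fit the pattern?
is_date = DATE.fullmatch("2017-04-20") is not None

# 2. Search: find every occurrence inside a longer text.
found = DATE.findall(text)

# 3. Replace: substitute every occurrence with a placeholder.
masked = DATE.sub("<date>", text)

print(is_date)
print(found)
print(masked)
```

The same compiled pattern serves all three tasks; only the method called on it changes.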

In Programming Languages

Pattern matching first appeared in a basic form in the SNOBOL language in the early 1960s. Enhanced versions of regular expressions were later implemented in Unix tools such as the ed text editor and grep. The field got a new boost when Perl introduced a more complex engine and richer expressions. Perl's implementation was originally derived from a regex library written by Henry Spencer in 1986. Spencer later wrote an implementation for Tcl that uses a hybrid NFA/DFA design and was highly praised.

  1. Perl: built into the language syntax
  2. Python
    1. import re
  3. Java
    1. import java.util.regex.*;
  4. C++
    1. #include <regex>, introduced in C++11
  5. PHP
    1. PCRE library (Perl Compatible Regular Expressions)

Today regular expressions are part of the standard library, or of a widely used library, in most languages, e.g. R, Rust, ECMAScript, Python, Go and C#.

Perl-style regular expression syntax is now followed in Java, JavaScript, PHP, Python, Ruby, Microsoft’s .NET Framework, and XML Schema.

Applications

Regular expressions have fuelled further fields of computer science, for example Natural Language Processing, Data Mining, Compilers and Automata.

  1. Validation of registration forms.
    1. HTML5 provides some built-in input types for the type attribute: email, number, url, tel (telephone) and text.
  2. Apache Web Server: pattern matching is used for wildcard domains when creating subdomains on the server side.
    1. Subdomain: *.example.com directs all subdomain requests to the main script file of example.com. This approach is used by many hosting providers, for example wordpress.com, yola.com and weebly.com.
    2. Redirect Rule: sometimes we do not want core files to be public, so we hide them with a redirect rule that matches specific requests. For example, to hide the cache folder and all ini files, the patterns would be something like cache/*, *.ini
  3. Google Search Engine: it provides some special operators, such as:
    1. “HTML Tutorial -w3schools.com”: this returns results that don’t contain w3schools in the page.
    2. “a * saved is a * earned”: * can be used in place of unknown words.
  4. Text editors: regular expressions were built into the ed editor early on, and today they are available in Notepad++, Sublime Text, Atom and Visual Studio.
    1. In Notepad++, select the Regular expression option and write your expression in the search box.
  5. grep: when there was no GUI and the command line was the only interface, grep was introduced in the early 1970s for string matching with regular expressions. It is still the king of string matching on the command line, thanks to its easy syntax.

    The example.txt file contains names of fruits.
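
To give a feel for what grep does with such a file, here is a rough Python sketch of a grep-like filter. The `grep` function and the `fruits` list are hypothetical stand-ins: the list only assumes that example.txt holds one fruit name per line, and its actual contents are not known.

```python
import re
from typing import Iterable, Iterator

def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that contains a match for pattern, roughly like `grep -E`."""
    regex = re.compile(pattern)
    for line in lines:
        line = line.rstrip("\n")   # drop the line ending before matching
        if regex.search(line):
            yield line

# Hypothetical contents standing in for example.txt (one fruit per line).
fruits = ["apple\n", "banana\n", "grape\n", "pineapple\n", "cherry\n"]

# Lines that contain "apple" anywhere.
print(list(grep(r"apple", fruits)))
# Lines that end with the letter "e".
print(list(grep(r"e$", fruits)))
```

To run it against a real file, an open file object can be passed in place of the list, since iterating over a file yields its lines.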

Where have I used RE?

  1. To minify HTML, JS and CSS. I’ve used Notepad++’s RE functionality to replace newlines and repeated spaces.
  2. While configuring a web server and defining rewrite rules in .htaccess files.
  3. For validating site input such as email, username and password.
  4. In one of my projects, htmSQL, for tokenizing SQL, where we can parse HTML using SQL.
  5. Sometimes I use the minus sign in Google search.
  6. For defining routes in MVC frameworks such as ExpressJS, Laravel and Django.
  7. Web scraping, when extracting product names and product prices from a page.
  8. SQL LIKE operator, e.g. “%input%”
  9. Last but not least, in Automata course exams, for writing an RE from a given description.
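
A few of the uses above (collapsing whitespace, validating input, pulling prices out of a page) can be sketched in Python as follows. The function names, the username rule and the price format are all assumptions made up for illustration, not the rules of any particular site or tool.

```python
import re
from typing import List

def collapse_whitespace(markup: str) -> str:
    """Crude minifier: turn every run of whitespace, newlines included, into one space."""
    return re.sub(r"\s+", " ", markup).strip()

# Hypothetical username rule: 3 to 16 letters, digits or underscores.
USERNAME = re.compile(r"[A-Za-z0-9_]{3,16}")

def is_valid_username(name: str) -> bool:
    """True only if the whole string satisfies the username rule."""
    return USERNAME.fullmatch(name) is not None

# Hypothetical price format: a dollar sign, digits, and optional two-digit cents.
PRICE = re.compile(r"\$(\d+(?:\.\d{2})?)")

def extract_prices(page_text: str) -> List[str]:
    """Return the numeric part of every price found in the text."""
    return PRICE.findall(page_text)

print(collapse_whitespace("<p>\n   Hello   world\n</p>"))
print(is_valid_username("ali_99"), is_valid_username("a b"))
print(extract_prices("Mango $3.50, Kiwi $2"))
```

The whitespace collapse is deliberately naive: it would also squash whitespace inside `<pre>` blocks or string literals, so a real minifier needs more care than a single substitution.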

Hacking

  1. As RE syntax was enhanced, recursion and sub-expression functionality were added, which allows us to write more complex expressions. But this can be exploited by an attacker who supplies an input string that takes so long to process that the server stops responding (Regular expression Denial of Service, ReDoS).
  2. Google Chrome uses a regular expression to decide whether what you typed is a search term or a URL. This was exploited earlier this year, making phishing attacks easier on Firefox and Chrome. (LINK)
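
The slow-input problem in the first point can be sketched with Python's backtracking `re` engine. The pattern `(a+)+$` is a textbook example chosen for illustration, not one taken from a real incident, and the input sizes are kept small on purpose so the script finishes quickly.

```python
import re
import time

# Nested quantifiers: on a near-miss input, a backtracking engine may try
# exponentially many ways to split the run of "a"s before giving up.
VULNERABLE = re.compile(r"(a+)+$")
# Accepts the same strings without nesting, so a failure is detected quickly.
SAFE = re.compile(r"a+$")

def time_match(regex, text):
    """Return the match result and the seconds spent attempting it."""
    start = time.perf_counter()
    result = regex.match(text)
    return result, time.perf_counter() - start

for n in (10, 14, 18):
    attack = "a" * n + "!"   # almost matches, then fails on the last character
    _, slow = time_match(VULNERABLE, attack)
    _, fast = time_match(SAFE, attack)
    print(f"n={n}: nested {slow:.4f}s, flat {fast:.6f}s")
```

Each extra character roughly doubles the work for the nested pattern, which is why a short crafted string can tie up a server. Rewriting the expression without nested quantifiers, or limiting input length, are common mitigations.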

This post is licensed under CC BY 4.0 by the author.