String matching-2

2.Rabin Karp algorithm

The Rabin-Karp algorithm is a string searching algorithm that uses hashing to find a substring in a text.in this algorithm if two strings are equal, their hash values are also equal.
lets discuss using a example

T[ ] = a b c d a a d e d a
P[ ] = a a d

 Σ = alphabet 

 Σ = {a, b, c, d, e }

a = 1                            prime = 3
b = 2
c = 3
d = 4
e = 5

P[ ] hash value  = 1 + 1x(3^1) + 4x(3^2)
                        =  40

now considering hash values of the text


  • first 3 items
        a b c d a a d e d a

        {a b c} = 1 + 2x(3^1) + 3x(3^2)
                   = 34

hash values does not match

shift by one and consider next 3 items

{b c d} = 2 + 3x3 +4x9
           = 47

hash does not match


there is a easy way to fined hashes after finding first one

old hash   = v
new hash = n
last index of the pattern = l
value new comparing texts last item = u
value old texts first item =w
prime = p


n = (v - w)/p + u(p^l)

{c d a} = (47-2) +1(3^2)
                 3

           = 24

hashes does not match

{ d a a} = 24-3  + 1x9
                 3
            = 16

hashes does not match

{ a a d} = 16-4 + 4x9
                 3
            =  40

now hash values of the text and pattern matches.


pseudo code

// correctly calculates a mod b even if a < 0
function int_mod(int a, int b)
{
  return (a % b + b) % b;
}

function Rabin_Karp(text[], pattern[])
{
  // let n be the size of the text, m the size of the
  // pattern, B - the base of the numeral system,
  // and M - a big enough prime number

  if(n < m) return; // no match is possible

  // calculate the hash value of the pattern
  hp = 0;
  for(i = 0; i < m; i++)
    hp = int_mod(hp * B + pattern[i], M);

  // calculate the hash value of the first segment
  // of the text of length m
  ht = 0;
  for(i = 0; i < m; i++)
    ht = int_mod(ht * B + text[i], M);

  if(ht == hp) check character by character if the first
               segment of the text matches the pattern;

  // start the "rolling hash" - for every next character in
  // the text calculate the hash value of the new segment
  // of length m; E = (Bm-1) modulo M         
  for(i = m; i < n; i++) {
    ht = int_mod(ht - int_mod(text[i - m] * E, M), M);
    ht = int_mod(ht * B, M);
    ht = int_mod(ht + text[i], M);

    if(ht == hp) check character by character if the
                 current segment of the text matches
                 the pattern;
  }
}

Implementation in python

class RollingHash:
    def __init__(self, text, sizeWord):
        self.text = text
        self.hash = 0
        self.sizeWord = sizeWord

        for i in range(0, sizeWord):
            #ord maps the character to a number
            #subtract out the ASCII value of "a" to start the indexing at zero
            self.hash += (ord(self.text[i]) - ord("a")+1)*(26**(sizeWord - i -1))

        #start index of current window
        self.window_start = 0
        #end of index window
        self.window_end = sizeWord

    def move_window(self):
        if self.window_end <= len(self.text) - 1:
            #remove left letter from hash value
            self.hash -= (ord(self.text[self.window_start]) - ord("a")+1)*26**(self.sizeWord-1)
            self.hash *= 26
            self.hash += ord(self.text[self.window_end])- ord("a")+1
            self.window_start += 1
            self.window_end += 1

    def window_text(self):
        return self.text[self.window_start:self.window_end]

def rabin_karp(word, text):
    if word == "" or text == "":
        return None
    if len(word) > len(text):
        return None

    rolling_hash = RollingHash(text, len(word))
    word_hash = RollingHash(word, len(word))
    #word_hash.move_window()

    for i in range(len(text) - len(word) + 1):
        if rolling_hash.hash == word_hash.hash:
            if rolling_hash.window_text() == word:
                return i
        rolling_hash.move_window()
    return None

#print rabin_karp("a", "abcdefgh")
#print rabin_karp("d", "abcdefgh")
#print rabin_karp("cupcakes", "balloonsandcupcakes")


Implementation in c 

#include<stdio.h>
#include<string.h>
// d is the number of characters in input alphabet
#define d 256
/* pat -> pattern
    txt -> text
    q -> A prime number
*/
void search(char pat[], char txt[], int q)
{
    int M = strlen(pat);
    int N = strlen(txt);
    int i, j;
    int p = 0; // hash value for pattern
    int t = 0; // hash value for txt
    int h = 1;
    // The value of h would be "pow(d, M-1)%q"
    for (i = 0; i < M-1; i++)
        h = (h*d)%q;
    // Calculate the hash value of pattern and first
    // window of text
    for (i = 0; i < M; i++)
    {
        p = (d*p + pat[i])%q;
        t = (d*t + txt[i])%q;
    }
    // Slide the pattern over text one by one
    for (i = 0; i <= N - M; i++)
    {
        // Check the hash values of current window of text
        // and pattern. If the hash values match then only
        // check for characters on by one
        if ( p == t )
        {
            /* Check for characters one by one */
            for (j = 0; j < M; j++)
            {
                if (txt[i+j] != pat[j])
                    break;
            }
            // if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]
            if (j == M)
                printf("Pattern found at index %d \n", i);
        }
        // Calculate hash value for next window of text: Remove
        // leading digit, add trailing digit
        if ( i < N-M )
        {
            t = (d*(t - txt[i]*h) + txt[i+M])%q;
            // We might get negative value of t, converting it
            // to positive
            if (t < 0)
            t = (t + q);
        }
    }
}
/* Driver program to test above function */
int main()
{
    char txt[] = "sritechviews";
    char pat[] = "chvi";
    int q = 101; // A prime number
    search(pat, txt, q);
    return 0;
}

Time complexity

in worst case :- O( mn )
in best case   :- O( n+m )

String matching-1

String matching

Introduction 

String matching is used to fined a pattern inside a text. The pattern and text should always be like this, |text| > |pattern| .

There are some algorithms that we can use to do string matching.

  • Naive (brute-force) algorithm
  • Knuth Morris Pratt (KMP)
  • Boyer Moor algorithm 
  • Rabin-Karp algorithm

1.Naive algorithm

 This algorithm simply test all the possible placements of Pattern relative to Text
and Consists of two nested loops.

n  = size of the Text
m = size of the pattern

one loop in this algorithm runs (n-m) times and other one m times.


T[] = a b b  c d e
P[] = b b c

first match T[0] and P[0] 
in above example first items do not match. because of that shift the pattern by 1 and match again.

T[] = a b b c d e
P[] =    b b c

after shifting T[1] = P[0] 
because of that next try to match second item in pattern 
T[2] = P[1]
because that two elements mach too next compare last item in the pattern.
T[3] = P[2]
because that matches return the matching point

Like this until the end of the Text we can fined the occurrences of the pattern.

Time complexity 

In the worst case scenario 

                  T[] = a a a a a a a a a
                  P[] = a a a

in this example we get a matching pair every time.because of that we have to search each and every item with the pattern.

for this              O((n-m)(m))
                         O(nm-m.m)
                         O(nm)           (n >> m)

we can also take following example as worst case scenario 

                 T[] = a a a a a a a a a a
                 P[] = a a b

In best case scenario 

                 T[] = a a a a a a a b
                 P[] = b a a

in this case there is no matches what so ever

for this              O(n-m)
                         O(n)               (n >> m)

pseudo code

for i in range (0 , n-m-1) {
   matchcount = 0;
   for j in range (0 , m-1) {
       if T[i+j] == P[j]{
           matchcount++ ;
       else 
           break ;
       }
   }
}


implementation in C

#include<stdio.h>
#include<string.h>

int main()
{
int count = 0, m,n,i,j;
char pattern[256], text[256];

printf("\nEnter the pattern: ");
scanf("%s", pattern);
printf("\nEnter the text: ");
scanf("%s", text);

m = strlen(pattern);
n = strlen(text);

for(i=0; i< n-m+1; i++)
{
   for(j=0; j<m; j++)
   {
      if(text[i+j] == pattern[j])
      count++ ;
   }
   if(count == m)
   {
      printf("\nMatching point: %d",i);
   }
   count = 0;
}
return 0;
}       

Bubble sort

Bubble sort(sinking sort)

Bubble sort is a simple sorting algorithm. In this algorithm same process happens until the considering list become a sorted list. Throughout the sorting process compares each pair of adjacent items and adjust them one pair at a time. 

Lets consider a list containing numbers to be sort

6,4,3,5,7

lets take  first two numbers 6 & 4

6>4

because of that we have to adjust 6 & 4 

new list :  4,6,3,5,7

then take second pair of numbers

6>3

after adjusting new list :  4,3,6,5,7

like this continue until the lat pair

4,3,5,6,7

4,3,5,6,7

after comparing last pair start same process from the beginning of the new list

3,4,5,6,7

after this step list becomes sorted.


Pseudo code implementation 

begin Bubblesort(list)
    temp == 0
    for all
        if list[i] > list[i+1]
              temp == list [i+1]
               list[i+1] == list[i]
                list[i] = temp
         end if
     end for 
     
      return list

end Bubblesort

python implementation 

def bubbleSort(alist):
    for passnum in range(len(alist)-1,0,-1):
        for i in range(passnum):
            if alist[i]>alist[i+1]:
                temp = alist[i]
                alist[i] = alist[i+1]
                alist[i+1] = temp


implementation using c

#include <stdio.h> 

void bubble_sort(long [], long); 
int main()
{
  long array[100], n, c, d, swap;
  printf("Enter number of elements\n");
  scanf("%ld", &n);
  printf("Enter %ld integers\n", n);
  for (c = 0; c < n; c++)
    scanf("%ld", &array[c]);
  bubble_sort(array, n);
  printf("Sorted list in ascending order:\n");
  for ( c = 0 ; c < n ; c++ )
     printf("%ld\n", array[c]);
  return 0;
}
void bubble_sort(long list[], long n)
{
  long c, d, t;
  for (c = 0 ; c < ( n - 1 ); c++)
  {
    for (d = 0 ; d < n - c - 1; d++)
    {
      if (list[d] > list[d+1])
      {
        /* Swapping */
        t         = list[d];
        list[d]   = list[d+1];
        list[d+1] = t;
      }
    }
  }
}

Time complexity

worst caseO( n2)

best case:  O( n)






String matching-2

2.Rabin Karp algorithm The Rabin-Karp algorithm is a string searching algorithm that uses hashing to find a substring in a text.in this a...