ideone

useful pieces of codes

View on GitHub

Statement

Search a word in the given string. Return all the positions where the word matches in the string.

Approach

It generates a unique hash of the word. The hash changes if even a single char is changed. It even changes if a char changes its position. As hash is sensitive to both char value and its position it can be used to check quickly against hash of the part of the string under consideration.

Caveat

This is a hash based implementation. It sacrifices some accuracy for simplicity and speed. We can have some false positive but never a true negative, i.e., we can get a result where word doesn’t actually matches the string, but we will never miss any result if it is present.

It happens because two words may have same hash. Although it is very rare but it is very much a chance. As the hash can take only mod=1000000007 (10^9+7) values, if the number of possible words is more than mod some of them will have clashes. In case of a clash we can have a false positive. But same word cannot have two hashes hence we will never have a true negative.

We can put an overhead to check if the word matches at the position or not, once the hash is matched. If size of word is small then the overhead will also be small.

Complexity

Say size of word : w

say size of search string : n

hash of word : O(w)

hash of window : O(w)

hash of next window : O(1)

moving window accross search space : O(n)

Total Time Complexity : O(n+w)

Overhead : O(r*w)

here r is the size of result

Code

https://ideone.com/8rv2vs

Code

#include <iostream>
#include <vector>
using namespace std;
int mod = 1000000007;
int l = 26;
char a = 'a';
long long exp(int b, int e){
	long long a = 1;
	long long x = b;
	while(e){
		if(e%2)	a = (a*x)%mod;
		x = (x*x)%mod;
		e = e/2;
	}
	return a;
}
void push_back(long long &seed, int c){
	seed = (26*seed + c%l)%mod;
}
void pop_front(long long &seed, int c, long long x){
	x *= c%l;
	x = mod - x%mod;
	seed = (seed+x)%mod;
}

vector<int> match(const string &s, const string &w){
	vector<int> p;
	
	int n = w.size();
	if(!n)	return p;
	
	int h_minus = exp(l,n-1);
	
	long long h = 0;
	for(char c: w)	push_back(h,c-a);
	cout<<"Hash value of the word: "<<h<<endl;
	
	if(s.size()<n)	return p;
	int i=0;
	long long x=0;
	while(i<n)	push_back(x,s[i++]-a);
	if(h==x)	p.push_back(i-n);
	
	while(i<s.size()){
		pop_front(x,s[i-n]-a,h_minus);
		push_back(x,s[i++]-a);
		if(h==x)	p.push_back(i-n);
	}
	return p;
}
int main() {
	string s;
	string w;
	cin>>s>>w;
	vector<int> ans = match(s,w);
	cout<<"The given word is matched in the string at following positions:"<<endl;
	for(int x: ans)	cout<<x<<' ';
	cout<<endl;
	cout<<s<<endl;
	int j=0;
	for(int i=0; i<s.size()&&j<ans.size(); i++){
		if(i==ans[j]&&++j)	cout<<'^';
		else	cout<<' ';
	}
	cout<<endl;
	return 0;
}

Ideone it!