Finding Anagrams in a given set of words.

Finding Anagrams in a given set of words.

Unread postby shilpa » Sun Aug 01, 2010 11:24 pm

Telephonic - Hyderabad - SDE
Given a set of words, find all the anagrams
shilpa
 
Posts: 136
Joined: Sun Feb 21, 2010 2:20 pm
Karma: 0

Re: Finding Anagrams in a given set of words.

Unread postby powerhorse » Sat Jan 01, 2011 2:12 pm

Insights:
1. anagrams must be of same length.
2. transitivity is exhibited among anagram strings, i.e if A and B are anagrams and B and C are anagrams then A and C are anagrams.

structures used
Code: Select all
typedef struct myDataList
{
   struct myDataList* next;
   struct myDataList* prev;
   string data;

   myDataList()
   {
      next = NULL;
      prev = NULL;
      data = "";
   }
}DataList;

typedef struct myList
{
   struct myList *next;
   DataList *anagramList;
   DataList *dataList;
   size_t length;
   size_t entries;

   myList()
   {
      next = NULL;
      anagramList = NULL;
      dataList = NULL;
      length = 0;
      entries = 0;
   }
}MainList;



anagram logic:
initialize an array of size(Language) to value 0 and
1. increment the array by one whenever a case-insensitive instance of the character occurs in String1
2. decrement the array by one whenever a case-insensitive instance of the character occurs in String2
3.check if the array is still set to 0 everywhere, if so return true else false.


Main ALgo:
1. seperate the words based on size and create a MainList of words of each size
2. in the set maintain a pointer to the list of anagrams found.
3. For every set if size of set > 1 do the following.
3.1 check two strings in DataList for anagram using anagram logic.
3.2 if strings A and B are found to be anagrams add B to the anagram list and continue iterating through the list with A.
3.2.1 if any further anagrams are found add them to the anagram list.
3.2.2 After the iteration is done remove A from data list and add to anagram list.
3.2.3 In the anagram list select different combinations of 2 strings from the anagram list and print out.
3.2.4 after all combinations are printed out delete the contents of the anagram list.
3.3 repeat procedure 3.2 to 3.2.4 until the list of strings in the data are exhausted.
4. repeat step 3 and its sub steps until all elements of the main list are exhausted.

restrictions:
1. Uses extra memory but efficiently.
powerhorse
 
Posts: 92
Joined: Fri Mar 05, 2010 12:37 pm
Karma: 12

Re: Finding Anagrams in a given set of words.

Unread postby IIITB » Thu Feb 17, 2011 10:00 pm

Two words are Anagram of each other if their sorted sequence is same.
For instance, cat, tac, act are Anagrams of each other.
When we sort the characters in each word, they are all the same.
act, act, act
We can use this logic for determining if two words are Anagrams of each other.

Data structure used in this Algorithm Hash-Map
hash_map<string, vector<string>> anagrams;
key is string (sorted chars sequence) and value is a vector of strings (which are anagram to each other)

1) Sort the chars in word
2) Check if the sorted chars string is present in the hash-map
a) If present, then insert the original string as value into the vector of strings
b) If not present, then insert the sorted chars string as another key and the original string as value into the vector of strings.
3) Repeat 1 and 2 for all the given set of words
4) Now all the Anagrams-words are grouped into each vector of strings
5) Iterate over the hash-map and print the strings in each vector

Complexity analysis:
1) Sorting the chars in a given word.
Lets assume the chars are from ASCII char set
We can perform sorting using counting sort
Time complexity:
O(n) - n is the length of the string
Space:
Scratch array of 255 integers to keep track of individual char count - This is constant irrespective of the string length

2) Search in the HASH-map is constant (Assume hashing is perfect here)
3) Insertion in the HASH-map is constant (Assume hashing is perfect here)

So overall, the time complexity appears to linear to the input.
Space complexity is Hash-map + scratch array of 255 integers.
IIITB
 
Posts: 148
Joined: Tue Feb 08, 2011 4:20 pm
Karma: 0

Re: Finding Anagrams in a given set of words.

Unread postby PSG » Thu Feb 17, 2011 11:02 pm

Code implementing IIITBModerator's Algorithm.
Code: Select all
typedef stdext::hash_map<string, vector<string>> ANAGRAM;

// Just pushes a few strings for testing purposes
void populateWords(vector<string>& words)
{
   words.push_back("act");
   words.push_back("tac");
   words.push_back("cat");
   words.push_back("3214");
   words.push_back("2341");
   words.push_back("madam");
   words.push_back("daamm");
   words.push_back("123456");
   words.push_back("7123456");
}

// Given a string, sort the chars in the string using counting sort mechanism
// Keeps the count of individual chars in the string in an array where each index represent the count of that char
// Idea behind this is - there are only 0 to 255 ascii values, so each array index can be used to represent a char
// sweep the entire string and increment the appropriate index in the scratch array
// Then traverse the scratch array from left to right and replacing that many char in the original string

void CountingSort(string& sortThis)
{
   
   if(sortThis.size()){

      unsigned long count[257];
      set<char> distinctChars;
      set<char>::const_iterator it, end;
      int size = sortThis.size(), index = 0;

      memset(count, 0, sizeof(count));
      while(index < size){
         ++count[sortThis[index]];
         if(1 == count[sortThis[index]])
            distinctChars.insert(sortThis[index]);
         ++index;
      }

      index = 0;
      it = distinctChars.begin(), end = distinctChars.end();
      while(it != end){
         while(count[*it]){
            sortThis[index++] = *it;
            --count[*it];
         }
         ++it;
      }
   }
}

// Insert the sorted chars sequence into the hash map, if its not present already
// If present, then push it to the vector of strings
// key - sorted string
// value - vector of original strings

void groupAnagrams(const vector<string>& words, ANAGRAM& output)
{
   ANAGRAM::iterator hmit;
   vector<string>::const_iterator it = words.begin(), end = words.end();
   while(it != end){
      string sorted = *it;
      CountingSort(sorted);
      hmit = output.find(sorted);
      if(hmit == output.end()){
         output[sorted].push_back(*it);
      }
      else{
         hmit->second.push_back(*it);
      }
      ++it;
   }
}

// This prints all the anagram words in every line
void printAnagrams(const ANAGRAM& output)
{
   ANAGRAM::const_iterator it = output.begin(), end = output.end();

   while(it != end){
      vector<string>::const_iterator vit = it->second.begin(), vend = it->second.end();
      while(vit != vend){
         cout<<vit->c_str()<<"\t";
         ++vit;
      }
      cout<<endl;
      ++it;
   }
}

int _tmain(int argc, _TCHAR* argv[])
{
   vector<string> words;
   ANAGRAM output;

   populateWords(words);
   groupAnagrams(words, output);
   printAnagrams(output);
   return 0;
}
PSG
 
Posts: 621
Joined: Sun Aug 15, 2010 10:46 pm
Karma: 1


Topic Tags

OS, JAVA, Algos, Recursion, C, Coding, DS, Puzzle, Trees, CPP, Linked List, Array, Design, HR, UNIX, Networking, Pointer, WPF, CSharp, [Java], Aptitude, DB, @admin can u remove all duplicate posts!!!, Testing, GRAPH, VMware products, VMware heartbeat, vCenter server, GD, vCloud, Security, ESXi

Return to Algorithms

cron