Category: Hashing

  • Collision in Hashing

    Hashing is a technique that uses a hash function to map data to a location in a data structure such as a hash table. The hash function takes the data as input and returns the index at which the data should be stored. However, two different data elements can map to the same index. This is known as a collision.

    For example, suppose we have a hash table with 10 buckets and a hash function that maps data elements to the buckets based on their value. If two data elements have the same hash value, they will be stored in the same bucket, causing a collision.

    Collision Resolution Techniques

    If there is a collision, we need to resolve it for the data structure to work correctly. There are several techniques to handle collisions in hashing:

    • Open Addressing
    • Separate Chaining

    Open Addressing in Hashing

    Open addressing is also known as closed hashing. In open addressing, all keys are stored directly in the hash table itself. When two keys map to the same position, the algorithm searches for another empty slot in the table in which to store the key.

    There are several techniques for open addressing:

    • Linear Probing: If a collision occurs, the algorithm searches for the next empty slot by moving one position at a time, wrapping around to the start of the table when it reaches the end.
    • Quadratic Probing: If a collision occurs, the algorithm checks slots at offsets that grow quadratically from the original position (1, 4, 9, and so on).
    • Double Hashing: If a collision occurs, a second hash function of the key determines the step size between successive probes.

    Algorithm of Open Addressing

    The algorithm of open addressing is as follows:

    1. Calculate the hash value of the key to find its home slot.
    2. If the slot is empty, store the key in that slot.
    3. If the slot is not empty, use a probing technique to move to the next candidate slot.
    4. Repeat steps 2 and 3 until an empty slot is found or every slot has been checked, in which case the table is full.
    

    Example of Open Addressing

    The following code demonstrates the open addressing technique using linear probing in C.


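    // C Program: open addressing with linear probing
    #include <stdio.h>

    #define SIZE 10
    #define EMPTY 0   // 0 marks an empty slot, so 0 cannot be stored as a key

    int customHash(int key) {
        return key % SIZE;
    }

    // Return the first empty slot at or after the key's home slot,
    // or -1 if the table is full.
    int probe(int H[], int key) {
        int index = customHash(key);
        for (int i = 0; i < SIZE; i++) {
            int slot = (index + i) % SIZE;
            if (H[slot] == EMPTY)
                return slot;
        }
        return -1;
    }

    // Store the key in the first free slot; do nothing if the table is full.
    void insert(int H[], int key) {
        int index = probe(H, key);
        if (index != -1)
            H[index] = key;
    }

    // Return the slot that holds the key, or -1 if the key is absent.
    int search(int H[], int key) {
        int index = customHash(key);
        for (int i = 0; i < SIZE; i++) {
            int slot = (index + i) % SIZE;
            if (H[slot] == EMPTY)
                return -1;   // an empty slot ends the probe sequence
            if (H[slot] == key)
                return slot;
        }
        return -1;           // every slot checked without a match
    }

    int main() {
        int HT[SIZE] = {0};
        int keys[] = {12, 25, 35, 26, 45, 55, 65, 75, 85, 95};

        for (int i = 0; i < 10; i++)
            insert(HT, keys[i]);

        int result = search(HT, 26);
        if (result == -1)
            printf("Key not found\n");
        else
            printf("Key found at index: %d\n", result);
        return 0;
    }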

    Output

    The output obtained is as follows −

    Key found at index: 7
    

    Separate Chaining in Hashing

    Separate chaining is also known as open hashing. In this technique, each slot in the hash table holds a linked list. When a collision occurs, the new data element is added to the linked list at that slot. This allows multiple data elements to be stored at the same index in the hash table.

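    Separate chaining is a simple and effective technique for handling collisions in hashing. Storage and retrieval stay efficient as long as the lists remain short. In the worst case, when every key maps to the same slot, a lookup has to scan a single list that contains all the elements.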

    Types of Separate Chaining

    Simple chaining is the basic form. Dynamic hashing and extendible hashing are related schemes that let the table grow with the data:

    • Simple chaining: Each slot in the hash table is a linked list that stores the data elements that map to that slot.
    • Dynamic hashing: The hash table is resized as the number of data elements grows, so that the lists stay short.
    • Extendible hashing: A directory of pointers refers to fixed-size blocks (buckets), and each block stores a subset of the data elements. When a block overflows, it is split and only the elements in that block are redistributed.

    Algorithm of Separate Chaining

    The algorithm of separate chaining is as follows:

    1. Calculate the hash value of the key to get the index of a slot.
    2. If the linked list at that slot is empty, create a new node that holds the key and make it the head of the list.
    3. If the linked list is not empty, create a new node and append it to the end of the list. Inserting at the head instead is also valid and takes constant time.
    

    Example of Separate Chaining

    The following code demonstrates the separate chaining technique using linked lists in C.


    // C Program: separate chaining with linked lists
    #include <stdio.h>
    #include <stdlib.h>

    #define SIZE 10

    struct Node {
        int data;
        struct Node* next;
    };

    int customHash(int key) {
        return key % SIZE;
    }

    // Insert the key at the head of the list for its slot.
    void insert(struct Node* H[], int key) {
        int index = customHash(key);
        struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
        if (newNode == NULL)
            return;   // allocation failed; the key is not inserted
        newNode->data = key;
        newNode->next = H[index];
        H[index] = newNode;
    }

    // Return the slot index whose list contains the key, or -1 if absent.
    int search(struct Node* H[], int key) {
        int index = customHash(key);
        struct Node* temp = H[index];
        while (temp != NULL) {
            if (temp->data == key)
                return index;
            temp = temp->next;
        }
        return -1;
    }

    int main() {
        struct Node* HT[SIZE];
        int keys[] = {12, 25, 35, 26, 45, 55, 65, 75, 85, 95};

        for (int i = 0; i < SIZE; i++)
            HT[i] = NULL;
        for (int i = 0; i < 10; i++)
            insert(HT, keys[i]);

        int result = search(HT, 85);
        if (result == -1)
            printf("Key not found\n");
        else
            printf("Key found at index: %d\n", result);
        return 0;
    }

    Output

    The output obtained is as follows −

    Key found at index: 5
    

    Open Addressing Vs Separate Chaining

    Open Addressing | Separate Chaining
    Each slot in the hash table stores at most one data element. | Each slot in the hash table stores a linked list of data elements.
    Requires additional probing to find an empty slot when a collision occurs. | Does not require additional probing, as colliding data elements are stored in a linked list.
    Can lead to clustering of data elements in the hash table. | Does not lead to clustering, as data elements are stored in separate linked lists.
    Needs no pointers, but the table must have at least as many slots as data elements. | Needs extra memory for a pointer in every node, but can hold more data elements than there are slots.
    Tends to be faster when the table is lightly loaded and collisions are few. | Degrades more gradually as the table fills, so it copes better with many collisions.

    Conclusion

    Collision in hashing occurs when two different data elements map to the same index in the data structure. This can be resolved using collision resolution techniques like open addressing and separate chaining. These techniques allow for efficient storage and retrieval of data elements, even when collisions occur.

  • Hashing in Data Structure

    Hashing is a technique for storing data so that it can be looked up very quickly. It uses a special formula called a hash function to map data to a location in a data structure, typically an array.

    The hash function takes the data as input and returns an index in the data structure where the data should be stored. This allows us to quickly retrieve the data by using the hash function to calculate the index and then looking up the data at that index.

    Imagine you have a list of names and you want to look up a specific name in the list. You could use a hash function to map the name to an index in a data structure, such as an array or a hash table. This allows you to quickly find the name in the data structure without having to search through the entire list.

    What is Hash Function?

    A hash function is a mathematical function that takes an input of any size and returns a fixed-size value. The output of the hash function is called a hash value or hash code. For a hash table, the hash value is an integer that is reduced to a valid index, for example with the modulo operator. The hash function is designed to be fast and efficient, so that it can quickly calculate the hash value for a given input.

    Hash functions have several important properties:

    • They are deterministic, meaning that the same input always produces the same output.
    • They can quickly calculate the hash value for a given input.
    • Cryptographic hash functions are also one-way, meaning that it is infeasible in practice to recover the input from the hash value. Hash functions used only for hash tables do not need this property.
    • They have a fixed output size, so the hash value is always the same length.

    What is Hash Table?

    A hash table is a data structure that uses a hash function to map keys to values. It consists of an array of buckets, where each bucket stores a key-value pair (or, with chaining, a list of them).

    The hash function is used to calculate the index of the bucket where the key-value pair should be stored. This allows us to quickly retrieve the value associated with a key by using the hash function to calculate the index and then looking up the value at that index.

    Properties of Hash Table

    Hash tables have several important properties:

    • They provide fast lookups, insertions, and deletions, with an average time complexity of O(1). In the worst case, when many keys collide, these operations take O(n).
    • They store n elements with a space complexity of O(n).
    • They handle collisions, where two keys map to the same index, by using techniques like chaining or open addressing.

    Collision in Hashing

    Collision in hashing occurs when we get the same hash value for different input values. Because there are usually more possible inputs than hash values, collisions cannot be avoided entirely, and a poorly chosen hash function makes them more frequent.

    Hashing Algorithms

    There are several hashing algorithms that are commonly used in computer science, such as:

    • MD5 (Message Digest Algorithm 5)
    • SHA-1 (Secure Hash Algorithm 1)
    • SHA-256 (Secure Hash Algorithm 256)
    • SHA-512 (Secure Hash Algorithm 512)
    • CRC32 (Cyclic Redundancy Check 32)

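    These algorithms are used for applications such as data integrity checks, password hashing, and digital signatures. CRC32 is a checksum intended for detecting accidental errors rather than a cryptographic hash. MD5 and SHA-1 are no longer considered collision resistant, so security-sensitive uses should prefer SHA-256 or SHA-512. Hashing is one-way, so it is not a form of encryption.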

    Applications of Hashing

    Hashing is used in various applications in computer science, such as:

    • Storing and retrieving data quickly.
    • Implementing data structures like hash tables and hash maps.
    • Searching and indexing data efficiently.
    • Ensuring data integrity and security.
    • Password hashing and digital signatures.

    Conclusion

    We have discussed the concept of hashing, hash functions, hash tables, and collision in hashing. We also looked at some common hashing algorithms and applications of hashing in computer science.