Data Bases
Custom Term Papers
Free Term Papers
Free Research Papers
Free Essays
Free Book Reports
Plagiarism?
Links
Top 100 Term Paper Sites
Top 25 Essay Sites
Top 50 Essay Sites
Search 97,000 Papers @ DirectEssays.com
Search 101,000 Papers @ ExampleEssays.com
Search 90,000 Papers @ MegaEssays.com
Free Essays
Term Paper Sites
Chuck III's Free Essays
Free College Essays
TermPaperSites.com
My Term Papers
Get Free Essays
Essay World
Planet Papers
Search Lots of Essays
Back to Subjects
-
Computers
Relative File Organization
Relative File Organization This topic discusses different types of indexing techniques that allows the location of records in a file relatively fast with fewer accesses. The techniques that will be addressed deals with random access file organization only. As this is the best file organization that allows records to be stored randomly rather than sequentially or in a contiguous manner. By using direct addressing, a predictable relationship between a key of a record and the location of that record on an external file is established. Two different forms of addressing can be used to establish this relationship: Absolute addressing make use of the storage devices to determine the relationship, and therefore it is machine dependent, e.g., cylinder-number, surface-number, and record-number if we are using cylinder addressing or sector-number and record-number if sector addressing is being used. Hashing is the application of a function to the key value of a record that results in mapping the range of possible key values into a smaller range of relative addresses. For example, if a company is to maintain data of 10,000 employees by using the employee’s social security number, which ranges from 1 to 999999999, into 10,000 relative positions. The hash function to be applied to the social security number must be able to match each 10,000 social security number into each relative address available. However, collisions do occur. Collisions occurs when two different keys, in this case two social security numbers, hash into the same relative address. These two different keys are The relationship between the file space and the number of keys is described as the load factor. Load factor is the ratio of the number of key values to be stored versus the number of file positions: Load Factor = number of key values / number of file positions Prime number division remainder method works just like using the mod operator in Pascal or the % operator in C or C++. The key to a record is divided with a prime-number and the remainder from the division is used as the relative address for that record. This method analyzes the key values to determine which digit positions in the key are more evenly distributed. The more evenly distributed digit positions are assigned from right to left, and the digit values are extracted and used as the relative address. For example, for a key value 546032178, the relative address could be 8134, from left to right the first, third, fifth, and eighth digit positions has been extracted. Note that the extraction will map the key values into a range of 0 to 9999, generating 10,000 positions. Folding involves the splitting of the key into two or more parts and then summing the parts together to form the relative address. For example if we have a key value that is 25936715, and we need to map it to a range between 0 and 9999. We could split the numbers as 2593 and 6715, which the summation would result in 9308, and this would be our relative address. We could also fold the key in three parts, that is splitting the key into 3 different parts and then summing the parts together to get the relative address. As you can see, folding is very useful if we have a key that is very large. Radix conversion make use of a number base to calculate the relative address. For example, we usually count in base 10 and use base 2 for binary numbers. We can use any uncommon number as our base to calculate the relative address for a particular record. Let’s say we have a key value of 38652, and we want to calculate the relative address using base 11 into a range between 0 and 9999. The conversion is as follows: 3x11^4 + 8x11^3 + 6x11^2 + 5x11^1 + 2x11^0 = 55354 (^ means exponent and x means This is our key value in base 11. Since we want the relative address to be in the range of 0 to 9999 we can truncate the high-order 5 to yield 5354 as our relative address. Try using another base to calculate the relative address. Mid-square randomizing scheme involves the extraction from the key the middle n digits and then squaring them to form the relative address. Again the excess high-order digit can be truncated to make the value fit the range. For the key value 29615834, the middle three digits (158) could be extracted and squared, resulting in 24964 as the relative address. Since this number is too large to be in the range value, the high-order digit is extracted and the relative address now becomes 4964 for the recored with the key value of 29615834. This form of randomizing scheme works well if the key does not contain a lot of leading or trailing zeros. There are several ways for handling collisions and synonyms. They can be classifiied as open addressing and chaining. The methods described are as follows: 2.Two-Pass File Creation with Hashing In linear probing, files are scanned sequentially as a circular file and the synonym, which caused the collision, is stored in the nearest available space to the hashed address. That is if we use a hashing function to calculate the relative address for the words ‘AS’ and ‘HIS’, and these two words happens to have the same relative address, say 18, the first word ‘AS’ will be stored in the relative address 18, the second word ’HIS’ will be stored in next available position that is empty. It could be position 19 or 20, in general it could be i+1, i+2, ..., n, 1, 2, ..., i-1. Where i in this case is position 18, and n is the total number of positions available. This technique is an improvement on the linear probing technique. What it does is that it will run two - passes on the keys. The first run stores the keys that are not synonyms in their hashed address while the second run stores the synonyms in the next available positions. This technique reduces the amount of displacement that is caused by linear probing and thus improves the searching for a key that is not a synonym. Separate overflow area uses the one pass algorithm (just like linear probing), but it stores its synonyms in a separate overflow location. This method eliminates the displacement problem altogether but accessing synonyms entails sequentially searching all the records in the overflow area. The is a good technique provided that the overflow area is not too big, which could cause lengthy search. Double hashing involves the use of a second hashing function to store the synonyms into an overflow area. This method would apply the first hashing function to the key of the record, if the result is not a synonym, the record is hashed into the appropriate location. If there is a synonym, a second hashing function is applied to the result of the first hashing function, generating a location that is within the overflow area, then record is stored in this location. This method can be used with or without a separate overflow area. It functions much like a linked list, in which the link to the next record is in the current record. However, all the locations are not linked together. In each location there is a link to the synonym in the overflow area, and also the overflow area locations contains a link to its synonym. Bucket addressing involves the use of more than one record space for each position in the storage area. What happens is that the first record of a position will reside in the first bucket, and the synonyms will follow. If there are more synonyms than buckets, then we have a bucket spill. In bucket spill addressing, the full bucket spills over into the next bucket. Dynamic hashing is a technique used to dynamically change the hashing function to access a hashed file that increases in size as records are added. Bibliography:
Word Count: 1379
Copyright © 1998-2008
College Term Papers
, INC All Rights Reserved.
DMCA Notifications and Requests