Home >Java >javaTutorial >How can Trie data structures be used to efficiently implement sparse matrices, providing faster read-only access and optimized storage compared to traditional hashmaps?
Sparse matrices, also known as sparse arrays, are data structures used to represent matrices where most of the elements are zero or undefined. Unlike traditional matrices, sparse matrices only store the non-zero elements, making them efficient for storing large matrices with a large number of zeros.
Implementing sparse matrices using hashmaps can be inefficient for frequently read data because hashmaps introduce collisions for both reading and writing, requiring complex hashing functions and loops to handle each candidate position and compare it with the source index.
A more efficient approach is to use a Trie (Trie Radix Tree) structure that allows direct access to a single vector where segments are distributed. Tries can determine the presence of an element in the table with only two array indexing operations, providing fast read-only access and a default position in the backing store for default values.
This approach avoids any test on the returned index, guarantees that all source indices map to at least the default position in the backing store, and supports fast updatable Tries with an optional "compact()" operation to optimize storage space at the end of multiple operations.
Tries are significantly faster than hashmaps because they don't require complex hashing functions or collision handling, and they work efficiently with Java IntegerTrieMap and LongTrieMap for Integer and Long indices, although these are not currently included in the JRE.
Example Code:
<code class="java">public class DoubleTrie { // Matrix options public static final int SIZE_I = 1024; public static final int SIZE_J = 1024; public static final double DEFAULT_VALUE = 0.0; // Internal splitting options private static final int SUBRANGEBITS_I = 4; private static final int SUBRANGEBITS_J = 4; // Internal derived splitting constants private static final int SUBRANGE_I = 1 << SUBRANGEBITS_I; private static final int SUBRANGE_J = 1 << SUBRANGEBITS_J; private static final int SUBRANGEMASK_I = SUBRANGE_I - 1; private static final int SUBRANGEMASK_J = SUBRANGE_J - 1; private static final int SUBRANGE_POSITIONS = SUBRANGE_I * SUBRANGE_J; // Internal derived default values for constructors private static final int SUBRANGES_I = (SIZE_I + SUBRANGE_I - 1) / SUBRANGE_I; private static final int SUBRANGES_J = (SIZE_J + SUBRANGE_J - 1) / SUBRANGE_J; private static final int SUBRANGES = SUBRANGES_I * SUBRANGES_J; private static final int DEFAULT_POSITIONS[] = new int[SUBRANGES]; private static final double DEFAULT_VALUES[] = new double[SUBRANGE_POSITIONS](DEFAULT_VALUE); // Internal fast computations private static final int subrangeOf(int i, int j) { return (i >> SUBRANGEBITS_I) * SUBRANGE_J + (j >> SUBRANGEBITS_J); } private static final int positionOffsetOf(int i, int j) { return (i & SUBRANGEMASK_I) * SUBRANGE_J + (j & SUBRANGEMASK_J); } // Internal member variables private double values[]; private int subrangePositions[]; private boolean isSharedValues; private boolean isSharedSubrangePositions; // Internal method private final void reset(double[] values, int[] subrangePositions) { this.isSharedValues = (this.values = values) == DEFAULT_VALUES; this.isSharedSubrangePositions = (this.subrangePositions = subrangePositions) == DEFAULT_POSITIONS; } // Reset method public void reset(double initialValue = DEFAULT_VALUE) { reset((initialValue == DEFAULT_VALUE) ? DEFAULT_VALUES : new double[SUBRANGE_POSITIONS](initialValue), DEFAULT_POSITIONS); } // Default constructor public DoubleTrie(double initialValue = DEFAULT_VALUE) { this.reset(initialValue); } // Immutable default instance public static DoubleTrie DEFAULT_INSTANCE = new DoubleTrie(); // Copy constructor public DoubleTrie(DoubleTrie source) { this.values = (this.isSharedValues = source.isSharedValues) ? source.values : source.values.clone(); this.subrangePositions = (this.isSharedSubrangePositions = source.isSharedSubrangePositions) ? source.subrangePositions : source.subrangePositions.clone(); } // Fast indexed getter public double getAt(int i, int j) { return values[subrangePositions[subrangeOf(i, j)] + positionOffsetOf(i, j)]; } // Fast indexed setter public double setAt(int i, int j, double value) { int subrange = subrangeOf(i, j); int positionOffset = positionOffsetOf(i, j); // Check if the assignment will change anything int subrangePosition, valuePosition; if (Double.compare(values[valuePosition = (subrangePosition = subrangePositions[subrange]) + positionOffset], value) != 0) { // The assignment will change something, check if the affected subrange is shared if (isSharedValues) { values = values.clone(); isSharedValues = false; } // Scan all other subranges to check if the affected position is shared for (int otherSubrange = subrangePositions.length; --otherSubrange >= 0;) { if (otherSubrange != subrange) { continue; // Ignore the target subrange } // Check if the target position is shared by another subrange if ((otherSubrangePosition = subrangePositions[otherSubrange]) >= valuePosition && otherSubrangePosition + SUBRANGE_POSITIONS < valuePosition) { // The target position is shared, we need to make it unique by cloning the subrange and copying all its values to the end of the new vector if (isSharedSubrangePositions) { subrangePositions = subrangePositions.clone(); isSharedSubrangePositions = false; } values = DoubleTrie.arraysetlength(values, (subrangePositions[subrange] = subrangePositions = values.length) + SUBRANGE_POSITIONS); valuePosition = subrangePositions + positionOffset; break; } } // Assign the new value values[valuePosition] = value; } return value; } // Compact storage method public void compact() { int oldValuesLength = values.length; int newValuesLength = 0; for (int oldPosition = 0; oldPosition < oldValuesLength; oldPosition += SUBRANGE_POSITIONS) { int oldPosition = positions[subrange]; boolean commonSubrange = false; // Scan values for possible common subranges for (int newPosition = newValuesLength; (newPosition -= SUBRANGE_POSITIONS) >= 0;) { if (arrayequals(values, newPosition, oldPosition, SUBRANGE_POSITIONS)) { commonSubrange = true; // Update the subrangePositions with all matching positions from oldPosition to newPosition for (subrange = subrangePositions.length; --subrange >= 0;) { if (subrangePositions[subrange] == oldPosition) { subrangePositions[subrange] = newPosition; } } break; } } if (!commonSubrange) { // Move down the non-common values if (!commonSubrange && oldPosition != newValuesLength) { DoubleTrie.arraycopy(values, oldPosition, newValuesLength, SUBRANGE_POSITIONS); newValuesLength += SUBRANGE_POSITIONS; } } } // Check the number of compressed values if (newValuesLength < oldValuesLength) { values = values.arraysetlength(newValuesLength); isSharedValues = false; } } }</code>
The above is the detailed content of How can Trie data structures be used to efficiently implement sparse matrices, providing faster read-only access and optimized storage compared to traditional hashmaps?. For more information, please follow other related articles on the PHP Chinese website!