I’m looking for a data structure that stores a set of strings over a character set Σ, capable of performing the following operations. We denote D(S) as the data structure storing the set of strings S.

`Add-Prefix-Set`

on D(S): given some set T of (possibly empty) strings, whose size is bounded by a constant and whose string lengths are bounded by a constant, return D({ts | t∈T,s∈S}). Both these bounding constants are global: they are the same for all inputs T.`Get-Prefixes`

on D(S): return {a | as∈S,a∈Σ}. Note that I don’t really mind what structure is used for this set, as long as I can enumerate its contents in O(|Σ|) time.`Remove-Prefixes`

on D(S): return D({s | as∈S,a∈Σ}).`Merge`

: given D(S) and D(T), return D(S∪T).Now, I’d really like to do all these operations in O(1) time, but I’m fine with a structure that does all these operations in o(n) time, where n is the length of the longest string in the structure. In the case of the merge, I’d like a o(n1+n2) running time, where n1 is n for the first and n2 the n for the second structure.

An additional requirement is that the structure is immutable, or at least that the above operations return ‘new’ structures such that pointers to the old ones still function as before.

A note about amortization: that is fine, but you have to watch out for persistence. As I re-use old structures all the time, I’ll be in trouble if I hit a worst case with some particular set of operations on the same structure (so ignoring the new structures it creates).

I’d like to use such a structure in a parsing algorithm I’m working on; the above structure would hold the lookahead I need for the algorithm.

I’ve already considered using a trie, but the main problem is that I don’t know how to merge tries efficiently. If the set of strings for

`Add-Prefix-Set`

consists of only single-character strings, then you could store these sets in a stack, which would give you O(1) running times for the first three operations. However, this approach doesn’t work for merging either.Finally, note that I’m not interested in factors |Σ|: this is constant for all I care.

**Answer**

I thought for quite some time, but didn’t find the problem with doing all of your operations in the most stupid possible way in a trie-like DAG structure:

# Add-Prefix-Set

Create a trie of strings from T. Connect each leaf node to the root of the old trie.

Complexity: O(|T|)

# Merge

Unite roots of two structures: make all children nodes of the second root children of the first node. You may now have multiple edges marked with the same character going from the same node.

Complexity: O(1)

# Lazy update of the root

- For each character unite all children of the root accessible by the edges marked with this character. (O(|Σ|) + Amortized O(1) for each edge ever added)
- Delete all edges going into the root from the top. (Can happen after Remove-prefixes, amortized O(1) for each edge ever added)

# Get-prefixes

Lazy update the root. Now find all children of the root and report set of letters on edges going to them.

Complexity: O(|Σ|)

# Remove-prefixes

Lazy update the root. Unite all children of the root and set the root pointer to the result of this unite. Lazy update the new root.

Complexity: O(|Σ|) + Lazy update

# Persistence

All information about this data structure can be saved in arrays, with O(1) information per node plus O(|Σ|) information for each child of the node. Each operation can be held out as set of assignments in these arrays. Use persistent range tree or any other data structure that can be viewed as a persistent array. Additional multiplier for memory and space complexity O(logN), where N is the total number of characters in all queries.

**Attribution***Source : Link , Question Author : Alex ten Brink , Answer Author : Community*