## What is the computer science definition of entropy?

I’ve recently started a course on data compression at my university. However, I find the use of the term “entropy” as it applies to computer science rather ambiguous. As far as I can tell, it roughly translates to the “randomness” of a system or structure. What is the proper definition of computer science “entropy”? …
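In the data-compression sense, the term usually refers to Shannon entropy. For a source that emits symbol $s$ with probability $p_s$, $H = -\sum_s p_s \log_2 p_s$ bits per symbol, and no lossless code for that source can have a shorter average length. The following is a minimal sketch that assumes symbols are independent and estimates $p_s$ from the observed frequencies, which gives the empirical (order-0) entropy. `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_entropy(data) -> float:
    """Empirical (order-0) entropy of a sequence, in bits per symbol."""
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(data)
    # H = sum_s p_s * log2(1 / p_s), with p_s estimated as count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

For example, `shannon_entropy("aaaa")` is `0.0` (no uncertainty), `shannon_entropy("aabb")` is `1.0`, and `shannon_entropy("abcd")` is `2.0`. Real data is rarely independent from symbol to symbol, so models that use context can do better than this order-0 figure.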

## Computing (near) optimal displacement tables

Suppose we have a sparse two-dimensional table $T$ with $r$ rows and $c$ columns. Let $T[i][j]$ be the element at the $i$th row and $j$th column of $T$, with zero-based indexing. We can compress $T$ by putting all elements in a single 1D array $A$ and letting $A[D[i]+j] = T[i][j]$. Now with $D[i] = c \cdot i$ we have …
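A common heuristic for choosing $D$ is first fit. The rows are processed one at a time, densest first in this sketch (an assumption). Each row gets the smallest shift at which its non-empty cells land only on free slots of $A$. The minimal sketch below also keeps a hypothetical `owner` array that records which row filled each slot. This lets a lookup tell a genuinely empty cell apart from an element that belongs to another row. Empty cells are represented by `None`, and the sketch makes no claim of optimality.

```python
def first_fit_displacements(T):
    """Row-displacement compression of a sparse table T (None = empty cell).

    Returns (D, A, owner): element (i, j) lives at A[D[i] + j], and owner[k]
    records which row placed slot k (-1 = free), so empty cells can be detected.
    """
    r = len(T)
    D = [0] * r
    A, owner = [], []
    # Heuristic (assumption): place the densest rows first.
    order = sorted(range(r), key=lambda i: -sum(x is not None for x in T[i]))
    for i in order:
        cols = [j for j, x in enumerate(T[i]) if x is not None]
        d = 0
        # First fit: smallest shift whose occupied columns hit only free slots.
        while any(d + j < len(owner) and owner[d + j] != -1 for j in cols):
            d += 1
        D[i] = d
        if cols:
            grow = d + cols[-1] + 1 - len(A)  # compute once, before extending
            if grow > 0:
                A.extend([None] * grow)
                owner.extend([-1] * grow)
            for j in cols:
                A[d + j] = T[i][j]
                owner[d + j] = i
    return D, A, owner


def lookup(D, A, owner, i, j):
    """Return T[i][j], or None if that cell was empty."""
    k = D[i] + j
    return A[k] if k < len(A) and owner[k] == i else None
```

In the 4×4 table used in the check below, the four non-empty cells interleave into an `A` of length 4. Without the `owner` check, looking up an empty cell would silently return another row's value.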

## Implementing a binary arithmetic encoder

I have a set of integers that are normally distributed, and I’m trying to implement binary arithmetic encoding, as suggested in this post: “Compressing normally distributed data”. I’m running into trouble because the user remarks that we should try arithmetic encoding the quotient using a sequence of binary symbols. I don’t understand what this means because I’ve …
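Here is one plausible reading of “arithmetic encoding the quotient using a sequence of binary symbols”. Each value is presumably split into a quotient and a remainder, as in Golomb-Rice coding. The quotient is then turned into a string of yes/no decisions (binarization), and each decision is coded with an adaptive binary arithmetic coder. This minimal sketch rests on several assumptions:

- unary binarization, where quotient $q$ becomes $q$ ones followed by a terminating zero;
- one adaptive model per bit position, capped at the hypothetical `MAX_CTX`;
- add-one counts;
- a decoder that already knows how many quotients were sent.

The sketch uses exact `Fraction` arithmetic for readability. Practical coders use fixed-precision integers with renormalization instead. Remainders are not handled here.

```python
import math
from fractions import Fraction

MAX_CTX = 15  # hypothetical: bit positions >= 15 share one adaptive model


class BitModel:
    """Adaptive estimate of P(bit = 1) per context, using add-one counts."""

    def __init__(self):
        self.counts = {}

    def p_one(self, ctx):
        c0, c1 = self.counts.get(ctx, (1, 1))
        return Fraction(c1, c0 + c1)

    def update(self, ctx, bit):
        c0, c1 = self.counts.get(ctx, (1, 1))
        self.counts[ctx] = (c0 + 1 - bit, c1 + bit)


def encode(quotients):
    """Arithmetic-code each quotient's unary bits; return the code as a '0'/'1' string."""
    model, low, width = BitModel(), Fraction(0), Fraction(1)
    for q in quotients:
        for pos, bit in enumerate([1] * q + [0]):  # unary: q ones, then a zero
            ctx = min(pos, MAX_CTX)
            p1 = model.p_one(ctx)
            if bit:  # '1' takes the upper part of the current interval
                low += width * (1 - p1)
                width *= p1
            else:    # '0' takes the lower part
                width *= 1 - p1
            model.update(ctx, bit)
    # Shortest k-bit string x with [x/2^k, (x+1)/2^k) inside [low, low + width).
    k = 0
    while True:
        scale = 2 ** k
        x = math.ceil(low * scale)
        if Fraction(x + 1, scale) <= low + width:
            return format(x, "b").zfill(k) if k else ""
        k += 1


def decode(bits, n):
    """Recover n quotients from the bit string produced by encode."""
    model, low, width = BitModel(), Fraction(0), Fraction(1)
    value = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    out = []
    for _ in range(n):
        q = 0
        while True:
            ctx = min(q, MAX_CTX)  # same context the encoder used for this bit
            p1 = model.p_one(ctx)
            split = low + width * (1 - p1)
            bit = 1 if value >= split else 0  # which sub-interval holds value?
            if bit:
                low, width = split, width * p1
            else:
                width *= 1 - p1
            model.update(ctx, bit)
            if bit == 0:  # terminating zero ends this quotient
                break
            q += 1
        out.append(q)
    return out
```

The encoder and decoder update the same models in the same order, so each side's probability for each bit matches. Skewed quotients, such as mostly zeros, drive the probabilities away from 1/2, which is where the savings come from.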

What are the advantages and drawbacks of considering ever longer block lengths or context lengths if one were to work with estimated probabilities (measured on the fly, i.e. “adaptive”) rather than a probability distribution that was measured offline (“static”)? I have thought of the following advantages: lesser compression, as even though the context is taken into account the …
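A small experiment can make the trade-off concrete. The sketch below sums $-\log_2 p$ over the input, which is the ideal adaptive code length that an arithmetic coder would approach. It assumes an order-$k$ context model with Laplace (add-one) estimates and an alphabet known in advance to both sides. `adaptive_cost_bits` is a hypothetical name. Longer contexts capture more structure, but each context is seen fewer times, so more symbols get coded with poorly trained estimates (context dilution).

```python
import math
from collections import defaultdict


def adaptive_cost_bits(data: str, k: int) -> float:
    """Ideal code length (bits) of data under an adaptive order-k context model."""
    alphabet_size = len(set(data))  # assumption: alphabet known to both sides
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - k):i]  # up to k preceding symbols
        seen = counts[ctx]
        total = sum(seen.values())
        p = (seen[sym] + 1) / (total + alphabet_size)  # Laplace (add-one) estimate
        bits -= math.log2(p)
        seen[sym] += 1  # update after coding, as a decoder could
    return bits
```

On `"ab" * 50`, order 1 costs far fewer bits than order 0, because after one symbol the next is fully determined. Comparing several values of `k` on your own data shows where dilution starts to outweigh the extra context.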

## Mathematical limits on lossless data compression

Let’s say Bob wants to send a particular binary sequence to Alice. Imagine that Bob and Alice both have powerful machines but slow Internet connections. Bob could just send the sequence directly, but the upload and the download would take a lot of time. Instead Bob could send a program that outputs the sequence. Assume …
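Limits of this kind usually start from a counting (pigeonhole) argument. Whatever machine Alice runs, each description (for example, a program) produces at most one output, so distinct sequences need distinct descriptions. Counting the bit strings shorter than $n - c$ bits gives:

$$
\#\{\, x \in \{0,1\}^* : |x| < n - c \,\} \;=\; \sum_{k=0}^{n-c-1} 2^k \;=\; 2^{n-c} - 1 \;<\; 2^{n-c}.
$$

So fewer than a $2^{-c}$ fraction of the $2^n$ sequences of length $n$ can be described in fewer than $n - c$ bits. In particular, no lossless scheme, program-based or otherwise, can shorten every input.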

## How do I compress data vectors in broadcast messages?

Let us model a wireless broadcast network as an undirected graph $G(V,E)$ where there is an edge between every pair of nodes $i, j \in V$ if they are in transmission range of each other. $w_{i \to j}$ is the weight of the link $i \to j$, which can be calculated only by transmitter $i$ and is unique to receiver $j$. All …

## LZW with dictionary clearing

How does LZW decompress data with dictionary clearing/flushing? I understand that a space is reserved in the dictionary that represents a clear code (usually 256), but how is this code actually used when compressing and decompressing data? My thoughts for compression are that it checks the table size, and if the table size has reached …
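Here is a minimal sketch of one common convention. It assumes byte input, code 256 reserved as the clear code as in the question, new entries starting at 257, and a hypothetical table limit `max_codes`. When the table is full, the compressor emits the code for the current phrase followed by the clear code. Both sides then reset to the 256 single-byte entries. Real formats such as GIF also pack codes into variable-width bit fields. That step is omitted here, and codes are returned as a plain list of integers.

```python
CLEAR = 256       # clear code, as in the question
FIRST_CODE = 257  # first code available for multi-byte entries


def _fresh_table():
    return {bytes([i]): i for i in range(256)}


def lzw_compress(data: bytes, max_codes: int = 4096) -> list:
    table, next_code = _fresh_table(), FIRST_CODE
    out, w = [], b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:  # keep extending the current phrase
            w = wc
            continue
        out.append(table[w])  # emit the longest known phrase
        if next_code < max_codes:
            table[wc] = next_code
            next_code += 1
        else:  # table full: signal a reset and start over
            out.append(CLEAR)
            table, next_code = _fresh_table(), FIRST_CODE
        w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes, max_codes: int = 4096) -> bytes:
    table = {i: bytes([i]) for i in range(256)}
    next_code, out, prev = FIRST_CODE, bytearray(), None
    for code in codes:
        if code == CLEAR:  # reset to single-byte entries
            table = {i: bytes([i]) for i in range(256)}
            next_code, prev = FIRST_CODE, None
            continue
        if code in table:
            entry = table[code]
        elif code == next_code and prev is not None:
            entry = prev + prev[:1]  # code defined by this very step (cScSc case)
        else:
            raise ValueError(f"invalid code {code}")
        out += entry
        if prev is not None and next_code < max_codes:
            table[next_code] = prev + entry[:1]  # mirrors the compressor's addition
            next_code += 1
        prev = entry
    return bytes(out)
```

The decompressor learns each new entry one step after the compressor creates it. After a clear code it sets `prev` to `None`, so nothing is added for the first code. That first code is always a single byte, because the compressor has just restarted its phrase from one byte.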

## Best compression algorithm for CNF SAT instances in DIMACS

For a CNF SAT instance in the DIMACS format, what is the best algorithm to compress it? What is the best algorithm for 3-SAT instances in particular? The 2020 SAT Competition used .xz, which, if I understand correctly, relies on the LZMA algorithm. Brotli would probably not be optimal, as it has a predefined dictionary …
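One way to start answering this empirically is to measure candidate codecs on your own instances. The minimal sketch below uses Python's standard `zlib`, `bz2`, and `lzma` modules, where `lzma` writes the `.xz` format by default. `random_3sat_dimacs` is a hypothetical generator of uniform random 3-SAT instances. Such instances have little structure and are likely less compressible than real competition instances, so these numbers say nothing about which codec is best in general.

```python
import bz2
import lzma
import random
import zlib


def random_3sat_dimacs(num_vars: int, num_clauses: int, seed: int = 0) -> bytes:
    """Hypothetical helper: a uniform random 3-SAT instance in DIMACS CNF format."""
    rng = random.Random(seed)
    lines = [f"p cnf {num_vars} {num_clauses}"]
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)  # 3 distinct variables
        literals = [v if rng.random() < 0.5 else -v for v in variables]
        lines.append(" ".join(map(str, literals)) + " 0")  # clauses end with 0
    return ("\n".join(lines) + "\n").encode("ascii")


def compressed_sizes(raw: bytes) -> dict:
    """Compressed size in bytes for three standard-library codecs."""
    return {
        "zlib (DEFLATE)": len(zlib.compress(raw, 9)),
        "bz2": len(bz2.compress(raw, 9)),
        "xz (LZMA2)": len(lzma.compress(raw, preset=9)),
    }


raw = random_3sat_dimacs(200, 850)
print(len(raw), compressed_sizes(raw))
```

To get the numbers that matter, replace `raw` with the bytes of real instance files.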

## Theoretically the most efficient compression algorithm

I remember in high school we did this thing where we were given a bunch of numbers in a row that had a pattern. We were asked to work out what the next number in the pattern would be. It was very easy stuff, and they taught us a formula that I can’t remember. Basically …

## Do the two Huffman trees have the same corpus?

Consider the following Huffman trees: I was asked whether these trees can have the same corpus. My answer was no, based on these calculations. For the right tree: $a_1 \le a_2$, $a_1 + a_2 \le a_5$, $a_3 \le a_4$, $a_1 + a_2 + a_5 \le a_3 + a_4$. For the left tree: $a_1 \le a_2$, $a_3 \le a_4$, $a_1 + a_2 + a_3 + a_4 \le a_5$. Adding the last inequality from each tree, we have $2a_1 + 2a_2 \le 0$, which is a contradiction because …
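Here is the addition step written out term by term. The last right-tree inequality ($a_1 + a_2 + a_5 \le a_3 + a_4$) and the last left-tree inequality ($a_1 + a_2 + a_3 + a_4 \le a_5$) are summed, and $a_3 + a_4 + a_5$ then cancels from both sides:

$$
\begin{aligned}
(a_1 + a_2 + a_5) + (a_1 + a_2 + a_3 + a_4) &\le (a_3 + a_4) + a_5 \\
2a_1 + 2a_2 + a_3 + a_4 + a_5 &\le a_3 + a_4 + a_5 \\
2a_1 + 2a_2 &\le 0.
\end{aligned}
$$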