Efficiently counting rooms from a floorplan (version 2)

This is version 2 of Efficiently counting rooms from a floorplan. I had accidentally pasted in the wrong version of the code.

Update

Final version (version 3) of the code with updated test harness is here: Multithreaded testing for counting rooms from a floor plan solution


I was inspired by Calculating the number of rooms in a 2D house and decided to see if I could come up with efficient code to solve the same problem.

To recap, the problem (from here) is this:

You are given a map of a building, and your task is to count the number of rooms. The size of the map is n×m squares, and each square is either floor or wall. You can walk left, right, up, and down through the floor squares.

Input

The first input line has two integers n and m: the height and width of the map.

Then there are n lines of m characters that describe the map. Each character is . (floor) or # (wall).

Output

Print one integer: the number of rooms.

Constraints

1n,m2500

Example

Input:

5 8
########
#..#...#
####.#.#
#..#...#
########

Output:

3

Strategy

It seemed to me to be possible to solve the problem by processing line at a time, so that’s what my code does. Specifically, it keeps a tracking std::vector<std::size_t> named tracker that corresponds to the rooms from the previous row and starts with all zeros.

As it reads each line of input, it processes the line character at a time. If it’s non-empty (that is, if it’s a wall), set the corresponding tracker entry to 0.

Otherwise, if the previous row (that is, the matching value from the tracker vector) was a room, then this is part of the same room.

If the previous character in the same row was a room, this is the same room.

The code also has provisions for recognizing that what it “thought” was two rooms turns out to be one room, and adjusts both the tracker vector and the overall roomcount.

Because I wanted to be able to test it with many different inputs, my version of the code keeps reading and processing each floor plan until it gets to the end of the file.

The code is time efficient because it makes only a single pass through the input, and it’s memory efficient because it only allocates a single 1×m vector.

Questions

  1. Correctness – The code works correctly on every input I’ve tried, but if there is any error in either the code or the algorithm, I’d like to know.
  2. Efficiency – Could the code be made even more efficient?
  3. Reusability – This works for a 2D map, but I’d like to adapt it to 3 or more dimensions. Are there things I could do in this code to make such adaptation simpler?

Any hints on style or any other aspect of the code would be welcome as well.

rooms.cpp

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>

std::size_t rooms(std::istream &in) {
    std::size_t height;
    std::size_t width;
    std::size_t roomcount{0};
    static constexpr char empty{'.'};
    in >> height >> width;
    if (!in)
        return roomcount;
    std::vector<std::size_t> tracker(width, 0);
    for (auto i{height}; i; --i) {
        std::string row;
        in >> row;
        if (row.size() != width) {
            in.setstate(std::ios::failbit);
            return roomcount;
        } 
        for (std::size_t j{0}; j < width; ++j) {
            if (row[j] == empty) {
                // continuation from line above?
                if (tracker[j]) {
                    // also from left?
                    if (j && tracker[j-1]) {
                        if (tracker[j-1] < tracker[j]) {
                            tracker[j] = tracker[j-1];
                            --roomcount;
                        } else if (tracker[j] < tracker[j-1]) {
                            // set all contiguous areas to the left
                            for (auto k{j-1}; k; --k) {
                                if (tracker[k]) {
                                    tracker[k] = tracker[j];
                                } else {
                                    break;
                                }
                            }
                            --roomcount;
                        }
                    }
                } else {
                    // continuation from left?
                    if (j && tracker[j-1]) {
                        tracker[j] = tracker[j-1];
                    } else {
                        tracker[j] = ++roomcount;
                    }
                }
            } else {
                tracker[j] = 0;
            }
        }
    }
    return roomcount;
}


int main() {
    auto r = rooms(std::cin);
    while (std::cin) {
        std::cout << r << '\n';
        r = rooms(std::cin);
    }
}

test.in

5 8
########
#..#...#
####.#.#
#..#...#
########
9 25
#########################
#.#.#.#.#.#.#.#.#.#.#.#.#
#########################
#.#.#.#.#.#.#.#.#.#.#.#.#
#########################
#.#.#.#.#.#.#.#.#.#.#.#.#
#########################
#.#.#.#.#.#.#.#.#.#.#...#
#########################
3 3
...
...
...
3 3
###
...
###
3 3
###
###
###
7 9
#########
#.#.#.#.#
#.#.#.#.#
#.#...#.#
#.#####.#
#.......#
#########
5 8
########
#..#.#.#
##.#.#.#
#..#...#
########
7 8
########
#..#.#.#
##.#.#.#
#..#...#
########
#..#...#
########
7 9
#########
#.#.#.#.#
#.#.#.#.#
#.#.#.#.#
#.#.#.#.#
#.......#
#########
7 9
#########
#.#.##..#
#.#.##.##
#.#.##..#
#.#...#.#
#...#...#
#########
7 9
#########
#.#.....#
#.#.###.#
#.#...#.#
#.#####.#
#.......#
#########
7 9
#########
#.......#
#.#####.#
#.#.#.#.#
#.#.#.#.#
#.......#
#########

Results

Running the program as rooms <test.in produces this expected result:

3
47
1
1
0
2
2
4
1
1
1
1

Answer

This is called connected component analysis or labeling in the image processing literature. There are many algorithms that try to make this more efficient. You’re only counting regions, not assigning a unique label (i.e. room number) to each region, so you can get away with a somewhat simpler algorithm than I’m used to seeing.

You’ve already got comments on your coding style, I won’t go there again. But you also asked about a 3D extension, and I can give suggestions for that.

A straight-forward extension of your code to 3D would require to store the previous plane (2D data for the previous z coordinate) as well as the previous line. This gets complicated quickly, and will make it impossible to generate an algorithm that works with an arbitrary number of dimensions.

If you are indeed after an algorithm that would work with any number of dimensions, I’d suggest storing the whole floor plan and using a standard labeling algorithm. They’re only a little bit more complicated that what you’re doing, but assign a label to each room element. You’d have a Union-Find data structure to keep track of labels. For each new element you look at all connected elements (the neighbors in the grid) that have already been processed (this is a fixed list of offsets). If any of them has a label, assign the same label to this element. If another one also has a label, set the equivalence between the two labels in your Union-Find data structure.

The very efficient implementations optimize the order in which neighbors are examined, noting that certain neighbors need not be examined at all under certain configurations.

To learn more about these algorithms, look at YACCLAB. It is explicitly 2D, but you can see all the different approaches people have taken here. I wrote a dimensionality-independent version, you can see the code here.

Attribution
Source : Link , Question Author : Edward , Answer Author : Cris Luengo

Leave a Comment