Competitive Programming

Main points for today

Updates

Another competitive programming book

In addition to Competitive Programming 3, programming master Antti Laaksonen has recently published the Competitive Programmer's Handbook, a free 296-page ebook (more info at his blog post), as well as an expanded version, the Guide to Competitive Programming (ebook $39.99, softcover $49.99).

More sites with practice problems

Upcoming competitions

Problems for this week

Gennady and a Card Game

Simplified problem description

Six standard playing cards are given. Each card is two characters: the first is the rank in {'2', '3', '4', '5', '6', '7', '8', '9', 'T', 'J', 'Q', 'K', 'A'} and the second is the suit in {'D', 'C', 'S', 'H'}.

If any of the last five cards have the same rank or suit as the first card, print "YES". Otherwise, print "NO".

Solutions

Since the algorithm for this one is pretty straightforward (see the Codeforces editorial), I've implemented it in a few different languages to demonstrate the difference in runtimes that can become very significant for more complex problems:

C++: 1097a.cpp – this solution runs in 0.28 ms on my system and 30 ms (which seems to be the minimum with overhead) in Codeforces' environment.

#include <iostream>
#include <string>

int main() {
    std::string tableCard;
    std::cin >> tableCard;
    for(int i = 0; i < 5; i++) {
        std::string card;
        std::cin >> card;
        if(card[0] == tableCard[0] || card[1] == tableCard[1]) {
            std::cout << "YES\n";
            return 0;
        }
    }
    std::cout << "NO\n";
    return 0;
}

C: 1097a.c – this solution runs in 0.48 ms on my system and 30 ms in Codeforces' environment.

#include <stdio.h>

int main() {
    char tableCard[2];
    scanf("%2s", tableCard);
    char card[2];
    for(int i = 0; i < 5; i++) {
        scanf("%2s", card);
        if(card[0] == tableCard[0] || card[1] == tableCard[1]) {
            puts("YES");
            return 0;
        }
    }
    puts("NO");
    return 0;
}

Python 3: 1097a.py – this solution runs in 12.3 ms on my system and 109 ms in Codeforces' environment.

tableCard = input()
cards = input().split(' ')
for card in cards:
    if card[0] == tableCard[0] or card[1] == tableCard[1]:
        print("YES")
        exit()
print("NO")

Java: Cardgame.java – this solution runs in 80.6 ms on my system and 124 ms in Codeforces' environment.

import java.util.Scanner;

public class Cardgame {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        String tableCard = scanner.nextLine();
        String[] myCards = scanner.nextLine().split(" ");
        for(String card : myCards) {
            if(card.charAt(0) == tableCard.charAt(0) || card.charAt(1) == tableCard.charAt(1)) {
                System.out.println("YES");
                scanner.close();
                System.exit(0);
            }
        }
        System.out.println("NO");
        scanner.close();
        System.exit(0);
    }
}

Brainfuck: This solution (annotated version: 1097a.bf) runs in 0.26 ms on my system (compiled with bf-x86) and is included here as evidence that C and C++ are not necessarily the fastest languages for a given problem. (Brainfuck lacks the overhead C and C++'s string implementations add in processing standard input.) If there's a language better suited to the problem that is supported by the contest, feel free use it! Sadly, Codeforces does not support Brainfuck—but CodeChef does!

>,>,<<+++++>>>>>>>>>>+++++[>+++++++++++++<-]>[>+>+>+>+>+<<<<<-]+>+++++++++++++++ +++++++++>++++>++++++++++++++++++>+++++++++++++>++++++++++++++>++++++++++<<<<<<< <<<<<<<<<<[>>>[-]>[-]<<<[>>+>+<<<-]>>>[<<<+>>>-]<<[>>+>+<<<-]>>>[<<<+>>>-],,>,<[ <<->>-]>>+<<<<[>>>[<<->>-]>>>+<<<<<[>>>>>-<<<<<[>>>>>>+<<<<<<-]]>>>>>>[<<<<<<+>> >>>>-]<[>>>.>.>.>>>.<<<<<<-<<<<<<<<<<<[-]>>>>>>>>>-]<<-<<<<[>>>>>+<<<<<-]]>>>>>[ <<<<<+>>>>>-]<[>>>>>.>.>.>>>.<<<<<<-<<<<<<<<<<<[-]>>>>>>>-]<<<<<<<-]>>>>>>>>>>>[ >>>>.>.>.<<<<<<-]

An object-oriented solution

Conventionally, better style would be to write something like the following (1097a-oo.cpp):

#include <iostream>

class Card {
public:
    bool operator==(const Card& c) {
        return rank == c.rank || suit == c.suit;
    }
    friend std::istream& operator>>(std::istream& in, Card& c) {
        in >> c.rank >> c.suit;
    }
private:
    char rank, suit;
};

int main() {
    Card tableCard;
    std::cin >> tableCard;
    for(int i = 0; i < 5; i++) {
        Card card;
        std::cin >> card;
        if(card == tableCard) {
            std::cout << "YES\n";
            return 0;
        }
    }
    std::cout << "NO\n";amp;
    return 0;
}

But this is not good for competitive programming! It takes longer to write, there is more that can go wrong, and it takes about 28% longer to run (0.36 ms on my system)! Remember, you aren't scored on style—only on how long it takes you to submit a solution that runs within the time and memory limits.

Zuhair and Strings

Brute force algorithm

The strings are guaranteed to match /^[a-z]+$/, so there are only 26 possible characters that could yield the largest x. Furthermore, the string can be a maximum of 200,000 characters in length. This means that there are at most 5,200,000 possible combinations of character and starting position for a substring.

Perhaps a brute-force algorithm like this (1105b-brute.cpp) will be good enough:

#include <iostream>
#include <string>

using std::cin;
using std::cout;
using std::string;

int main() {
    int n, k;
    string s;
    cin >> n >> k >> s;
    int level = 0;
    for(char c = 'a'; c <= 'z'; c++) {
        string sub = "";
        for(int x = 0; x < k; x++) {
            sub += c;
        }
        int charLevel = 0;
		int start = 0;
        while(start <= n - k * (level + 1) + 1) {
            if(s.substr(start, k) == sub) {
                start += k;
                charLevel++;
            }
            else {
                start++;
            }
        }
        if(charLevel > level) {
            level = charLevel;
        }
    }
    cout << level << '\n';
}

Unfortunately, when given a particularly troubling input (generated with strings.py), this program takes 8.1 seconds to complete on my system—well over the 1-second time limit.

Using regular expressions for testing

Why don't we match the string against a regular expression that tells us whether its level for each character is less than or equal to i, decreasing i until we get a positive result? For example, /(b{3}.*){5}/ will match a string that has level 5 for letter b if k = 3.

Since Python's regular expression library is easier to use (in my opinion) than C++'s, I implemented this in Python (1105b-bad.py):

import re
from string import ascii_lowercase

nk = input().strip().split(' ')
n = int(nk[0])
k = int(nk[1])
s = input().strip()

maxLevel = 0
for c in ascii_lowercase:
    x = n // k # Maximum level based on n and k
    while x > maxLevel and not re.match('({0}{{{1}}}.{{0,{2}}}){{{3}}}'.format(c, k, n // x - k, x), s):
        # Maximum of n // x - k characters after each run of k characters c
        x -= 1
    maxLevel = x

print(maxLevel)

But regular expressions are slow. For the same input file as before, this script completes in just 57 ms on my system, but for another troubling input (generated with strings2.py) it takes 3 minutes, 8 seconds—over 22 times slower than the first brute force algorithm!

Using regular expressions for counting

But with regular expressions we can not only match a string against an expression, but count the matches—which is what we want to do! Python's re.findall will give us a list of all the matches—the length of this list is the level for a given character!

import re
from string import ascii_lowercase

nk = input().strip().split(' ')
n = int(nk[0])
k = int(nk[1])
s = input().strip()

maxLevel = 0
for c in ascii_lowercase:
    charLevel = len(re.findall('{0}{{{1}}}'.format(c, k), s))
    maxLevel = max(maxLevel, charLevel)

print(maxLevel)

This program (1105b.py) runs in 3.1 seconds for the second troubling input file on my system. Close—maybe if I reimplement it in C++ (1105b-regex.cpp) it will be sufficient:

#define _GLIBCXX_REGEX_STATE_LIMIT 200000 // To increase the regex state limit imposed by g++

#include <iostream>
#include <regex>
#include <sstream>

int main() {
    int n, k;
    std::string s;
    std::cin >> n >> k >> s;

    int maxLevel = 0;
    for(char c = 'a'; c <= 'z'; c++) {
        std::ostringstream exprStream;
        exprStream << c << '{' << k << '}';
        std::regex expr(exprStream.str());
        int charLevel = std::distance(std::sregex_iterator(s.begin(), s.end(), expr), std::sregex_iterator());
        if(charLevel > maxLevel) {
            maxLevel = charLevel;
        }
    }

    std::cout << maxLevel << std::endl;
}

This completes the second troubling input file in just 150 ms on my system, but segfaults when given the new input, apparently due to a long-standing bug in libstdc++. (I submitted it as a Visual C++ solution on Codeforces in case Microsoft's C++ implementation is any better, but it ran out of time on an even simpler test than the Python one did.)

Solution: Processing a character at a time

So I finally caved and implemented the checks for matching runs myself. Moreover, instead of calculating the level for each letter separately, I can now calculate all their levels as I go. Here is the resulting code (1105b.cpp):

#include <iostream>
#include <algorithm>

int main() {
    int n, k;
    std::cin >> n >> k;
    char last = ' '; // a non-letter
    int run = 0;
    int levels[26] = {0};

    for(int i = 0; i < n; i++) {
        char c;
        std::cin >> c;
        if(c == last) {
            run++;
            if(run == k) {
                levels[c - 'a']++;
                run = 0;
                last = ' ';
            }
        }
        else if(k == 1) {
            levels[c - 'a']++;
        }
        else {
            last = c;
            run = 1;
        }
    }

    std::cout << *std::max_element(levels, levels + 26) << std::endl;
}

This solution runs in 8.2 ms on my system and 62 ms in Codeforces' environment.

This is the intended solution, per the editorial.

Pens and Days of Week

Simplified problem description

Provided is an array a of n integers.
Every day x (starting from day 0), a[x % n] -= 1, except when x % 7 == 6.
What is the index of the first value in a to reach 0?
1 ≤ n ≤ 50,000; 1 ≤ a[i] ≤ 10⁹
Input format: n on first line, then space-separated a[i] on second line.
Output format: The index of the solution plus 1 (since the problem uses a 1-indexed array of pens).
Time limit: 3 seconds; memory limit: 256 MB.

Naïve algorithm

At first, this problem may appear easy. It is tempting to write something like the following (pens-naive.cpp):

#include <iostream>
#include <vector>

int main() {
    int n;
    std::vector<int> a;
    std::cin >> n;
    for(int i = 0; i < n; i++) {
        int val;
        std::cin >> val;
        a.push_back(val);
    }
    int x = 0;
    while(true) {
        if(x % 7 != 6) a[x % n]--;
        if(a[x % n] == 0) {
            std::cout << x % n + 1 << std::endl;
            break;
        }
        x++;
    }
    return 0;
}

This isn't good enough. Though this program should always produce the correct answer, it is not nearly efficient enough to process the maximum input. For the two sample cases, it completes in 2 ms on my system—pretty good, right?—but when given the maximal input file (50,000 pens each with 1,000,000,000 ml of ink; input file generated with pens.py), it segfaulted because x did not have enough bytes. After changing x to an unsigned long long int and adding some signal handling (see: pens-naive-modified.cpp), I estimated based on the state after one hour that this algorithm would complete on my system in 10 days, 13 hours, and 18 minutes.

That is more than three seconds.

Analyzing the problem

At this point, I made tables to show the state of the array in the naïve solution after each day for each sample input:

Sample 1Pen 1Pen 2Pen 3
a[0]a[1]a[2]
Start333
After day 0233
After day 1223
After day 2222
After day 3122
After day 4112
After day 5111
After day 6111
After day 7101

In these tables, the value that is decremented on each day is highlighted; days on which this value is not decremented (Sundays) are listed in bold in the left column. The first value equal to zero is bolded. Days are zero-indexed as in the above implementation despite being one-indexed in the problem statement; both indices are presented for the pen numbers in the header rows.

Sample 2Pen 1Pen 2Pen 3Pen 4Pen 5
a[0]a[1]a[2]a[3]a[4]
Start54544
After day 044544
After day 143544
After day 243444
After day 343434
After day 443433
After day 533433
After day 633433
After day 733333
After day 833323
After day 933322
After day 1023322
After day 1122322
After day 1222222
After day 1322222
After day 1422221
After day 1512221
After day 1611221
After day 1711121
After day 1811111
After day 1911110

Tip: Always try to calculate the sample inputs on paper to make sure you understand the problem correctly! You may even stumble into an efficient solution in the process. I spent some time at last week's meeting trying to figure out why my program wasn't working before realizing my algorithm was based on an incorrect interpretation of the problem statement.

After completing the above tables, I noticed that the pattern in the ink values was more obvious for each pen than for each day. Take pen 4 in sample 2, for instance. It starts with 4 ml of ink. This is then decremented on day 5n + 3 (n ∈ Z), except when day % 7 == 6. The following algorithm (pens-vertical.cpp) naturally arises from this observation:

#include <iostream>
#include <climits>

int main() {
    int n;
    unsigned long long int minDay = ULLONG_MAX;
    int minPen = 0;
    std::cin >> n;
    for(int i = 0; i < n; i++) {
        int ink;
        std::cin >> ink;

        unsigned long long int day = i;
        for(;;) {
            if(day % 7 == 6) {
                day += n;
                if(day % 7 == 6) {
                    break; // pen is never used
                }
            }
            ink--;
            if(day >= minDay) {
                break; // pen is not the first to run out of ink
            }
            if(!ink) {
                minDay = day;
                minPen = i;
                break; // pen ran out of ink
            }
            day += n;
        }
    }
    std::cout << minPen + 1 << '\n';
    return 0;
}

Guess what? This isn't good enough either. If it is billions of days before a pen runs out of ink, there are still millions of calculations to be done on each pen. I again modified this program to report the state after one hour and used this data to estimate that this program would complete in 22 hours and 15 minutes on my system—much faster than the first algorithm, but still more than 3 seconds.

This algorithm certainly feels nicer than the first one (to me, anyway)—the calculations it's doing are much more straightforward, but there are still too many of them.

How can we determine the day on which each pen runs out of ink mathematically rather than with brute force?

The solution

What if we combined days into blocks such that each pen's ink is decreased by the same amount in each row of the table? In order to see the benefit of this, we'll need to come up with a new example. The table below corresponds to the following input:

5
14 14 14 14 13
Sample 2Pen 1Pen 2Pen 3Pen 4Pen 5
a[0]a[1]a[2]a[3]a[4]
Start1414141413
Days 0–34
After day 01314141413
After day 11313141413
After day 21313131413
After day 31313131313
After day 41313131312
After day 51213131312
After day 61213131312
After day 71213121312
After day 81213121212
After day 91213121211
After day 101113121211
After day 111112121211
After day 121112111211
After day 131112111211
After day 141112111210
After day 151012111210
After day 161011111210
After day 171011101210
After day 181011101110
After day 19101110119
After day 20101110119
After day 21101010119
After day 2210109119
After day 2310109109
After day 2410109108
After day 259109108
After day 26999108
After day 27999108
After day 2899998
After day 2999997
After day 3089997
After day 3188997
After day 3288897
After day 3388887
After day 3488887
Continued on right

This could be displayed as a very wide table, each row consisting of one block of days, but it wouldn't fit on this page.

Sample 2
(continued)
Pen 1Pen 2Pen 3Pen 4Pen 5
a[0]a[1]a[2]a[3]a[4]
After day 3488887
Days 35–69
After day 3578887
After day 3677887
After day 3777787
After day 3877777
After day 3977776
After day 4067776
After day 4167776
After day 4267676
After day 4367666
After day 4467665
After day 4557665
After day 4656665
After day 4756565
After day 4856565
After day 4956564
After day 5046564
After day 5145564
After day 5245464
After day 5345454
After day 5445453
After day 5545453
After day 5644453
After day 5744353
After day 5844343
After day 5944342
After day 6034342
After day 6133342
After day 6233342
After day 6333332
After day 6433331
After day 6523331
After day 6622331
After day 6722231
After day 6822221
After day 6922221
Days 70–104
After day 7012221
After day 7111221
After day 7211121
After day 7311111
After day 7411110

In this example, after each block of 35 days, each pen's ink level is decreased by 6. Why? Because the width of the block is 7 * n, so each day of the week except Sunday sees each pen decremented once during the block. The result will always be 6 if n % 7 != 0.

The other case is of course when n % 7 == 0. In this case, the pattern becomes much simpler: some pens (those with indices i % 7 == 6) are never decremented, and the others are decremented once per block.

The length of these blocks (lcm(7, n) days) turns out to be irrelevant—what matters is now how many blocks, not how many days, it takes for each pen to run out of ink. This calculation is simple!

int endBlocks[n] = {0}; // The block in which pen n runs out of ink.
for(int i = 0; i < n; i++) {
    if(i % 7 == 6) endBlocks[n] = -1; // The pen never runs out of ink.
    else endBlocks[n] = a[n] / 6 + !!(a[n] % 6); // A safe implementation of ceil(a[n] / 6)
}

But how do we know which pen runs out of ink first if two pens run out in the same block? It's not guaranteed to be the pen with the lowest index that runs out during this block, since that pen could be used on Sunday before another pen runs out in the following week within the same block. Because blocks always end on Sunday, it is sufficient for our needs to simply skip to the beginning of the first block in which a pen runs out of ink, then run the naïve algorithm (pens.cpp):

#include <iostream>
#include <vector>
#include <climits>

int main() {
    int n;
    std::cin >> n;
    int ink[n];
    int minBlock = INT_MAX;
    for(int i = 0; i < n; i++) {
        std::cin >> ink[i];
        int endBlock = -1; // -1 if this pen is never depleted
        if(n % 7) {
            endBlock = ink[i] / 6 + !!(ink[i] % 6) - 1; // Block before this pen runs out of ink
        }
        else {
            if(i % 7 != 6) {
                endBlock = ink[i] - 1;
            }
        }
        if(endBlock != -1 && endBlock < minBlock) {
            minBlock = endBlock;
        }
    }

    // Determine the state at the end of the minBlock.
    for(int i = 0; i < n; i++) {
        if(n % 7) {
            ink[i] -= minBlock * 6;
        }
        else {
            if(i % 7 != 6) {
                ink[i] -= minBlock;
            }
        }
    }

    // Calculate from the beginning of the minBlock.
    int minDay = INT_MAX;
    int minPen = 0;
    for(int i = 0; i < n; i++) {
        int a = ink[i];
        int day = i;
        for(;;) {
            if(day % 7 == 6) {
                day += n;
                if(day % 7 == 6) {
                    break; // pen is never used
                }
            }
            a--;
            if(day >= minDay) {
                break; // pen is not the first to run out of ink
            }
            if(!a) {
                minDay = day;
                minPen = i;
                break; // pen ran out of ink
            }
            day += n;
        }
    }
    std::cout << minPen + 1 << '\n';
    return 0;
}

This solution runs in 27 ms on my machine and 124 ms in Codeforces' environment.

It would also be acceptable to use the first naïve algorithm for the final stage (pens-using-naive.cpp). This solution takes almost exactly the same amount of time since there is very little to be done in this final loop and is the intended solution per the Codeforces editorial (in Russian).

Of course, we could also find a more efficient algorithm to determine which pen runs out first in the final block. But we do not need to.