How judging works

Modified on Mon, 4 Dec, 2023 at 11:54 PM

First, we compile your program (or, in the case of interpreted languages like Python, we syntax-check it). If your code fails to compile, we judge it as Compile Error. Otherwise, we execute the compiled binary with the first test case. You can execute your program in a similar way by using the following shell command:

$ time ./your_program < sample.in > sample.out

If the execution takes too long, crashes, or uses too much memory, we judge it as Time Limit Exceeded, Run-Time Error, or Memory Limit Exceeded, respectively. While the program runs, we also monitor how much output it produces; if it outputs much more data than we expect, we judge it as Output Limit Exceeded. If the execution completes without any error, we inspect the output to verify that it is correct. If the output is incorrect, we judge the program as Wrong Answer. Note that for some problems there can be many different acceptable answers, so output that differs from the sample answer is not necessarily wrong. When debugging, you can use the diff program to check whether the output from your program matches the sample output:

$ diff sample.out sample.ans
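If you prefer to see only the first mismatch rather than the full diff, a small script can do the comparison. The following is a minimal sketch, not part of the judge: the function name first_difference is hypothetical, and it compares the two texts strictly line by line, which may be stricter than how a given problem is actually checked.

```python
from itertools import zip_longest


def first_difference(produced, expected):
    """Return (line_number, produced_line, expected_line) for the first
    mismatching line, or None if the two texts have identical lines."""
    # zip_longest pads the shorter text with None, so a missing or
    # extra line is reported as a mismatch too.
    pairs = zip_longest(produced.splitlines(), expected.splitlines())
    for line_number, (got, want) in enumerate(pairs, start=1):
        if got != want:
            return line_number, got, want
    return None
```

To use it, read sample.out and sample.ans into strings and pass them in that order; a result of None means the files agree line for line.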

If no error is found, we repeat this procedure with the next test case. As soon as an error is detected, we stop and report that error. Your program is executed anew for each test case, and time limits are per test case. We always apply the test cases in the same order.
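The procedure described above can be sketched as a short loop. This is an illustration under stated assumptions, not the judge's actual implementation: the function name judge, the time_limit_s parameter, and the "Accepted" result string are hypothetical, memory and output limits are not enforced, and the output check is a plain text comparison that would not handle problems with many acceptable answers.

```python
import subprocess


def judge(command, test_cases, time_limit_s=1.0):
    """Run `command` once per test case, in order, and return the first
    failing verdict, or "Accepted" if every case passes.

    `test_cases` is a list of (input_text, expected_output_text) pairs.
    """
    for stdin_text, expected in test_cases:
        try:
            # A fresh process is started for every test case, and the
            # timeout applies to each case separately.
            result = subprocess.run(
                command,
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=time_limit_s,
            )
        except subprocess.TimeoutExpired:
            return "Time Limit Exceeded"
        if result.returncode != 0:
            # Assumption: any non-zero exit status counts as a crash.
            return "Run-Time Error"
        # Simplification: compare text after trimming trailing whitespace.
        if result.stdout.rstrip() != expected.rstrip():
            return "Wrong Answer"
    # Assumption: "Accepted" stands for "no error found on any case".
    return "Accepted"
```

For example, judge(["./your_program"], cases, time_limit_s=2.0) would return at the first failing case without running the remaining ones, which mirrors the stop-at-first-error behaviour described above.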

In some cases, we provide extra information in addition to the judgement itself, to help you debug your code. This information is available on the page for the respective submission, which you can open by clicking the corresponding submission ID in your list of submissions.