Introduction

TAT-QA (Tabular And Textual dataset for Question Answering) is a large-scale QA dataset that aims to stimulate progress in QA research over more complex and realistic tabular and textual data, especially data requiring numerical reasoning.

The unique features of TAT-QA include:

- The context is hybrid, comprising a semi-structured table and at least two paragraphs associated with it;
- Answers take diverse forms, including a single span, multiple spans, a count, and the result of an arithmetic derivation;
- Answers are annotated with their scale (none, thousand, million, billion, or percent), and arithmetic answers carry a derivation that can be executed to reproduce them.

In total, TAT-QA contains 16,552 questions associated with 2,757 hybrid contexts from real-world financial reports.

The following is an example from TAT-QA. The left dashed-line box shows a hybrid context. The rows with a blue background are row headers, while the column with a grey background is the column header. The right solid-line box shows the corresponding question, the answer with its scale, and the derivation used to arrive at the answer.

[Figure: TAT-QA sample]
For more information, please read our ACL 2021 paper [PDF].

TAT-DQA is a new large-scale Document Visual QA (VQA) dataset constructed by extending TAT-QA. Please check it out if you are interested in this new task.

Getting Started

Download a copy of the dataset in JSON format. Each hybrid context in the file is structured as follows:
{
  "table": {                                                            # The tabular data in a hybrid context
    "uid": "3ffd9053-a45d-491c-957a-1b2fa0af0570",                      # The unique id of a table
    "table": [                                                          # The table content which is 2d-array
      [
        "",
        "2019",
        "2018",
        "2017"
      ],
      [
        "Fixed Price",
        "$  1,452.4",
        "$  1,146.2",
        "$  1,036.9"
      ],
      ...
    ]
  },
  "paragraphs": [                                                        # The textual data in a hybrid context comprising at least two associated paragraphs to the table
    {
      "uid": "f4ac7069-10a2-47e9-995c-3903293b3d47",                     # The unique id of a paragraph
      "order": 1,                                                        # The order of the paragraph in all associated paragraphs, starting from 1
      "text": "Sales by Contract Type: Substantially all of              # The content of the paragraph
       our contracts are fixed-price type contracts.
       Sales included in Other contract types represent cost
       plus and time and material type contracts."
    },
    ...
  ],
  "questions": [                                                         # The questions associated to the hybrid context
    {
      "uid": "eb787966-fa02-401f-bfaf-ccabf3828b23",                     # The unique id of a question
      "order": 2,                                                        # The order of the question in all questions, starting from 1
      "question": "What is the change in Other in 2019 from 2018?",      # The question itself
      "answer": -12.6,                                                   # The ground-truth answer
      "derivation": "44.1 - 56.7",                                       # The derivation that can be executed to arrive at the ground-truth answer
      "answer_type": "arithmetic",                                       # The answer type including `span`, `spans`, `arithmetic` and `counting`.
      "answer_from": "table-text",                                       # The source of the answer including `table`, `table` and `table-text`
      "rel_paragraphs": [                                                # The orders of the paragraphs that are relied to infer the answer if any.
        "2"
      ],
      "req_comparison": false,                                           # A flag indicating if `comparison/sorting` is needed to answer the question whose answer is a single span or multiple spans
      "scale": "million"                                                 # The scale of the answer including `None`, `thousand`, `million`, `billion` and `percent`
    }
  ]
}
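
The file can be explored with the Python standard library alone. Below is a minimal sketch, assuming the downloaded split is a JSON list of hybrid contexts in the schema above (the dev path matches the evaluation example further down this page); for arithmetic questions it re-executes the annotated derivation:

import json

# Load the dev split; the file is assumed to be a JSON list of
# hybrid contexts following the schema shown above.
with open("dataset_raw/tatqa_dataset_dev.json") as f:
    hybrid_contexts = json.load(f)

for context in hybrid_contexts:
    for q in context["questions"]:
        if q["answer_type"] != "arithmetic":
            continue
        # Derivations are plain arithmetic expressions such as "44.1 - 56.7".
        # A few may contain characters eval() cannot parse, so this sketch
        # simply skips those.
        try:
            derived = eval(q["derivation"])
        except Exception:
            continue
        print(q["question"], "->", round(derived, 2), q["scale"])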

Leaderboard

| Rank | Model Name | Team Name | Exact Match | F1 | Created | Paper | Code |
|------|------------|-----------|-------------|------|---------|-------|------|
| - | Human Performance | - | 84.1 | 90.8 | - | - | - |
| 1 | TAT-LLM (70B) | NExT | 81.4 | 88.4 | 20 Jan 2024 | Paper | N.A. |
| 2 | TAT-LLM (13B) | NExT | 77.5 | 85.9 | 20 Jan 2024 | Paper | N.A. |
| 3 | Code Generation for Table-Text Question using LLM (70B) | Anonymous | 76.8 | 84.7 | 21 Sep 2023 | N.A. | N.A. |
| 4 | TAT-LLM (7B) | NExT | 76.4 | 85.1 | 20 Jan 2024 | Paper | N.A. |
| 5 | AeNER: Attention-enhanced Numerical Embeddings for Reasoning | Gryffindor | 75.0 | 83.2 | 16 May 2022 | N.A. | N.A. |
| 6 | Code Generation for Table-Text Question using LLM (13B) | Anonymous | 73.7 | 81.8 | 21 Sep 2023 | N.A. | N.A. |
| 7 | Encore | HIT-SCIR | 71.8 | 80.1 | 24 Oct 2022 | Paper | Code |
| 8 | KFEX-N: A Table-Text QA Model with Knowledge-Fused Encoder & EX-N Tree Decoder | CWQian China | 71.0 | 79.5 | 29 Oct 2023 | Paper | N.A. |
| 9 | MVGE: Multi-View Graph Encoder for Answering Hybrid Numerical Reasoning Question | Weiyifan@CASIA | 70.9 | 79.1 | 23 Dec 2022 | Paper | Code |
| 10 | RegHNT: Relational graph neural network with special multitask decoder | LFyimi@CASIA China | 70.3 | 77.9 | 5 May 2022 | Paper | Code |
| 11 | Code Generation for Table-Text Question using LLM (7B) | Anonymous | 68.4 | 77.3 | 21 Sep 2023 | N.A. | N.A. |
| 12 | UniRPG: Unified Discrete Reasoning over Table and Text as Program Generation | JD AI Research | 67.2 | 76.0 | 24 Feb 2022 | Paper | Code |
| 13 | RSTQA: Rounds Specified numerical reasoning for Table-Text QA | NLP2CT@UM Macau | 66.8 | 75.0 | 30 Oct 2023 | N.A. | N.A. |
| 14 | SoarGraph: Semantic-Oriented Hierarchical Graphs | NExT | 65.4 | 75.3 | 8 Sep 2022 | Paper | N.A. |
| 15 | UniPCQA | NExT / CUHK | 63.9 | 72.2 | 22 Oct 2022 | Paper | N.A. |
| 16 | MHST: Multi-Head with Sequence to Expression Tree | NExT | 63.6 | 72.7 | 24 May 2022 | Paper | N.A. |
| 17 | GANO: GNN for Tabular and Textual QA with Numerical Reasoning | iLab@AIST Japan | 61.9 | 72.1 | 15 Jul 2022 | Paper | N.A. |
| 18 | TBC | Anonymous | 60.8 | 68.7 | 22 Jun 2022 | N.A. | N.A. |
| 19 | FinMath: Injecting a Tree-structured Solver for Question Answering over Financial Reports | FinMath@NEU China | 58.3 | 68.2 | 6 Aug 2022 | Paper | N.A. |
| 20 | KIQA: Knowledge-infused QA Model for Table and Text | iLab@AIST Japan | 58.2 | 67.4 | 23 Feb 2022 | Paper | N.A. |
| 21 | LETTER: Logic Enhanced Table-Text Reasoning | OnceAgain | 56.1 | 64.3 | 17 Feb 2022 | N.A. | N.A. |
| 22 | TeaBReaC-pretrained T5-3B | SBU / Allen AI | 55.8 | 63.8 | 17 Jun 2022 | Paper | N.A. |
| 23 | OPERA-H | Hero_Dirk | 55.2 | 63.8 | 9 Oct 2022 | N.A. | N.A. |
| 24 | GenQA: Generative model for QA from table and text | IITJ@India | 55.1 | 65.6 | 7 Feb 2023 | N.A. | N.A. |
| 25 | Baseline - TagOp | NExT | 50.1 | 58.0 | 13 May 2021 | Paper | Code |

Evaluation

To evaluate your models, we have also made available the evaluation script that we will use for official evaluation. To run the evaluation, use

python tatqa_eval.py --gold_path=:path_to_dev --pred_path=:path_to_predictions

Predictions Format

The predictions file, in JSON format, contains a dictionary that maps each question id to its prediction, where each prediction is an array containing both the `answer` and its `scale`. For example,

{
  "9337c3e6-c53f-45a9-836a-02c474ceac16": [
    "4.6",
    "percent"
  ],
  "c4170232-e89c-487a-97c5-afad45e9d702": [
    "16",
    "thousand"
  ],
  "d81d1ae7-363c-4b47-8eea-1906fef33856": [
    ["2018", "2019"],
    ""
  ],
  ...
}
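
As a rough sketch of how such a file might be produced, the snippet below uses a placeholder predict() function standing in for your model and a hypothetical test-split path; neither is part of the official tooling:

import json

def predict(question):
    # Placeholder for your model: return an (answer, scale) pair, where
    # answer is a string, a number, or a list of spans, and scale is one
    # of "", "thousand", "million", "billion", "percent".
    return "4.6", "percent"

# Hypothetical path to the test split; substitute your local copy.
with open("dataset_raw/tatqa_dataset_test.json") as f:
    contexts = json.load(f)

predictions = {}
for context in contexts:
    for q in context["questions"]:
        answer, scale = predict(q["question"])
        predictions[q["uid"]] = [answer, scale]

with open("predictions.json", "w") as f:
    json.dump(predictions, f, indent=2)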

We also provide a sample prediction file (on Dev) for your reference.

python tatqa_eval.py --gold_path=dataset_raw/tatqa_dataset_dev.json --pred_path=sample_prediction.json

Submission

Please email us the prediction file for the test set along with the following information:

Please allow us up to two weeks to evaluate your submission, after which we will add your model to the leaderboard.

Contact

The TAT-QA dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

For more information, please contact the author: Fengbin ZHU (fengbinzhu@u.nus.edu).

Please kindly cite our work if you use our dataset or code. Thank you.


@inproceedings{zhu2021tat,
    title = "{TAT}-{QA}: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance",
    author = "Zhu, Fengbin  and
      Lei, Wenqiang  and
      Huang, Youcheng  and
      Wang, Chao  and
      Zhang, Shuo  and
      Lv, Jiancheng  and
      Feng, Fuli  and
      Chua, Tat-Seng",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.254",
    doi = "10.18653/v1/2021.acl-long.254",
    pages = "3277--3287"
}