TAT-HQA (Tabular And Textual dataset for Hypothetical Question Answering) is a numerical QA dataset of hypothetical questions, i.e. questions containing an assumption beyond the original context. TAT-HQA aims to equip QA models with counterfactual thinking: the ability to imagine and reason over unseen cases based on the seen facts and counterfactual assumptions.
TAT-HQA has the following features in addition to the features of TAT-QA:
In total, TAT-HQA contains 8,283 questions associated with 2,758 hybrid contexts from real-world financial reports.
The following is an example from TAT-HQA along with its original TAT-QA example. The left box shows a factual tabular and textual context, about which a normal question (the upper middle box) is asked. Based on the normal question, a hypothetical question is asked with an additional assumption, "if the amount in 2019 was $132,935 instead", leading to a different answer (-30,000 vs. -29,253). The blue dashed boxes indicate the positions affected by the assumption. The QA model is supposed to imagine a counterfactual context (the right dashed box) and reason over this imagined situation.
```json
{
  "table": {
    "uid": "a3e9cad512b8d3ff0cd6e50774007eeb",
    "table": [
      ["(In thousands of $)", "2019", "2018"],
      ["Net Debt Receipts", "$ 243,513", "$ 30,300"],
      ...
    ]
  },
  "paragraphs": [
    {
      "uid": "76573d23233ebdfc3c89609c6372e951",
      "order": 1,
      "text": "The most significant impacts of Hilli LLC VIE's operations on our consolidated statements of income and consolidated statements of cash flows, as of December 31, 2019 and 2018, are as follows: ..."
    },
    ...
  ],
  "questions": [
    # Examples of 1) an original question and 2) a hypothetical question.
    {
      "uid": "b68356a11ee8d39571d44b087b1558c7",
      "order": 1,
      "question": "What was the change in net debt receipts between 2018 and 2019?",
      "answer": 129454,
      "derivation": "129,454 - 0",
      "answer_type": "arithmetic",  # The answer type: `span`, `spans`, `arithmetic`, `counting` or `counterfactual`.
      "answer_from": "table",       # The source of the answer: `table`, `text` or `table-text`.
      "rel_paragraphs": [],
      "req_comparison": false,
      "scale": "thousand"           # The scale of the answer: `None`, `thousand`, `million`, `billion` or `percent`.
    },
    {
      "uid": "e982f2be72b37a222d61fe645df00168",
      "order": 2,
      "question": "What would be the change in net debt receipts between 2018 and 2019 if the amount in 2018 was 100,250 thousand instead?",
      "answer": 29204,
      "derivation": "129,454 - 100,250",
      "answer_type": "arithmetic",
      "answer_from": "table",
      "rel_paragraphs": [],
      "req_comparison": false,
      "scale": "thousand",
      "rel_question": 1             # The order of the corresponding original question under the same table.
    }
  ]
}
```
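A hypothetical question is linked to its original question through the `rel_question` field, which points at the original question's `order` under the same table. A minimal sketch of recovering these (hypothetical, original) pairs, assuming the field names shown in the sample above:

```python
# Pair each hypothetical question with its original question via `rel_question`.
# The two toy questions below follow the sample format above (fields abridged).
sample_questions = [
    {"order": 1,
     "question": "What was the change in net debt receipts between 2018 and 2019?",
     "answer_type": "arithmetic"},
    {"order": 2,
     "question": "What would be the change in net debt receipts between 2018 and "
                 "2019 if the amount in 2018 was 100,250 thousand instead?",
     "answer_type": "arithmetic",
     "rel_question": 1},
]

def pair_hypothetical(questions):
    """Return (hypothetical, original) pairs by resolving `rel_question` -> `order`."""
    by_order = {q["order"]: q for q in questions}
    return [(q, by_order[q["rel_question"]])
            for q in questions if "rel_question" in q]

pairs = pair_hypothetical(sample_questions)
```

This is useful, for example, when training a model to contrast a factual context with its imagined counterfactual counterpart.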
Rank | Model Name | Team Name | Exact Match | F1 | Created |
---|---|---|---|---|---|
- | Human Performance | - | 84.1 | 90.8 | - |
1 | RSTQA: Rounds Specified numerical reasoning for Table-Text QA | NLP2CT | 57.6 | 58.2 | 14 Dec 2023 |
2 | TagOp - L2I | NExT | 54.4 +- 1.0 | 54.7 +- 1.0 | 13 May 2022 |
To evaluate your models, we have also made available the official evaluation script. To run the evaluation, use:
```shell
python evaluate.py gold_data_path.json prediction_result_path.json 0
```
The predictions file is a JSON dictionary mapping question ids to predictions, where each prediction is an array containing both the `answer` and the `scale`. For example,
```json
{
  "9337c3e6-c53f-45a9-836a-02c474ceac16": ["4.6", "percent"],
  "c4170232-e89c-487a-97c5-afad45e9d702": ["16", "thousand"],
  "d81d1ae7-363c-4b47-8eea-1906fef33856": [["2018", "2019"], ""],
  ...
}
```
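A minimal sketch of writing such a predictions file with the standard library. The question ids and values are copied from the example above; in practice they would come from your model's inference over the test set:

```python
import json

# Each value is an [answer, scale] pair; the answer may itself be a list of
# spans, and the scale may be the empty string when no scale applies.
predictions = {
    "9337c3e6-c53f-45a9-836a-02c474ceac16": ["4.6", "percent"],
    "c4170232-e89c-487a-97c5-afad45e9d702": ["16", "thousand"],
    "d81d1ae7-363c-4b47-8eea-1906fef33856": [["2018", "2019"], ""],
}

with open("prediction_result_path.json", "w") as f:
    json.dump(predictions, f, indent=2)
```

The resulting file can be passed directly as the second argument of `evaluate.py`.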
The format of the sample prediction file for TAT-QA also applies to TAT-HQA.
Please email us the prediction file on the test set along with the following information:
Please allow us up to two weeks to evaluate your submission, after which we will add your model to the leaderboard.
For more information, please contact:
Please kindly cite our work if the dataset helps your research.
```bibtex
@inproceedings{li-etal-2022-learning,
    title = "Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning",
    author = "Li, Moxin and Feng, Fuli and Zhang, Hanwang and He, Xiangnan and Zhu, Fengbin and Chua, Tat-Seng",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.5",
    doi = "10.18653/v1/2022.acl-long.5",
    pages = "57--69"
}
```