Introduction

PACIFIC (ProActive ConversatIonal question answering in FInanCe) is a large-scale conversational question answering (CQA) dataset that aims to stimulate progress in proactive CQA research over complex and realistic tabular and textual data, especially data requiring numerical reasoning.

The PACIFIC dataset is built upon the TAT-QA dataset, using its question-answer pairs as guidance for constructing conversation sessions. Compared with existing CQA datasets, PACIFIC exhibits three key challenges: (i) proactivity, (ii) numerical reasoning, and (iii) a hybrid context of tables and text.

In total, PACIFIC contains 2,757 dialogues associated with 19,008 QA turns.

The following is an example of PACIFIC and the corresponding example from TAT-QA. The dashed box on the left shows a hybrid context.

PACIFIC Sample
For more information, please read our EMNLP 2022 paper [PDF].

Getting Started

Each instance in PACIFIC pairs a hybrid context (a table and its associated paragraphs) with a sequence of conversational question turns, stored in JSON as follows:

{
  "table": {                                                            # The tabular data in a hybrid context
    "uid": "e78f8b29-6085-43de-b32f-be1a68641be3",                      # The unique id of a table
    "table": [                                                          # The table content which is 2d-array
      [
        "",
        "2019",
        "2018",
        "2017"
      ],
      ...
      [
        "Discount rate",
        "2.3",
        "2.5",
        "2.6"
      ]
    ]
  },
  "paragraphs": [                                                        # The textual data in a hybrid context comprising at least two associated paragraphs to the table
    {
      "uid": "62be4f5a-1693-4e6b-8bb4-0a4e1e40b409",                     # The unique id of a paragraph
      "order": 1,                                                        # The order of the paragraph in all associated paragraphs, starting from 1
      "text": "Actuarial assumptions"              # The content of the paragraph
    },
    ...
  ],
  "questions": [                                                         # The questions associated to the hybrid context
    {
      "uid": "d0d414be-e75d-43cb-b37a-c96d6367aae2",                     # The unique id of a question
      "order": 4,                                                        # The order of the question in the conversation, starting from 1
      "question": "What is the average rate of inflation?",      # The question itself
      "answer": [                                                   # The ground-truth answer
      "Which year are you asking about?"
      ],
      "derivation": "",                                       # The derivation that can be executed to arrive at the ground-truth answer
      "answer_type": "question",                                       # The answer type including `span`, `spans`, `arithmetic`, `counting`, and `question`.
      "answer_from": "table",                                       # The source of the answer including `table`, `table` and `table-text`
      "rel_paragraphs": "",                                                # The orders of the paragraphs that are relied to infer the answer if any.
      "req_comparison": false,                                           # A flag indicating if `comparison/sorting` is needed to answer the question whose answer is a single span or multiple spans
      "scale": ""                                                 # The scale of the answer including `None`, `thousand`, `million`, `billion` and `percent`
      "req_clari": true,                                       # Whether the user question requires clarifications. 
      "follow_up": false,                                       # Whether the user question is a follow-up question that requires the information from the previous conversation. 
      "original_question": "What is the average rate of inflation?",                                       # The self-contained question for query rewriting. 
    }
  ]
}
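
As a quick sanity check, the snippet below is a minimal sketch of loading and iterating over a PACIFIC split. It assumes the split is distributed as a single JSON file containing a list of instances in the format above; the file name `train.json` is illustrative, not an official path.

import json

# Minimal sketch: load one PACIFIC split and walk through its conversations.
# The path "train.json" is an assumption; point it at the split you downloaded.
with open("train.json", encoding="utf-8") as f:
    instances = json.load(f)

for instance in instances:
    table = instance["table"]["table"]        # 2d-array of table cells
    paragraphs = instance["paragraphs"]       # paragraphs associated with the table
    # The question turns of one conversation over this hybrid context, in order.
    for turn in sorted(instance["questions"], key=lambda q: q["order"]):
        print(f'Turn {turn["order"]}: {turn["question"]}')
        if turn["req_clari"]:
            # The ground-truth response is a clarification question.
            print(f'  -> clarify: {turn["answer"][0]}')
        else:
            print(f'  -> answer: {turn["answer"]} (scale: {turn["scale"] or "None"})')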

Leaderboard

Rank | Model Name         | Team Name | Exact Match | F1   | Created     | Paper | Codes
-    | Human Performance  | -         | 83.6        | 92.1 | -           | -     | -
1    | HITReasoner        | Hero_Dirk | 65.8        | 74.0 | 21 Feb 2023 | N.A.  | N.A.
2    | Baseline - UniPCQA | NExT/CUHK | 61.9        | 70.2 | 22 Oct 2022 | Paper | Code

Submission

To evaluate your models, we have also made available the evaluation script that we will use for the official evaluation. To run the evaluation, use:

python pacific_eval.py --gold_path=:path_to_dev --pred_path=:path_to_predictions

Predictions Format

The predictions file, in JSON format, contains a dictionary with question ids as keys and predictions as values; each prediction is an array containing both the `answer` and the `scale`. For example:

{
 "9337c3e6-c53f-45a9-836a-02c474ceac16": [
    "4.6",
    "percent"
  ],
  "c4170232-e89c-487a-97c5-afad45e9d702": [
    "16",
    "thousand"
  ],
  "d81d1ae7-363c-4b47-8eea-1906fef33856": [
    ["2018", "2019"],
    ""
  ]
  ...
}
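
For reference, here is a minimal sketch of producing such a predictions file; `my_model_predict` and the path `dev.json` are placeholders for your own inference code and data location, not part of the official tooling.

import json

def my_model_predict(instance, turn):
    # Placeholder: replace with your model's inference.
    # It should return an (answer, scale) pair for the given question turn.
    return "0", ""

predictions = {}
with open("dev.json", encoding="utf-8") as f:          # path is illustrative
    for instance in json.load(f):
        for turn in instance["questions"]:
            answer, scale = my_model_predict(instance, turn)
            predictions[turn["uid"]] = [answer, scale]

# Dictionary keyed by question uid, each value an [answer, scale] array.
with open("predictions.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)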

The evaluation and the submission follow the same form as TAT-QA.

Submission

Please email us the prediction file for the test set along with the following information:

Please allow up to two weeks for us to evaluate your submission; we will then add your model to the leaderboard.

Contact

For more information, please contact:

Please kindly cite our work if you use our dataset or code. Thank you.


    @inproceedings{emnlp22pacific,
        author    = {Yang Deng and
                     Wenqiang Lei and
                     Wenxuan Zhang and
                     Wai Lam and
                     Tat{-}Seng Chua},
        title     = {{PACIFIC:} Towards Proactive Conversational Question Answering over
                     Tabular and Textual Data in Finance},
        booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural
                     Language Processing, {EMNLP} 2022},
        pages     = {6970--6984},
        year      = {2022},
      }