PACIFIC (ProActive ConversatIonal question answering in FInanCe) is a large-scale conversational question answering (CQA) dataset that aims to stimulate progress in proactive CQA research over more complex and realistic tabular and textual data, especially data requiring numerical reasoning.
The PACIFIC dataset is built from the TAT-QA dataset by using its question-answer pairs as guidance for constructing conversation sessions. Compared with existing datasets, PACIFIC exhibits three key challenges: (i) proactivity, (ii) numerical reasoning, and (iii) hybrid context of tables and text.
In total, PACIFIC contains 2,757 dialogues associated with 19,008 QA turns.
The following is an example from PACIFIC together with the corresponding example from TAT-QA. The dashed box on the left shows a hybrid context. Each hybrid context and its associated questions are stored in the following JSON format:
{ "table": { # The tabular data in a hybrid context "uid": "e78f8b29-6085-43de-b32f-be1a68641be3", # The unique id of a table "table": [ # The table content which is 2d-array [ "", "2019", "2018", "2017" ], ... [ "Discount rate", "2.3", "2.5", "2.6" ] ] }, "paragraphs": [ # The textual data in a hybrid context comprising at least two associated paragraphs to the table { "uid": "62be4f5a-1693-4e6b-8bb4-0a4e1e40b409", # The unique id of a paragraph "order": 1, # The order of the paragraph in all associated paragraphs, starting from 1 "text": "Actuarial assumptions" # The content of the paragraph }, ... ], "questions": [ # The questions associated to the hybrid context { "uid": "d0d414be-e75d-43cb-b37a-c96d6367aae2", # The unique id of a question "order": 4, # The order of the question in the conversation, starting from 1 "question": "What is the average rate of inflation?", # The question itself "answer": [ # The ground-truth answer "Which year are you asking about?" ], "derivation": "", # The derivation that can be executed to arrive at the ground-truth answer "answer_type": "question", # The answer type including `span`, `spans`, `arithmetic`, `counting`, and `question`. "answer_from": "table", # The source of the answer including `table`, `table` and `table-text` "rel_paragraphs": "", # The orders of the paragraphs that are relied to infer the answer if any. "req_comparison": false, # A flag indicating if `comparison/sorting` is needed to answer the question whose answer is a single span or multiple spans "scale": "" # The scale of the answer including `None`, `thousand`, `million`, `billion` and `percent` "req_clari": true, # Whether the user question requires clarifications. "follow_up": false, # Whether the user question is a follow-up question that requires the information from the previous conversation. "original_question": "What is the average rate of inflation?", # The self-contained question for query rewriting. } ] }
To evaluate your models, we have also made available the evaluation script that we will use for the official evaluation. To run it, use
```
python pacific_eval.py --gold_path=:path_to_dev --pred_path=:path_to_predictions
```
The predictions file is in JSON format and contains a dictionary with question ids as keys and predictions as values (each prediction must include both the `answer` and the `scale` in an array). For example,
{ "9337c3e6-c53f-45a9-836a-02c474ceac16": [ "4.6", "percent" ], "c4170232-e89c-487a-97c5-afad45e9d702": [ "16", "thousand" ], "d81d1ae7-363c-4b47-8eea-1906fef33856": [ ["2018", "2019"], "" ] ... }
The evaluation and the submission follow the same format as TAT-QA.
Please email us the prediction file for the test set together with the following information:
Please give us up to two weeks to evaluate your submission, and we will add your model to the leaderboard.
For more information, please contact:
Please cite our work if you use our dataset or code. Thank you!
```
@inproceedings{emnlp22pacific,
  author    = {Yang Deng and Wenqiang Lei and Wenxuan Zhang and Wai Lam and Tat{-}Seng Chua},
  title     = {{PACIFIC:} Towards Proactive Conversational Question Answering over Tabular and Textual Data in Finance},
  booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, {EMNLP} 2022},
  pages     = {6970--6984},
  year      = {2022}
}
```