Skip to content

Logo

Pediatric Cancer Data Commons
Data Portal User Guide

v1.25.0 User Guide

Introduction

The Pediatric Cancer Data Commons (PCDC) brings together clinical, genomic, and imaging data from institutions around the world that are working to transform pediatric cancer research and outcomes. Headquartered at the University of Chicago, the PCDC works with international leaders in pediatric cancers and the US National Cancer Institute to develop and apply uniform data standards, facilitating the collection, combination, and analysis of data from many different sources.

By harmonizing existing clinical research data and leading international efforts to standardize data collection, we are breaking down long-standing barriers that have held back advancements in research on rare diseases. Our aim is to leverage this unique collaborative consortium-based approach to enable new and meaningful discoveries about pediatric cancers.

The PCDC Data Portal supports the management, analysis and sharing of data for the research community. The portal includes a data dictionary and data search functionality.

For additional information, users can watch this introductory video.

Access

The PCDC Data Portal can be accessed using a web browser by visiting https://portal.pedscommons.org/. New users to the PCDC Data Portal are prompted to login using an approved authentication provider (i.e., Google).

Landing

After successfully logging in with Google for the first time, users are prompted to associate their account with the PCDC Data Portal.

Registration of an account with the PCDC Data Portal requires users to provide their email (supplied by the authentication provider), first name, last name, and institutional affiliation (e.g. University of California Berkeley). Registration also requires that users read and acknowledge agreement with the PCDC Privacy Notice, the PCDC Terms and Conditions, and the Acceptable Use Policy.

Registration

Users who have previously registered an account will be taken directly to the main page of the PCDC Data Portal after authentication.

Users will initially have access to summary data only. Researchers can request access to line-level data by clicking the Request Access button on the Exploration page. See the Exploration section below.

After successful login, users will see the main page.

Main

The main navigation buttons to PCDC Data Portal pages are in the upper right-hand corner of the page:

  • Dictionary
  • Exploration
  • Query

Profile settings can be accessed by clicking the username at the top of the page. See the Profile section below.

Note: to change the magnification of text, access the browser controls and use the zoom function to increase or decrease the font size. The screenshot below shows the zoom control in the Chrome browser.

Zoom

Dictionary

Dictionary is the data dictionary used to define the data in the portal, including what each entity (Node) represents, properties associated with the entities, and possible values.

The Dictionary is accessed by clicking the Dictionary button in the top-right hand corner of the PDCD Data Portal main page.

Dictionary Button

Example: for the data entity named “Cytology”, the dictionary includes seven properties as shown below.

Dictionary Cytology

Terms used in the dictionary that are defined in the NCIt (National Cancer Institute thesaurus) include a link next to the term showing the NCIt code. Clicking the link will open a browser tab to the NCIt site.

On the right side of the Table View are Download Templates for each data element.

Template Download

Templates are available in JSON (JavaScript Object Notation) or TSV (tab separated values) format. Templates are used when submitting clinical trial data for use in the portal.

The data dictionary can be viewed as a table, (shown above), or in graphical format by clicking on the Graph View button in the upper left. The Graph View shows an overview of the data model, which can be zoomed in to see individual components of the model and how they relate to one another.

Dictionary Graph

Clicking the Open Properties button will switch to the table view as shown above.

Dictionary Graph Detail

How to find an item

  1. In the main page click the Dictionary button. The Dictionary page is displayed.
  2. Type the item you want to find in the Search in Dictionary field on the left of the page.

    Search in Dictionary

    Items that match your search are displayed as you type:

    Search in Dictionary

  3. Click the item in the list you want to display.

    Search in Dictionary Results

The Graph View area of the screen will highlight Nodes that include the search term(s).

The Search Results on the left side shows the number of matches in the Node title and description, as well as the number of matches in the node properties.

Search Results Counts

The Last Search field shows the number of Nodes where the term appears – in this example: 3.

Search Results Node Count

Click on the Node to see the properties.

Search Results Node

Matching search results are highlighted in amber text. Click on the See All button to toggle between matched results and all properties.

Search Results Highlight

Exploration

The Exploration button at the top of the home page takes the user to the main data page allowing users to see summary data.

Users can refer to this video for an overview of the Exploration functionality.

For users who do not have access to line-level data, the top of the Exploration page will display a Request Access button.

Clicking Request Access will redirect users to the PCDC Project Request Forms page. The page contains forms and instructions for requesting access to line-level data.

The Table View tab on the main page is restricted to approved researchers who have access to line-level data.

The current Release Version of the data is included at the bottom of the left panel.

The Filters panel on the left side of the Exploration page can be used to narrow the Summary View results by selected variables. Filters act upon variables in the underlying data and are available for variables in the Subject, Disease, Molecular, etc. data domains. Use the expand icon (>) to see the controls available for a given filter.

Explore Expand

Note: some filters, such as Age at Censor Status have a slider to select an age range. Age is shown in days.
Age at Censor Status

Bin Size

A bin size limit of five has been implemented for all data filters as an additional measure of participant privacy. Bin size refers to the minimum sample size reporting threshold. A lock icon will appear next to the filter that includes less than five subjects.

Summary View

Lock

The main Exploration page shows a summary view of distributions across some key demographic variables for all cases in the PCDC that match the currently-selected filters.

Exploration

Different variable types (e.g. continuous variables, discrete variables, categorical variables) will have different controls available. For example, Race is a categorical (enumerative) variable, and categorical variable filters can be selected using checkboxes with one checkbox per value within that variable. The user can select the values of a variable they are interested in as part of the population. Multiple selections within a single filter are treated as logical ‘ORs’. Therefore, a user who selects ‘Asian’ and ‘Black or African American’ would see a result set where the subjects have a Race value of either ‘Asian’ OR ‘Black OR African American’

Multiple selections made across multiple filters are treated as logical ‘ANDs’. Therefore, a user who selects ‘Asian’ and ‘Black or African American’ from the Race filter and who select ‘Male’ from the Sex filter would see a result set where the subjects have a Race value of either ‘Asian’ OR ‘Black or African American’ AND who have a Sex value of ‘Male’.

Using the above example, the Active filter at the top of the graph view shows “Sex is “Male” AND “Race is any of (logial OR) Black or African American, Asian”. The AND can be switched to OR by clicking on the AND button between the two parts of the expression. Each filter can be removed by clicking on the “x” at the end of the string.

Conjuctive Search

For patient observations that may be captured longitudinally, for example Tumor Site within the Disease tab, users can restrict filtering activity to a specific disease phase (e.g. Initial Diagnosis, Relapse) with an Anchor filter by clicking one of the radio buttons under Disease Phase.

Disease Phase

For example, a user who was interested in patients with relapse disease of the bladder would select Relapse under Disease Phase and Bladder under Tumor Site to see the count of subjects that have an observation of relapse at the bladder.

Tumor Site

Note: when changing the Anchor filter (e.g., switching from Relapse to Initial Diagnosis), users must remove the associated filters, in this case: “Bladder”, in order to see the filter options available under the new Anchor.

Clicking on the search icon to the right of a filter name (e.g. Tumor Site) allows users to search for a specific value in a list of categorical levels by keyword, Matching categorical level values will appear if there is a full or partial match, allowing the user to select it. The filter value can be selected by checking the box next to the name.

Site Search

Once selected, the main page will show the filtered data set. To clear the search, the user can hit the "x" on the right side of the search box.

<! Filtered Data (commented out this image) ->

Selected Filters

An underline in the filter name indicates that filters are active within that tab. In the below example, the Subject and Disease tabs have active filters. Clicking into the tabs will show which filters are selected (also underlined). In the example below, Initial Diagnosis is selected, along with Tumor Site from within the Disease tab.

Summary View

Interdependent Filters

Some filters have an interdependency. Example: selecting the 11q deletion molecular abnormality should be followed by a selection of Absent, Present, or Unknown from the Molecular Abnormality Result filters.

Molecular Abnormality

Filter by Consortium

The left panel includes a Consortium filter in the Subject tab which allows users to select the disease consortium data from which to create a cohort. Data is available from INRG (International Neuroblastoma Risk Group), INSTRuCT (INternational Soft Tissue saRcoma ConsorTium), MaGIC (Malignant Germ Cell International Consortium), and NODAL (hodgkiN lymphOma DatA coLlaboration) subjects.

Consortium Selection

Include or Exclude Filters

Most filters have an Include or Exclude mode below the filter name. By default, Include is selcted. Clicking on Exclude will show results for all but the selected filters.

Filters Set Workspace

Users can refer to this video for an overview of the Filter Sets and Survival Curve functionality.

The Filter Set Workspace allows the user to manage multiple filter sets and switch among available filter sets to see summary statistics based on the filters selected in each set.

Multiple Filter Sets

Use the resizing handle in the lower right corner of the Workspace area to resize the window.

Workspace resizing handle

The active filter is labeled in green on the far left. Each filter shows the expression statement that describes the filter set. Example:

Active Search

Filter Set Workspace commands allow the user to load, save and make changes to filter sets. The following commands are included:

  • New: When at least one filter set is active, pressing New will add another, empty set.
  • Compose: Allows the user to combine two or more saved filter sets using AND/OR logic.
  • Duplicate: Copies the active filter set. Note: an empty set cannot be duplicated.
  • Remove: Removes the active filter set from the workspace and makes the filter set at the top of the list active.
  • Clear: Clears the filter selections from the active filter set.
  • Remove all: Removes all filter sets from the Workspace.
  • Load: Loads a saved filter set to the Workspace. The filter set can be the user's own saved filter set or a filter set defined by a shared token from another user (see "Share" below).
  • Save: Saves the active filter set that has been newly created (i.e., not yet named). When a saved filter set is active, the Save button changes to Update.
  • Share: Generates a token that can be emailed to other users to re-create the selected filter set. Only saved filter sets can be shared.
  • Reset: When a change is made to the active filter, Reset will revert any changes to the saved version of the filter set. If the filter set was not previously saved, Reset will not be active.
  • Delete: Deletes the selected filter set, removing it from the screen and from the user's saved filters.

Saving a Filter Set

Once a set of filters has been selected, a user can save the set by clicking Save. The user will be prompted to enter a name for the saved filter set. A description can also be added.

Save Filter Set

When changes are made to a saved filter set, the Workspace will indicate the change by making the Reset button active. Clicking on Reset will revert any changes made to the original, saved filter set.

To save changes to a filter set and keep the original, users can 1) use the Duplicate function, make changes to the new filter set, and 2) Save the new filter set with a new name.

1) Use Duplicate to Create a copy of the active filter set

Duplicate Filter Set

2) Make changes to the new filter set and hit Save. Note, the new filter set has no name until it is saved.

Save Filter Set

Using Compose to Combine Filter Sets

The Compose function allows users to combine filter sets using AND/OR logic. Example: Selecting Compose in the Filter Workspace, the two separate filters shown below can be combined into one using a logical AND. Click Done when the desired filters are selected.

Compsing a filter

The resulting filter is added to the Workspace and made Active. It can now be saved to the user's list of filters.

Save Composed Filters

Note: when using a Composed filter, the left filter panel becomes inactivated.

Composed filter warning

Explore in An External Data Commons

Some subjects within the PCDC can be further explored in external Data Commons'. Currently, Gabriela Miller Kids First (GMKF) and the Genomic Data Commons (GDC) are available for exploration.

To explore subjects within an external commons:

1) Create a data set using the Filter Set Workspace. It is recommended users save their data set before using the "Explore in" function. Set the desired data set as "Active" by clicking on it within the Workspace.

2) Click on the Explore in button in the upper right part of the Exploration page. A pop-up window will open. The drop-down menu is used to select the external data commons.

Explore in

3) Download the manifest. The "Open in new tab" button will activate after downloading the file. Follow the instructions on the target website to upload the manifest.

Survival Analysis

The PCDC platform includes a Survival Analysis tool that can display survival curves and Number at Risk tables. This tool will be available temporarily during a pilot phase to test the functionality and usability of the tool.

When first accessing the survival curve generator, users must agree to follow the Acceptable Use Policy. Subsequent sessions will include a reminder of the terms whenever the Survival tab is first used.

Survival Analysis

To create a survival curve, users must first select at least one filter set to analyze. Filter Sets can be created and saved using the procedure described above. Note: Only saved filter sets can be used to create a Survival Curve.

Use the drop down labeled “Select Filter Set to Analyze” to select a saved filter set, then click the Add button to add the filter set. Optionally, a user may select “All Subjects” to see the entire data set. Additional filter sets can be added using the steps above.

The "Consortium" information label shows which data are available to create a survival curve. Filter sets that include values from other consortia are disabled.

Survival Analysis

Once selected, users can see the survival curve by hitting the Apply button at the bottom of the screen. Note: When changes are made to filter sets that are currently selected for the Survival curve, users must hit the Apply button after editing to see the changes reflected in the curve.

Multiple data sets can be selected and shown simultaneously. In the example below, two data sets are used: one for male and one for female participants.

Survival Analysis

Options

The survival curve can be adjusted using the options available.

Survival Analysis

The Consortium field indicates which disease groups have data available to create a survival curve.

The Disallowed Variables field shows which variables cannot be used to create a survival curve.

The Survival type drop down can be used to select Overall or Event-Free survival.

Start time is used to select the start time of the x-axis. Default is zero (0).

End time is used to select the maximum time (in years) that will appear on the x-axis. The default is five years. If left blank, the maximum number of years will be shown for which data is available.

Time interval is used to determine the frequency of hash marks appearing on the x-axis, both on the survival curve and in the Number at risk table shown below the curve. Default is one year.

Query

The Query page is used to create queries using the GraphiQL tool. (additional information to be provided in a future release of the User Guide).

User Menu

The User Menu can be accessed by clicking on the user icon in the top right corner of the screen. The View Profile option allows users to make changes to their name and institutional affiliation. The View Profile page can also be used to create API Keys (additional information to be provided in a future release of the User Guide).

User Menu

Data Requests

The Data Requests screen allows users to see the status of their data requests and, when approved, download data. At the top of the screen is a toggle button to see All Requests or only Approved Requests.

Data Requests

Only users who are authorized to receive requested data will see the Download Data button activated when the data is ready.

Data Requests

Documentation

Documentation Menu

The Documentation menu allows users to access reference documents related to the Data Portal such as the Privacy Notice and Terms and Conditions. Clicking on any of the documents in the list will redirect users to the D4CG Documentation site.