Internal Structure & Guiding Principles

This vignette is meant for those who wish to contribute to {cbioportalR}, or users who wish to gain an understanding of the inner-workings of {cbioportalR}.

Guiding Principles

  • We aim to keep returned data as close to raw API results as possible and avoid additional transformations/formatting. Because of this, data is not returned in tidy format. In the future we may consider removing non informative columns from results and returning everything as tibbles with clean names, but for now keeping data in raw format helps us be flexible to future cBioPortal API changes.

  • We try to limit internal API calls within functions that the user did not explicitly ask for. In some places we deem it necessary to break this (for example in .get_data_by_study() if a user passes a study_id we look up the molecular_profile_id), but we largely default to having slightly less informative messages/checks in favor of fewer hidden calls, as these can be problematic with slow internet or large data sets.

General Structure

  • The cbp_api() internal function makes all api calls (GET and POST using {httr} package). All other user-facing or internal functions paste together necessary URLs and query parameters based on the function and its arguments and pass to this.
  • .get_data_by_study() and .get_data_by_sample() are internal utility functions that work for mutation/cna/fusion data pulls.
  • Functions ending in *_by_study() query by study_id, functions ending in *_by_sample() query by sample ID, and functions ending in *_by_patient() query by patient ID.
  • There are no specific differences between functions starting with available_*() and those starting with get_*(). These are named according to how you may use them in your workflow. For example, you would check the available_profiles() for your study_id in order to get_fusions_by_study()