Architecture

This documentation was generated from a jupyter notebook, architecture.ipynb, which can be found in the notebooks directory of schedview.

Automatically format code in this notebook:

%load_ext lab_black

Introduction

The schedview module organizes code used to create scheduler visualizations into four submodules corresponding to different stages in transforming the data from a raw reference to a source into a useful visualization. These four stages are:

  • collection, which obtains the data from whatever resources are required;

  • computation, which transforms the data into values to be directly represented in the visualizations;

  • plotting, which generates visualization objects; and

  • dashboard generation, which collects and displays visualizations in a web application.
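
As a rough illustration of how the four stages chain together, the sketch below uses only the standard library. The functions collect_rows, compute_squares, plot_bars, and make_dashboard are hypothetical stand-ins, not part of schedview, and the "visualization" is just a text bar chart. The point is that each stage consumes only what the previous stage returns.

```python
import csv
import io


def collect_rows(resource):
    """Collection: read rows from a file-like resource, unchanged."""
    return list(csv.DictReader(resource))


def compute_squares(rows):
    """Computation: derive the values that will be drawn."""
    return {row["name"]: int(row["value"]) ** 2 for row in rows}


def plot_bars(values):
    """Plotting: build a displayable object (here, a text bar chart)."""
    return "\n".join(f"{name}: {'#' * size}" for name, size in values.items())


def make_dashboard(resource):
    """Dashboard generation: run the stages and return what is shown."""
    return plot_bars(compute_squares(collect_rows(resource)))


# Made-up sample data standing in for a real resource.
sample = io.StringIO("name,value\na,2\nb,3\n")
dashboard_text = make_dashboard(sample)
print(dashboard_text)
```

Because the stages only meet at their return values, any one of them can be replaced (a different data source, a different plotting library) without touching the others.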

This notebook walks through the process of creating a visualization, one stage at a time, using an example chosen to demonstrate the principles underlying the chosen architecture.

In this example, we build a dashboard that shows the locations of minor planets (in equatorial coordinates) over a period of time. This application is outside the scope of the content intended to be included in schedview, which only packages scheduler- and progress-related visualizations. schedview’s basic architecture, however, is applicable beyond its scope. This example was chosen because it applies the architecture to real-world data, is complex enough to demonstrate all aspects of the architecture, and can be implemented with a minimum of additional application-specific complexity that would distract from them.

Collection

Code in the schedview.collect submodule retrieves the data to be visualized from wherever they originate. Typically, functions in schedview.collect take references to resources (e.g. file names or URLs) as arguments and return python objects.

For example, consider the function below, which reads orbital elements for minor planets from a file using the skyfield module:

import skyfield.api
import skyfield.data.mpc


def read_minor_planet_orbits(file_name):
    with skyfield.api.load.open(file_name) as file_io:
        minor_planets = skyfield.data.mpc.load_mpcorb_dataframe(file_io)
    return minor_planets

Take a look at what it does:

file_name = "mpcorb_sample_big.dat"
minor_planet_orbits = read_minor_planet_orbits(file_name)
minor_planet_orbits
designation_packed magnitude_H magnitude_G epoch_packed mean_anomaly_degrees argument_of_perihelion_degrees longitude_of_ascending_node_degrees inclination_degrees eccentricity mean_daily_motion_degrees ... observations oppositions observation_period rms_residual_arcseconds coarse_perturbers precise_perturbers computer_name hex_flags designation last_observation_date
0 00001 3.33 0.15 K239D 60.07881 73.42179 80.25496 10.58688 0.078913 0.214107 ... 7283 123 1801-2023 0.65 M-v 30k MPCLINUX 0000 (1) Ceres 20230321
1 00002 4.12 0.15 K239D 40.59806 310.87290 172.91881 34.92584 0.230229 0.213774 ... 8862 121 1804-2023 0.59 M-c 28k MPCLINUX 0000 (2) Pallas 20230603
2 00003 5.16 0.15 K239D 37.02310 247.73792 169.83920 12.99055 0.256213 0.226004 ... 7450 113 1804-2023 0.63 M-v 3Ek MPCLINUX 0000 (3) Juno 20230210
3 00004 3.22 0.15 K239D 169.35183 151.66223 103.71002 7.14218 0.089449 0.271522 ... 7551 110 1821-2023 0.63 M-p 18k MPCLINUX 0000 (4) Vesta 20230814

4 rows × 23 columns

This code doesn’t actually do anything to the data: it just retrieves it. When schedview is used at sites that require different access methods, different implementations of the collect stage will be needed. If those sites need to apply the same cleaning, selection, or computation to the data, implementing that code within the collection submodule would force each site to duplicate it, hindering code reuse. Such code therefore belongs in a later stage.
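
A minimal sketch of that separation, using only the standard library: collect_from_file, collect_from_text, select_bright, and the two-column CSV layout are hypothetical illustrations, not schedview functions. Two sites obtain the same rows through different access methods, while the shared selection step lives outside collection so both can reuse it.

```python
import csv
import io
import os
import tempfile

# Two-column sample built from the magnitudes shown in the table above.
SAMPLE = "designation,magnitude_H\n(1) Ceres,3.33\n(3) Juno,5.16\n"


def collect_from_file(file_name):
    """Site A: the data live in a local file."""
    with open(file_name, newline="") as file_io:
        return list(csv.DictReader(file_io))


def collect_from_text(text):
    """Site B: the data arrive as text, e.g. the body of a web response."""
    return list(csv.DictReader(io.StringIO(text)))


def select_bright(rows, max_magnitude):
    """Shared selection, kept out of collection so both sites reuse it."""
    return [
        row["designation"]
        for row in rows
        if float(row["magnitude_H"]) <= max_magnitude
    ]


with tempfile.TemporaryDirectory() as temp_dir:
    file_name = os.path.join(temp_dir, "orbits.csv")
    with open(file_name, "w", newline="") as file_io:
        file_io.write(SAMPLE)
    rows_site_a = collect_from_file(file_name)

rows_site_b = collect_from_text(SAMPLE)
bright_a = select_bright(rows_site_a, 4.0)
bright_b = select_bright(rows_site_b, 4.0)
```

Had the magnitude cut been written into collect_from_file, the second site would have needed its own copy of it.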

Computation

In instances where the data cannot be visualized directly as returned from the data source, any processing should be done using the schedview.compute submodule.

For example, let’s say we want to plot the positions of the minor planets whose orbital elements we loaded in the collection example above. We are not interested in the orbital elements directly, but rather the positions, so we need to derive the one from the other. So, we create a function in the schedview.compute submodule that drives the code that does the computation and creates an object suitable for passing as input to whatever module we are using for creating plots. (In this case, that’s bokeh, but it could as easily have been matplotlib.)

from astropy.time import Time
import skyfield.api
import skyfield.data.mpc
from skyfield.constants import GM_SUN_Pitjeva_2005_km3_s2 as GM_SUN
import bokeh.models


def compute_minor_planet_positions(
    minor_planet_orbits, start_mjd, end_mjd, time_step=7
):
    # Convert input fields into objects appropriate for skyfield
    timescale = skyfield.api.load.timescale()
    start_ts = timescale.from_astropy(Time(start_mjd, format="mjd"))
    end_ts = timescale.from_astropy(Time(end_mjd, format="mjd"))
    n_samples = int((1 + end_mjd - start_mjd) / time_step)
    sample_times = timescale.linspace(start_ts, end_ts, n_samples)

    ephemeris = skyfield.api.load("de421.bsp")
    sun = ephemeris["sun"]

    position_data = {"designation": [], "mjd": [], "ra": [], "decl": [], "distance": []}
    for _, orbit in minor_planet_orbits.iterrows():
        orbit_rel_sun = skyfield.data.mpc.mpcorb_orbit(orbit, timescale, GM_SUN)
        minor_planet = sun + orbit_rel_sun
        for sample_time in sample_times:
            ra, decl, distance = (
                ephemeris["earth"].at(sample_time).observe(minor_planet).radec()
            )
            position_data["designation"].append(orbit["designation"])
            position_data["mjd"].append(sample_time.to_astropy().mjd)
            position_data["ra"].append(ra._degrees)
            position_data["decl"].append(decl._degrees)
            position_data["distance"].append(distance.au)

    position_ds = bokeh.models.ColumnDataSource(position_data)

    return position_ds

Take a look at what it does:

position_ds = compute_minor_planet_positions(minor_planet_orbits, 60200, 60366, 1)
position_ds.to_df()
     designation           mjd          ra       decl  distance
0      (1) Ceres  60200.000801  208.460906  -6.151262  3.373882
1      (1) Ceres  60201.000801  208.830744  -6.329553  3.382607
2      (1) Ceres  60202.000801  209.201817  -6.507387  3.391235
3      (1) Ceres  60203.000801  209.574111  -6.684750  3.399765
4      (1) Ceres  60204.000801  209.947612  -6.861627  3.408197
..           ...           ...         ...        ...       ...
663    (4) Vesta  60362.000801   81.371339  23.180175  2.035636
664    (4) Vesta  60363.000801   81.475208  23.222025  2.047772
665    (4) Vesta  60364.000801   81.586028  23.263744  2.059969
666    (4) Vesta  60365.000801   81.703711  23.305317  2.072222
667    (4) Vesta  60366.000801   81.828170  23.346729  2.084530

668 rows × 5 columns
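
The row count follows from the sampling arithmetic in compute_minor_planet_positions. The sketch below repeats that arithmetic in isolation. count_samples and sample_spacing are hypothetical helpers, and the spacing formula assumes the samples are evenly spaced with both endpoints included, which is what the mjd column above suggests.

```python
def count_samples(start_mjd, end_mjd, time_step=7):
    """Number of sample times, as computed in compute_minor_planet_positions."""
    return int((1 + end_mjd - start_mjd) / time_step)


def sample_spacing(start_mjd, end_mjd, n_samples):
    """Days between evenly spaced samples that include both endpoints (assumed)."""
    return (end_mjd - start_mjd) / (n_samples - 1)


n_samples = count_samples(60200, 60366, 1)
n_rows = 4 * n_samples  # four minor planets in the sample file
spacing = sample_spacing(60200, 60366, n_samples)
print(n_samples, n_rows, spacing)  # prints: 167 668 1.0
```

Under the same assumption, a time_step that does not divide the window evenly yields a spacing that differs from time_step: a window from 60200 to 60565 with time_step=28 gives 13 samples spaced about 30.4 days apart.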

schedview.compute is not intended to hold processing code of general interest, but rather computation specific to the creation of scheduler visualizations.

In the example above, the function did not implement the orbital calculations itself, but rather called the functionality in skyfield. It did, however, include the data restructuring needed to pass the data, in the format returned by the function in the collection step, to skyfield, and to transform the results into python objects well suited to being passed directly to the plotting tools being used.

Even in instances specific to Rubin Observatory, the computation may be better collected in other modules (e.g. rubin_sim) or in modules of their own, and then called by a thin driver in schedview.compute.
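
A thin driver of this kind might look like the following sketch. Here statistics.fmean from the standard library stands in for a general-purpose computation maintained elsewhere, compute_mean_distance is a hypothetical driver name, and the input values are made up. The driver only reshapes its input and output; the computation itself is delegated.

```python
from statistics import fmean


def compute_mean_distance(position_rows):
    """Thin driver: restructure rows, delegate the computation, return results."""
    # Group distances by designation, the shape the external function needs.
    distances = {}
    for row in position_rows:
        distances.setdefault(row["designation"], []).append(row["distance"])

    # Delegate the actual computation to the general-purpose function.
    return {name: fmean(values) for name, values in distances.items()}


# Made-up values, for illustration only.
position_rows = [
    {"designation": "(1) Ceres", "distance": 3.0},
    {"designation": "(1) Ceres", "distance": 4.0},
    {"designation": "(4) Vesta", "distance": 2.0},
]
mean_distance = compute_mean_distance(position_rows)
```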

When the computations are time-consuming, it may be better to use separate processes to generate data products independently of schedview, and then load these derived data products using tools in schedview.collect.
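
One way this split could look is sketched below with the standard library. The JSON format, the file name, and the functions write_positions and read_positions are assumptions made for illustration; the values are made up. The slow step writes a derived data product once, and a collection-style function later loads it without recomputing anything.

```python
import json
import os
import tempfile


def write_positions(position_data, file_name):
    """Run in a separate, possibly slow, process: save the derived data."""
    with open(file_name, "w") as file_io:
        json.dump(position_data, file_io)


def read_positions(file_name):
    """Collection-style function: load the precomputed product, unchanged."""
    with open(file_name) as file_io:
        return json.load(file_io)


# Made-up values standing in for the output of an expensive computation.
position_data = {"designation": ["(1) Ceres"], "mjd": [60200.0], "ra": [208.5]}

with tempfile.TemporaryDirectory() as temp_dir:
    file_name = os.path.join(temp_dir, "positions.json")
    write_positions(position_data, file_name)
    loaded = read_positions(file_name)
```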

Plotting

Functions in the schedview.plot submodule create instances of visualization objects from the data, as provided by the schedview.collect submodule or, when computation is necessary, the schedview.compute submodule.

These “visualization objects” can be anything that can be directly rendered in a jupyter notebook or by panel in a dashboard, including matplotlib figures, bokeh plots, plain HTML, png images, and many others.
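
To illustrate the plain HTML case, the sketch below relies on the fact that jupyter's rich display calls an object's _repr_html_ method when it defines one. HtmlTable and tabulate_designations are hypothetical names, not part of schedview. As with any function in the plotting stage, the function returns the displayable object rather than displaying it as a side effect.

```python
import html


class HtmlTable:
    """A tiny visualization object: jupyter renders it via _repr_html_."""

    def __init__(self, rows):
        self.rows = rows

    def _repr_html_(self):
        body = "".join(
            "<tr>"
            + "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row)
            + "</tr>"
            for row in self.rows
        )
        return f"<table>{body}</table>"


def tabulate_designations(designations):
    """Plot-stage function: return a displayable object."""
    return HtmlTable(list(enumerate(designations)))


table = tabulate_designations(["(1) Ceres", "(2) Pallas"])
markup = table._repr_html_()
```

Leaving table as the last expression of a notebook cell would render it as an HTML table.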

This example creates a simple plot of the minor planet data, as generated above:

import bokeh.plotting
import bokeh.palettes
import bokeh.transform
import numpy as np


def map_minor_planet_positions(position_ds):
    figure = bokeh.plotting.figure()

    minor_planet_designations = np.unique(position_ds.data["designation"])
    cmap = bokeh.transform.factor_cmap(
        "designation",
        palette=bokeh.palettes.Category20[len(minor_planet_designations)],
        factors=minor_planet_designations,
    )

    figure.scatter(
        "ra", "decl", color=cmap, legend_field="designation", source=position_ds
    )
    figure.title = "Select minor planet positions"
    figure.yaxis.axis_label = "Declination (degrees)"
    figure.xaxis.axis_label = "R.A. (degrees)"

    return figure

Once again, we can display this directly within our notebook:

import bokeh.io

# Add the jupyter extension that supports display of bokeh figures
# This only needs to be done once, typically at the top of a notebook.
bokeh.io.output_notebook()

figure = map_minor_planet_positions(position_ds)
bokeh.io.show(figure)

The schedview module holds plotting tools for specific instances of plots useful for studying the scheduler or survey progress.

As was the case for functions in the schedview.compute submodule, functionality that is of interest beyond the scheduler should be extracted into a separate module. The uranography module is an example of where this has already been done.

Dashboard applications

Together, the functions supplied by the schedview.collect, schedview.compute, and schedview.plot submodules let a developer build plots in jupyter notebooks. Using schedview in this way maximizes flexibility: bespoke or alternate collection and processing can be used between or instead of the functions supplied by schedview, and the plots themselves can be extended and customized beyond what schedview provides using the relevant plotting libraries (bokeh or matplotlib).

Often, though, standardized dashboards that make a set of visualizations easy to view are more useful, even at the expense of the full flexibility of a jupyter notebook.

For this, dashboard applications can be created in the schedview.app submodule.

The suggested approach to building such applications is to create a param.Parameterized subclass and display it through a panel application.

The class definition of a param.Parameterized subclass encodes dependencies between user supplied parameters, stages of processing, and the visualization ultimately produced.

The panel and param documentation provide more complete explanations and tutorials. Note that there are alternate approaches to using panel to generate dashboards; this approach is covered by the “Declare UIs with Declarative API” section of the panel documentation.

A full explanation of panel’s declarative API is beyond the scope of this notebook, but the SimpleSampleDashboard class below gives a simple example of how it works.

import param
import panel as pn


class SimpleSampleDashboard(param.Parameterized):
    orbit_filename = param.FileSelector(
        default="./mpcorb_sample_big.dat",
        path="./mpcorb_*.dat",
        doc="Data file with orbit parameters",
        label="Orbit data file",
    )

    start_mjd = param.Number(
        default=60200,
        doc="Modified Julian Date of start of date window",
        label="Start MJD",
    )

    end_mjd = param.Number(
        default=60565, doc="Modified Julian Date of end of date window", label="End MJD"
    )

    orbits = param.Parameter()

    positions = param.Parameter()

    @param.depends("orbit_filename", watch=True)
    def update_orbits(self):
        if self.orbit_filename is None:
            print("No file supplied, not loading orbits")
            return

        print("Updating orbits")
        self.orbits = read_minor_planet_orbits(self.orbit_filename)

    @param.depends("orbits", "start_mjd", "end_mjd", watch=True)
    def update_positions(self):
        if self.orbits is None:
            print("No orbits, not updating positions")
            return

        print("Updating positions")
        self.positions = compute_minor_planet_positions(
            self.orbits, self.start_mjd, self.end_mjd, time_step=28
        )

    @param.depends("positions")
    def make_position_figure(self):
        if self.positions is None:
            return None

        figure = map_minor_planet_positions(self.positions)
        return figure

    def make_app(self):
        self.update_orbits()

        app = pn.Row(
            pn.Param(self, parameters=["orbit_filename", "start_mjd", "end_mjd"]),
            pn.param.ParamMethod(self.make_position_figure, loading_indicator=True),
        )
        return app

Now we can use the app within our notebook:

# Load the jupyter extension that allows the display of
# panel dashboards within jupyter
pn.extension()

# Instantiate the app
dashboard = SimpleSampleDashboard()
app = dashboard.make_app()

# Actually display the app
app
Updating orbits
Updating positions

Making a stand-alone app

To create a stand-alone app that can be run as its own web service, outside jupyter, a driver function needs to be added.

For the above example, it would look something like this:

def main():
    # In this trivial example, this extra declaration
    # is pointless functionally. But, in a real app,
    # you probably want to use something like this
    # to make sure relevant configuration arguments
    # get passed.
    def make_app():
        dashboard = SimpleSampleDashboard()
        return dashboard.make_app()

    pn.serve(make_app, port=8080, title="Simple Sample Dashboard")
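
As a sketch of how configuration arguments might be passed through such a wrapper, the example below parses command line options with argparse and builds the zero-argument function that would be handed to pn.serve. The option names, make_app_factory, and FakeDashboard are all hypothetical; FakeDashboard stands in for SimpleSampleDashboard so that the sketch runs without panel or a server.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command line options for the dashboard server."""
    parser = argparse.ArgumentParser(description="Serve the sample dashboard")
    parser.add_argument("--port", type=int, default=8080)
    parser.add_argument("--orbit-filename", default="./mpcorb_sample_big.dat")
    return parser.parse_args(argv)


def make_app_factory(dashboard_class, **dashboard_kwargs):
    """Return a zero-argument function that builds a configured app."""

    def make_app():
        dashboard = dashboard_class(**dashboard_kwargs)
        return dashboard.make_app()

    return make_app


class FakeDashboard:
    """Stand-in for SimpleSampleDashboard, used only in this sketch."""

    def __init__(self, orbit_filename):
        self.orbit_filename = orbit_filename

    def make_app(self):
        return f"app for {self.orbit_filename}"


args = parse_args(["--port", "8081", "--orbit-filename", "mpcorb_other.dat"])
make_app = make_app_factory(FakeDashboard, orbit_filename=args.orbit_filename)

# A real driver would now call something like:
#     pn.serve(make_app, port=args.port, title="Simple Sample Dashboard")
app_description = make_app()
```

Building the dashboard inside make_app, rather than once up front, keeps the construction of the dashboard and its configuration together in the function that the server calls.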

Then, an entry point for the new dashboard can be added to pyproject.toml so that an executable to start the server is added to the path when the python module is installed.