Architecture
This documentation was generated from a Jupyter notebook, `architecture.ipynb`, which can be found in the `notebooks` directory of `schedview`.
Automatically format code in this notebook:

```python
%load_ext lab_black
```
Introduction
The `schedview` module organizes code used to create scheduler visualizations into four submodules. Each submodule corresponds to a different stage in transforming the data, from a raw reference to a data source into a useful visualization. These four stages are:

- collection (`schedview.collect`), which obtains the data from whatever resources are required;
- computation (`schedview.compute`), which transforms the data into the values to be represented directly in the visualizations;
- plotting (`schedview.plot`), which generates visualization objects; and
- dashboard generation (`schedview.app`), which collects and displays visualizations in a web application.
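The following toy sketch makes the division of labor concrete before the real example. It uses only the Python standard library to show the same four-stage pattern. Every name in it (`collect_counts`, `compute_totals`, `plot_totals`, `make_app`) and the CSV data are hypothetical illustrations, not part of `schedview`. The "plot" is a text bar chart, so the sketch runs without `bokeh` or `panel`.

```python
# Hypothetical illustration of the four stages; none of these names exist in schedview.
import csv
import io


def collect_counts(csv_text):
    """Collection: turn a raw resource (here, CSV text) into Python objects, unchanged."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def compute_totals(rows):
    """Computation: derive exactly the values the visualization will show."""
    totals = {}
    for row in rows:
        totals[row["band"]] = totals.get(row["band"], 0) + int(row["visits"])
    return totals


def plot_totals(totals):
    """Plotting: build a displayable object (here, a text bar chart)."""
    width = max(len(band) for band in totals)
    return "\n".join(
        f"{band:<{width}} {'#' * count}" for band, count in sorted(totals.items())
    )


def make_app(csv_text):
    """Dashboard: chain the stages; a real app would hand the result to panel."""
    return plot_totals(compute_totals(collect_counts(csv_text)))


raw = "band,visits\ng,2\nr,3\ng,1\n"
print(make_app(raw))
```

Because each stage only consumes the previous stage's output, any one of them can be replaced without touching the others.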
This notebook walks through the process of creating a visualization one stage at a time, using an example chosen to demonstrate the principles underlying the architecture.
In this example, we build a dashboard that shows the locations of minor planets (in equatorial coordinates) over a period of time. This application is outside the scope of the content intended to be included in `schedview`, which packages only scheduler- and progress-related visualizations. However, `schedview`'s basic architecture is applicable beyond that scope. This example was chosen because it applies the architecture to real-world data that are complex enough to demonstrate all of its aspects. At the same time, it can be implemented with a minimum of additional application-specific complexity that would distract from those aspects.
Collection
Code in the `schedview.collect` submodule retrieves the data to be visualized from wherever they originate. Typically, functions in `schedview.collect` take references to resources (e.g. file names or URLs) as arguments and return Python objects.
For example, consider the function below, which reads orbital elements for minor planets from a file using the `skyfield` module:
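```python
import skyfield.api
import skyfield.data.mpc


def read_minor_planet_orbits(file_name):
    # Open the file (skyfield's loader also accepts URLs) and parse the
    # MPCORB-format orbital elements into a pandas DataFrame, unmodified.
    with skyfield.api.load.open(file_name) as file_io:
        minor_planets = skyfield.data.mpc.load_mpcorb_dataframe(file_io)
    return minor_planets
```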
Take a look at what it does:

```python
file_name = "mpcorb_sample_big.dat"
minor_planet_orbits = read_minor_planet_orbits(file_name)
minor_planet_orbits
```
| | designation_packed | magnitude_H | magnitude_G | epoch_packed | mean_anomaly_degrees | argument_of_perihelion_degrees | longitude_of_ascending_node_degrees | inclination_degrees | eccentricity | mean_daily_motion_degrees | ... | observations | oppositions | observation_period | rms_residual_arcseconds | coarse_perturbers | precise_perturbers | computer_name | hex_flags | designation | last_observation_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00001 | 3.33 | 0.15 | K239D | 60.07881 | 73.42179 | 80.25496 | 10.58688 | 0.078913 | 0.214107 | ... | 7283 | 123 | 1801-2023 | 0.65 | M-v | 30k | MPCLINUX | 0000 | (1) Ceres | 20230321 |
| 1 | 00002 | 4.12 | 0.15 | K239D | 40.59806 | 310.87290 | 172.91881 | 34.92584 | 0.230229 | 0.213774 | ... | 8862 | 121 | 1804-2023 | 0.59 | M-c | 28k | MPCLINUX | 0000 | (2) Pallas | 20230603 |
| 2 | 00003 | 5.16 | 0.15 | K239D | 37.02310 | 247.73792 | 169.83920 | 12.99055 | 0.256213 | 0.226004 | ... | 7450 | 113 | 1804-2023 | 0.63 | M-v | 3Ek | MPCLINUX | 0000 | (3) Juno | 20230210 |
| 3 | 00004 | 3.22 | 0.15 | K239D | 169.35183 | 151.66223 | 103.71002 | 7.14218 | 0.089449 | 0.271522 | ... | 7551 | 110 | 1821-2023 | 0.63 | M-p | 18k | MPCLINUX | 0000 | (4) Vesta | 20230814 |

4 rows × 23 columns
This code doesn't actually do anything to the data: it just retrieves it. When `schedview` is used at sites that require different access methods, different implementations of the collect stage will be needed. Sites with different access methods may still need to do the same cleaning, selection, or computation on the data. Implementing that code within the collection submodule would hinder code reuse, because it would have to be duplicated in every site-specific implementation.
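This minimal sketch illustrates that separation. It assumes two hypothetical sites: one keeps visit counts in a local CSV file, and the other receives them as JSON text (for example, from a web service). Each site gets its own small collection function, but both return the same structure, so a single computation function serves both. All function names and data here are illustrative assumptions.

```python
import csv
import json
import tempfile
from pathlib import Path


# Hypothetical site A: visit counts live in a local CSV file.
def collect_visits_from_csv(file_name):
    with open(file_name, newline="") as file_io:
        return [
            {"night": row["night"], "visits": int(row["visits"])}
            for row in csv.DictReader(file_io)
        ]


# Hypothetical site B: the same data arrive as JSON text, e.g. from a web service.
def collect_visits_from_json(json_text):
    return [
        {"night": item["night"], "visits": int(item["visits"])}
        for item in json.loads(json_text)
    ]


# Shared computation, written once and reused regardless of the collector.
def compute_mean_visits(visits):
    return sum(visit["visits"] for visit in visits) / len(visits)


# Usage: both collectors yield identical structures, so the computation is shared.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "visits.csv"
    csv_path.write_text("night,visits\n2023-09-01,10\n2023-09-02,20\n")
    site_a_visits = collect_visits_from_csv(csv_path)

site_b_visits = collect_visits_from_json(
    '[{"night": "2023-09-01", "visits": 10}, {"night": "2023-09-02", "visits": 20}]'
)
print(compute_mean_visits(site_a_visits), compute_mean_visits(site_b_visits))
```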
Computation
In instances where the data cannot be visualized directly as returned from the data source, any processing should be done using the `schedview.compute` submodule.
For example, suppose we want to plot the positions of the minor planets whose orbital elements we loaded in the collection example above. We are interested not in the orbital elements themselves but in the positions, so we need to derive the latter from the former. We therefore create a function in the `schedview.compute` submodule that drives the computation and returns an object suitable for passing as input to whatever module we are using to create plots. (In this case, that's `bokeh`, but it could just as easily have been `matplotlib`.)
```python
from astropy.time import Time
import skyfield.api
import skyfield.data.mpc
from skyfield.constants import GM_SUN_Pitjeva_2005_km3_s2 as GM_SUN
import bokeh.models


def compute_minor_planet_positions(
    minor_planet_orbits, start_mjd, end_mjd, time_step=7
):
    # Convert the input MJDs into skyfield times, evenly spaced about
    # time_step days apart and including both endpoints
    timescale = skyfield.api.load.timescale()
    start_ts = timescale.from_astropy(Time(start_mjd, format="mjd"))
    end_ts = timescale.from_astropy(Time(end_mjd, format="mjd"))
    n_samples = int((end_mjd - start_mjd) / time_step) + 1
    sample_times = timescale.linspace(start_ts, end_ts, n_samples)

    # Planetary ephemeris (downloaded on first use) for the Sun and Earth
    ephemeris = skyfield.api.load("de421.bsp")
    sun = ephemeris["sun"]
    earth = ephemeris["earth"]

    position_data = {"designation": [], "mjd": [], "ra": [], "decl": [], "distance": []}
    for _, orbit in minor_planet_orbits.iterrows():
        # Heliocentric orbit from the MPC elements, offset by the Sun's position
        orbit_rel_sun = skyfield.data.mpc.mpcorb_orbit(orbit, timescale, GM_SUN)
        minor_planet = sun + orbit_rel_sun
        for sample_time in sample_times:
            # Astrometric position of the minor planet as observed from Earth
            ra, decl, distance = earth.at(sample_time).observe(minor_planet).radec()
            position_data["designation"].append(orbit["designation"])
            position_data["mjd"].append(sample_time.to_astropy().mjd)
            position_data["ra"].append(ra._degrees)
            position_data["decl"].append(decl._degrees)
            position_data["distance"].append(distance.au)

    # Wrap the columns in a structure that bokeh plots can consume directly
    position_ds = bokeh.models.ColumnDataSource(position_data)
    return position_ds
```
Take a look at what it does:

```python
position_ds = compute_minor_planet_positions(minor_planet_orbits, 60200, 60366, 1)
position_ds.to_df()
```
| | designation | mjd | ra | decl | distance |
|---|---|---|---|---|---|
| 0 | (1) Ceres | 60200.000801 | 208.460906 | -6.151262 | 3.373882 |
| 1 | (1) Ceres | 60201.000801 | 208.830744 | -6.329553 | 3.382607 |
| 2 | (1) Ceres | 60202.000801 | 209.201817 | -6.507387 | 3.391235 |
| 3 | (1) Ceres | 60203.000801 | 209.574111 | -6.684750 | 3.399765 |
| 4 | (1) Ceres | 60204.000801 | 209.947612 | -6.861627 | 3.408197 |
| ... | ... | ... | ... | ... | ... |
| 663 | (4) Vesta | 60362.000801 | 81.371339 | 23.180175 | 2.035636 |
| 664 | (4) Vesta | 60363.000801 | 81.475208 | 23.222025 | 2.047772 |
| 665 | (4) Vesta | 60364.000801 | 81.586028 | 23.263744 | 2.059969 |
| 666 | (4) Vesta | 60365.000801 | 81.703711 | 23.305317 | 2.072222 |
| 667 | (4) Vesta | 60366.000801 | 81.828170 | 23.346729 | 2.084530 |

668 rows × 5 columns
`schedview.compute` is not intended to hold processing code of general interest, but rather computation specific to the creation of scheduler visualizations.
In the example above, the function did not implement the orbital calculations itself, but rather called the functionality in `skyfield`. It did, however, include the data restructuring needed to pass the data, in the format returned by the collection-step function, to `skyfield`. It also transformed the results into Python objects well suited to being passed directly to the plotting tools being used.
Even in instances specific to Rubin Observatory, the computation may be better placed in other modules (e.g. `rubin_sim`) or in modules of its own, and then called by a thin driver in `schedview.compute`.
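Here is a sketch of the thin-driver pattern. `angular_separation_deg` stands in for general-purpose code that would live in an external module; it is a hypothetical helper, not a `rubin_sim` or `skyfield` function. `compute_target_separation` is a hypothetical `schedview.compute`-style driver that only adapts column-oriented position data, like the `position_data` dictionary built above, to that helper. The helper uses the Vincenty form of the great-circle distance formula, which stays numerically stable for both very small and nearly antipodal separations.

```python
import math


def angular_separation_deg(ra1, decl1, ra2, decl2):
    """General-purpose helper (hypothetical; stands in for an external library).

    Great-circle separation in degrees, using the Vincenty formula.
    """
    ra1, decl1, ra2, decl2 = map(math.radians, (ra1, decl1, ra2, decl2))
    delta_ra = ra2 - ra1
    numerator = math.hypot(
        math.cos(decl2) * math.sin(delta_ra),
        math.cos(decl1) * math.sin(decl2)
        - math.sin(decl1) * math.cos(decl2) * math.cos(delta_ra),
    )
    denominator = math.sin(decl1) * math.sin(decl2) + math.cos(decl1) * math.cos(
        decl2
    ) * math.cos(delta_ra)
    return math.degrees(math.atan2(numerator, denominator))


def compute_target_separation(position_data, target_ra, target_decl):
    """Thin driver: adapt column-oriented position data to the general helper."""
    separations = [
        angular_separation_deg(ra, decl, target_ra, target_decl)
        for ra, decl in zip(position_data["ra"], position_data["decl"])
    ]
    # Return a new dict of columns with the derived column added
    return {**position_data, "separation": separations}


positions = {"designation": ["a", "b"], "ra": [0.0, 90.0], "decl": [0.0, 0.0]}
print(compute_target_separation(positions, 0.0, 0.0)["separation"])
```

The driver contains no astronomy of its own. If the general helper moves to, or is replaced by, another library, only the driver's call needs to change.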
When the computations are time-consuming, it may be better to use separate processes to generate data products independently of `schedview`, and then load these derived data products using tools in `schedview.collect`.
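Here is a minimal sketch of that split. It assumes JSON files as the derived-product format, which is an illustrative choice rather than a `schedview` convention. An independent process, such as a scheduled batch job, writes the expensive result once, and a collection-stage function simply reads it back. The function names and sample values are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_derived_product(column_data, file_name):
    """Run in a separate process (e.g. a scheduled batch job), outside schedview."""
    Path(file_name).write_text(json.dumps(column_data))


def read_derived_product(file_name):
    """Collection-stage function: it only loads the product; it does not recompute it."""
    return json.loads(Path(file_name).read_text())


# Usage: the expensive step writes once; dashboards read as often as needed.
precomputed = {"designation": ["(1) Ceres"], "mjd": [60200.0], "ra": [208.46], "decl": [-6.15]}
with tempfile.TemporaryDirectory() as tmp_dir:
    product_path = Path(tmp_dir) / "minor_planet_positions.json"
    write_derived_product(precomputed, product_path)
    loaded = read_derived_product(product_path)

print(loaded == precomputed)
```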
Plotting
Functions in the `schedview.plot` submodule create instances of visualization objects from the data, as provided by either the `schedview.collect` or (when necessary) the `schedview.compute` submodule.
These “visualization objects” can be anything that can be rendered directly in a Jupyter notebook or by `panel` in a dashboard, including `matplotlib` figures, `bokeh` plots, plain HTML, PNG images, and many others.
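Jupyter renders any object that defines a `_repr_html_` method as HTML, and `panel` can also display such objects through its HTML pane. A plot-stage function therefore does not have to return a figure at all. The sketch below is hypothetical: `HtmlTable` and `tabulate_minor_planet_positions` are illustrative names, and the input is assumed to be a dictionary of equal-length columns like the position data computed above.

```python
import html


class HtmlTable:
    """Hypothetical visualization object that Jupyter renders as an HTML table."""

    def __init__(self, columns):
        # columns: dict mapping column name to a list of values (equal lengths)
        self.columns = columns

    def _repr_html_(self):
        names = list(self.columns)
        n_rows = len(self.columns[names[0]]) if names else 0
        header = "".join(f"<th>{html.escape(str(name))}</th>" for name in names)
        rows = "".join(
            "<tr>"
            + "".join(
                f"<td>{html.escape(str(self.columns[name][i]))}</td>" for name in names
            )
            + "</tr>"
            for i in range(n_rows)
        )
        return f"<table><tr>{header}</tr>{rows}</table>"


def tabulate_minor_planet_positions(position_data):
    """Plot-stage function that returns an HTML-renderable object, not a figure."""
    return HtmlTable({key: position_data[key] for key in ("designation", "ra", "decl")})


table = tabulate_minor_planet_positions(
    {"designation": ["(1) Ceres"], "mjd": [60200.0], "ra": [208.46], "decl": [-6.15]}
)
print(table._repr_html_())
```

In a notebook, leaving `table` as the last expression of a cell displays it as a rendered table rather than printing the markup.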
This example creates a simple plot of the minor planet data, as generated above:
```python
import bokeh.plotting
import bokeh.palettes
import bokeh.transform
import numpy as np


def map_minor_planet_positions(position_ds):
    figure = bokeh.plotting.figure()

    # Give each minor planet its own color. Category20 palettes come in
    # sizes 3 through 20, so this supports at most 20 distinct objects.
    minor_planet_designations = np.unique(position_ds.data["designation"])
    n_colors = max(3, len(minor_planet_designations))
    cmap = bokeh.transform.factor_cmap(
        "designation",
        palette=bokeh.palettes.Category20[n_colors],
        factors=minor_planet_designations,
    )

    figure.scatter(
        "ra", "decl", color=cmap, legend_field="designation", source=position_ds
    )
    figure.title = "Select minor planet positions"
    figure.yaxis.axis_label = "Declination (degrees)"
    figure.xaxis.axis_label = "R.A. (degrees)"
    return figure
```
Once again, we can display this directly within our notebook:

```python
import bokeh.io

# Add the jupyter extension that supports display of bokeh figures
# This only needs to be done once, typically at the top of a notebook.
bokeh.io.output_notebook()

figure = map_minor_planet_positions(position_ds)
bokeh.io.show(figure)
```
The `schedview.plot` submodule holds plotting tools for specific instances of plots useful for studying the scheduler or survey progress. As with functions in the `schedview.compute` submodule, functionality that is of interest beyond the scheduler should be extracted into a separate module. The `uranography` module is an example where this has already been done.
Dashboard applications
Together, the functions supplied by the `schedview.collect`, `schedview.compute`, and `schedview.plot` submodules let a developer build plots in Jupyter notebooks. Using `schedview` in this way maximizes flexibility. Bespoke or alternate collection and processing can be inserted between, or used instead of, the functions supplied by `schedview`. The plots themselves can also be extended and customized beyond what `schedview` provides, using the relevant plotting libraries (`bokeh` or `matplotlib`).
Often, though, a standardized dashboard that shows a set of visualizations easily is more useful, even at the expense of the full flexibility of a Jupyter notebook. For this, dashboard applications can be created in the `schedview.app` submodule.
The suggested approach to building such applications is to create a `param.Parameterized` subclass and display it through a `panel` application. The class definition of a `param.Parameterized` subclass encodes the dependencies between user-supplied parameters, stages of processing, and the visualization ultimately produced.
The `panel` and `param` documentation provide more complete explanations and tutorials. Note that there are alternate approaches to using `panel` to generate dashboards; this approach is covered by the “Declare UIs with Declarative API” section of the `panel` documentation.

A full explanation of `panel`'s declarative API is beyond the scope of this notebook, but the `SimpleSampleDashboard` class below gives a simple example of how it works.
```python
import param
import panel as pn


class SimpleSampleDashboard(param.Parameterized):
    # User-supplied parameters, displayed as widgets by make_app
    orbit_filename = param.FileSelector(
        default="./mpcorb_sample_big.dat",
        path="./mpcorb_*.dat",
        doc="Data file with orbit parameters",
        label="Orbit data file",
    )
    start_mjd = param.Number(
        default=60200,
        doc="Modified Julian Date of start of date window",
        label="Start MJD",
    )
    end_mjd = param.Number(
        default=60565, doc="Modified Julian Date of end of date window", label="End MJD"
    )

    # Intermediate data products; make_app does not display these as widgets
    orbits = param.Parameter()
    positions = param.Parameter()

    # Collection stage: reload the orbits whenever the file name changes
    @param.depends("orbit_filename", watch=True)
    def update_orbits(self):
        if self.orbit_filename is None:
            print("No file supplied, not loading orbits")
            return

        print("Updating orbits")
        self.orbits = read_minor_planet_orbits(self.orbit_filename)

    # Computation stage: recompute positions when the orbits or dates change
    @param.depends("orbits", "start_mjd", "end_mjd", watch=True)
    def update_positions(self):
        if self.orbits is None:
            print("No orbits, not updating positions")
            return

        print("Updating positions")
        self.positions = compute_minor_planet_positions(
            self.orbits, self.start_mjd, self.end_mjd, time_step=28
        )

    # Plotting stage: rebuild the figure whenever the positions change
    @param.depends("positions")
    def make_position_figure(self):
        if self.positions is None:
            return None

        figure = map_minor_planet_positions(self.positions)
        return figure

    # Dashboard stage: lay out the parameter widgets next to the figure
    def make_app(self):
        # Load the initial orbits; the watchers then cascade to the positions
        self.update_orbits()

        app = pn.Row(
            pn.Param(self, parameters=["orbit_filename", "start_mjd", "end_mjd"]),
            pn.param.ParamMethod(self.make_position_figure, loading_indicator=True),
        )
        return app
```
Now we can use the app within our notebook:

```python
# Load the jupyter extension that allows the display of
# panel dashboards within jupyter
pn.extension()

# Instantiate the app
dashboard = SimpleSampleDashboard()
app = dashboard.make_app()

# Actually display the app
app
```

```
Updating orbits
Updating positions
```
Making a stand-alone app
To create a stand-alone app that can be run as its own web service, outside Jupyter, a driver function needs to be added. For the above example, it would look something like this:
```python
def main():
    # In this trivial example, this extra declaration is functionally
    # pointless. In a real app, though, you probably want something like
    # this to make sure the relevant configuration arguments get passed.
    def make_app():
        dashboard = SimpleSampleDashboard()
        return dashboard.make_app()

    pn.serve(make_app, port=8080, title="Simple Sample Dashboard")
```
Then, an entry point for the new dashboard can be added to `pyproject.toml`, so that an executable that starts the server is added to the path when the Python module is installed.
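Here is a hedged example. It assumes a standard `pyproject.toml` using the `[project.scripts]` table, and a hypothetical module `schedview/app/simple_sample_dashboard.py` that contains the `main` function above. With those assumptions, the entry point could be declared like this (the command name is also illustrative):

```toml
[project.scripts]
# Hypothetical command name and module path, for illustration only.
simple_sample_dashboard = "schedview.app.simple_sample_dashboard:main"
```

After the package is installed (for example with `pip install -e .`), running `simple_sample_dashboard` would call `main()` and start the server on port 8080.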