|
| 1 | +--- |
| 2 | +title: "Interactive C++ for Data Science" |
| 3 | +layout: post |
| 4 | +excerpt: "This post will discuss some applications of Cling |
| 5 | +developed to support data science researchers. In particular, interactively |
| 6 | +probing data and interfaces makes complex libraries and complex data more |
| 7 | +accessible to users. We aim to demonstrate some of Cling’s features at scale; |
| 8 | +Cling’s eval-style programming support; projects related to Cling; and show |
| 9 | +interactive C++/CUDA." |
| 10 | +sitemap: false |
| 11 | +permalink: blogs/interactive-cpp-for-data-science/ |
| 12 | +author: Vassil Vassilev, David Lange, Simeon Ehrig, Sylvain Corlay |
| 13 | +--- |
| 14 | + |
| 15 | +{% capture image_style %} |
| 16 | + max-width: 70%; |
| 17 | + display: block; |
| 18 | + margin: 0 auto; |
| 19 | +{% endcapture %} |
| 20 | + |
| 21 | +> Note: This article was first published on the [LLVM Blog]. |
| 22 | +
|
| 23 | +# Interactive C++ for Data Science |
| 24 | + |
| 25 | +In our previous blog post ["Interactive C++ with Cling"](/_posts/2020-11-30-interactive-cpp-with-cling.md) |
| 26 | +we mentioned that exploratory programming is an effective way to reduce the |
| 27 | +complexity of the problem. This post will discuss some applications of Cling |
| 28 | +developed to support data science researchers. In particular, interactively |
| 29 | +probing data and interfaces makes complex libraries and complex data more |
| 30 | +accessible to users. We aim to demonstrate some of Cling’s features at scale; |
| 31 | +Cling’s eval-style programming support; projects related to Cling; and show |
| 32 | +interactive C++/CUDA. |
| 33 | + |
| 34 | +## Eval-style programming |
| 35 | + |
| 36 | +A Cling instance can access itself through its runtime. The example below creates a |
| 37 | +`cling::Value` to store the execution result of the incremented variable `i`. |
| 38 | +That mechanism can be used further to support dynamic scopes that extend name |
| 39 | +lookup at runtime. |
| 40 | + |
| 41 | +```cpp |
| 42 | +[cling]$ #include <cling/Interpreter/Value.h> |
| 43 | +[cling]$ #include <cling/Interpreter/Interpreter.h> |
| 44 | +[cling]$ int i = 1; |
| 45 | +[cling]$ cling::Value V; |
| 46 | +[cling]$ gCling->evaluate("++i", V); |
| 47 | +[cling]$ i |
| 48 | +(int) 2 |
| 49 | +[cling]$ V |
| 50 | +(cling::Value &) boxes [(int) 2] |
| 51 | +``` |
| 52 | +
|
| 53 | +`V` "boxes" the expression result, providing extended lifetime if necessary. |
| 54 | +The `cling::Value` can be used to communicate expression values from the |
| 55 | +interpreter to compiled code. |
| 56 | +
|
| 57 | +```cpp |
| 58 | +[cling]$ ++i |
| 59 | +(int) 3 |
| 60 | +[cling]$ V |
| 61 | +(cling::Value &) boxes [(int) 2] |
| 62 | +``` |
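The boxing behaviour shown above can be illustrated in standard C++ with a toy model. This is only a sketch: `BoxedValue` and `evaluateIncrement` are hypothetical names and do not reflect how `cling::Value` is implemented. The sketch only shows that a boxed result is a detached copy with its own lifetime, so later changes to `i` do not affect it.

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>

// Toy stand-in for a boxed expression result. This is NOT cling::Value;
// it only models "copy the result and keep the copy alive".
class BoxedValue {
public:
  // Store a copy of the result, detached from the variable it came from.
  template <typename T>
  void set(const T& result) {
    storage_ = std::make_shared<T>(result);
    type_ = &typeid(T);
  }

  bool hasValue() const { return storage_ != nullptr; }

  // Retrieve the stored copy; the caller must name the stored type.
  template <typename T>
  const T& get() const {
    assert(hasValue() && *type_ == typeid(T));
    return *static_cast<const T*>(storage_.get());
  }

private:
  std::shared_ptr<void> storage_;          // owns the boxed copy
  const std::type_info* type_ = nullptr;   // remembers what was boxed
};

// Hypothetical helper mimicking the effect of evaluate("++i", V):
// perform the increment and box the resulting value.
inline void evaluateIncrement(int& i, BoxedValue& out) {
  out.set(++i);
}
```

Calling `evaluateIncrement(i, V)` with `i == 1` leaves `i` at 2 and a boxed 2 in `V`; a later `++i` changes only `i`.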
| 63 | + |
| 64 | +This mechanism delays evaluation until runtime, which enables some features |
| 65 | +that increase the dynamic look and feel of the C++ language. |
| 66 | + |
| 67 | +## The ROOT data analysis package |
| 68 | + |
| 69 | +The main tool for storage, research and visualization of scientific data in the |
| 70 | +field of high energy physics (HEP) is the specialized software package [ROOT](https://root.cern). |
| 71 | +ROOT is a set of interconnected components that assist scientists from data |
| 72 | +storage and research through to visualization for publication in a scientific |
| 73 | +paper. ROOT has played a significant role in scientific discoveries such as |
| 74 | +gravitational waves, the great cavity in the Pyramid of Cheops, and the discovery |
| 75 | +of the Higgs boson by the Large Hadron Collider. Over the last 5 years, Cling has |
| 76 | +helped to analyze 1 EB of physics data, served as a basis for over 1000 |
| 77 | +scientific publications, and supported software run across a distributed |
| 78 | +computing facility with a million CPU cores. |
| 79 | + |
| 80 | +ROOT uses Cling as a reflection information service for data serialization. The |
| 81 | +C++ objects are stored in a binary format, vertically. The content of a loaded |
| 82 | +data file is made available to the users and C++ objects become first-class |
| 83 | +citizens. |
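To make "vertically" concrete, here is a minimal sketch that assumes the term refers to a column-wise layout, in which each data member of an object is stored in its own contiguous array. The `Particle` record and `ParticleColumns` container are hypothetical illustrations and are not ROOT's on-disk format.

```cpp
#include <cassert>
#include <vector>

// Hypothetical event record, used only for illustration.
struct Particle {
  float px;
  float py;
  int charge;
};

// Column-wise ("vertical") layout: one contiguous array per data member.
struct ParticleColumns {
  std::vector<float> px;
  std::vector<float> py;
  std::vector<int> charge;

  // Split an object into its members and append each to its column.
  void append(const Particle& p) {
    px.push_back(p.px);
    py.push_back(p.py);
    charge.push_back(p.charge);
  }

  std::size_t size() const { return px.size(); }

  // Reassemble the i-th object from the columns.
  Particle at(std::size_t i) const {
    assert(i < size());
    return Particle{px[i], py[i], charge[i]};
  }
};

// Reading a single member touches only that member's column.
inline float sumPx(const ParticleColumns& cols) {
  float total = 0.0f;
  for (float v : cols.px) total += v;
  return total;
}
```

With this layout, an analysis that needs only `px` reads one array and never touches `py` or `charge`.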
| 84 | + |
| 85 | +A central component of ROOT enabled by Cling is eval-style programming. We use |
| 86 | +this in HEP to make it easy to inspect and use C++ objects stored by ROOT. |
| 87 | +Cling enables ROOT to inject available object names into the name lookup when |
| 88 | +a file is opened: |
| 89 | + |
| 90 | +```cpp |
| 91 | +[root] ntuple->GetTitle() |
| 92 | +error: use of undeclared identifier 'ntuple' |
| 93 | +[root] TFile::Open("tutorials/hsimple.root"); ntuple->GetTitle() // #1 |
| 94 | +(const char *) "Demo ntuple" |
| 95 | +[root] gFile->ls(); |
| 96 | +TFile** tutorials/hsimple.root Demo ROOT file with histograms |
| 97 | + TFile* tutorials/hsimple.root Demo ROOT file with histograms |
| 98 | + OBJ: TH1F hpx This is the px distribution : 0 at: 0x7fadbb84e390 |
| 99 | + OBJ: TNtuple ntuple Demo ntuple : 0 at: 0x7fadbb93a890 |
| 100 | + KEY: TH1F hpx;1 This is the px distribution |
| 101 | + [...] |
| 102 | + KEY: TNtuple ntuple;1 Demo ntuple |
| 103 | +[root] hpx->Draw() |
| 104 | + |
| 105 | +``` |
| 106 | +
|
| 107 | +The ROOT framework injects additional names into the name lookup in two stages. |
| 108 | +First, it builds an invalid AST by marking the occurrence of `ntuple` (#1), which |
| 109 | +is then transformed into |
| 110 | +`gCling->EvaluateT</*return type*/void>("ntuple->GetTitle()", /*context*/);` |
| 111 | +In the next stage, at runtime, ROOT opens the file, reads its preamble and |
| 112 | +injects the names via the external name lookup facility in clang. The |
| 113 | +transformation becomes more complex if `ntuple->GetTitle()` takes arguments. |
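The idea of consulting an external source when ordinary lookup fails can be sketched with a toy scope in standard C++. All names here (`DynamicScope`, `openFile`, the provider callback) are hypothetical and do not correspond to the clang or ROOT interfaces. The sketch only models the observable behaviour: a name is unknown before the file is opened and resolvable afterwards.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of a scope with a runtime fallback (hypothetical names;
// this is not clang's external lookup interface).
class DynamicScope {
public:
  // A provider tries to resolve a name and fills in the object's title.
  using Provider = std::function<bool(const std::string&, std::string&)>;

  void declare(const std::string& name, const std::string& title) {
    known_[name] = title;
  }

  void addProvider(Provider p) { providers_.push_back(std::move(p)); }

  // Ordinary lookup first; on failure, ask each external provider in turn.
  bool lookup(const std::string& name, std::string& title) const {
    auto it = known_.find(name);
    if (it != known_.end()) {
      title = it->second;
      return true;
    }
    for (const auto& p : providers_)
      if (p(name, title)) return true;
    return false;
  }

private:
  std::map<std::string, std::string> known_;
  std::vector<Provider> providers_;
};

// Hypothetical stand-in for opening a file: the keys found in the file
// become resolvable through a provider registered at runtime.
inline void openFile(DynamicScope& scope,
                     std::map<std::string, std::string> keys) {
  scope.addProvider([keys](const std::string& name, std::string& title) {
    auto it = keys.find(name);
    if (it == keys.end()) return false;
    title = it->second;
    return true;
  });
}
```

Looking up `ntuple` fails on a fresh `DynamicScope` and succeeds once `openFile` has registered a key of that name, mirroring the two prompts in the ROOT session above.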
| 114 | +
|
| 115 | +{: style="{{ image_style }}"} |
| 116 | +
|
| 117 | +{% capture center_style %}<p style="text-align: center;">Figure 1. Interactive plot of the <i>px</i> distribution read from a root file.</p>{% endcapture %} |
| 118 | +{{ center_style | markdownify }} |
| 119 | +
|
| 120 | +
|
| 121 | +## C++ in Notebooks |
| 122 | +*Section Author:* **Sylvain Corlay, QuantStack** |
| 123 | +
|
| 124 | +The [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html) |
| 125 | +technology allows users to create and share documents that contain live code, |
| 126 | +equations, visualizations and narrative text. It enables data scientists to |
| 127 | +easily exchange ideas or collaborate by sharing their analyses in a |
| 128 | +straightforward and reproducible way. Language agnosticism is a key design |
| 129 | +principle for the Jupyter project, and the Jupyter frontend communicates with |
| 130 | +the kernel (the part of the infrastructure that runs the code) through a |
| 131 | +well-specified protocol. Kernels have been developed for dozens of programming |
| 132 | +languages, such as R, Julia, Python, and Fortran (through the LLVM-based LFortran |
| 133 | +project). |
| 134 | +
|
| 135 | +Jupyter's official C++ kernel relies on [Xeus](https://github.com/jupyter-xeus/xeus), |
| 136 | +a C++ implementation of the kernel protocol, and Cling. An advantage of using a |
| 137 | +reference implementation for the kernel protocol is that a lot of features come |
| 138 | +for free, such as rich mime type display, interactive widgets, auto-complete, |
| 139 | +and much more. |
| 140 | +
|
| 141 | +Rich mime-type rendering for user-defined types can be specified by providing |
| 142 | +an overload of `mime_bundle_repr` for that type, which is picked up by |
| 143 | +argument-dependent lookup. |
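The lookup mechanism can be sketched in dependency-free C++. As an assumption made to keep the example self-contained, the mime bundle is modelled as a `std::map<std::string, std::string>` instead of the JSON object the kernel works with. The `geometry::Circle` type and the `display` function are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the JSON mime bundle (an assumption for this sketch):
// maps a mime type to its rendered representation.
using MimeBundle = std::map<std::string, std::string>;

namespace geometry {  // hypothetical user namespace

struct Circle {
  double radius;
};

// Declared next to the type, so argument-dependent lookup can find it
// from any unqualified call that passes a Circle.
inline MimeBundle mime_bundle_repr(const Circle& c) {
  MimeBundle bundle;
  bundle["text/plain"] = "Circle(r=" + std::to_string(c.radius) + ")";
  bundle["text/html"] =
      "<b>Circle</b> with radius " + std::to_string(c.radius);
  return bundle;
}

}  // namespace geometry

// Hypothetical display hook. The call is deliberately unqualified so that
// the overload is found in the namespace of T at instantiation time.
template <typename T>
MimeBundle display(const T& value) {
  return mime_bundle_repr(value);
}
```

The `display` template never names the `geometry` namespace. The overload is found only because its argument type lives there, which is what lets a library type opt in to rich rendering without modifying the kernel.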
| 144 | +
|
| 145 | +{: style="{{ image_style }}"} |
| 146 | +
|
| 147 | +{% capture center_style %}<p style="text-align: center;">Figure 2. Inline rendering of images in JupyterLab for a user-defined image.</p>{% endcapture %} |
| 148 | +{{ center_style | markdownify }} |
| 149 | +
|
| 150 | +Rich mime type rendering opens up many possibilities, such as rich display of |
| 151 | +dataframes with HTML tables, or even mime types that are rendered in the |
| 152 | +front-end with JavaScript extensions. |
| 153 | +
|
| 154 | +An advanced example making use of rich rendering with MathJax is the SymEngine |
| 155 | +symbolic computing library. |
| 156 | +
|
| 157 | +{: style="{{ image_style }}"} |
| 158 | +
|
| 159 | +{% capture center_style %}<p style="text-align: center;">Figure 3. Using rich mime type rendering in Jupyter with the SymEngine package.</p>{% endcapture %} |
| 160 | +{{ center_style | markdownify }} |
| 161 | +
|
| 162 | +
|
| 163 | +
|
| 164 | +Xeus-cling comes along with an implementation of the Jupyter widgets protocol |
| 165 | +which enables bidirectional communication with the backend. |
| 166 | +
|
| 167 | +{: style="{{ image_style }}"} |
| 168 | +
|
| 169 | +
|
| 170 | +{% capture center_style %}<p style="text-align: center;">Figure 4. Interactive widgets in JupyterLab with the C++ kernel.</p>{% endcapture %} |
| 171 | +{{ center_style | markdownify }} |
| 172 | +
|
| 173 | +
|
| 174 | +
|
| 175 | +More complex widget libraries have been enabled through this framework like |
| 176 | +[xleaflet](https://github.com/jupyter-xeus/xleaflet). |
| 177 | +
|
| 178 | +{: style="{{ image_style }}"} |
| 179 | +
|
| 180 | +{% capture center_style %}<p style="text-align: center;">Figure 5. Interactive GIS in C++ in JupyterLab with xleaflet.</p>{% endcapture %} |
| 181 | +{{ center_style | markdownify }} |
| 182 | +
|
| 183 | +Other features include rich HTML help for the standard library and third-party |
| 184 | +packages: |
| 185 | +
|
| 186 | +{: style="{{ image_style }}"} |
| 187 | +
|
| 188 | +{% capture center_style %}<p style="text-align: center;">Figure 6. Accessing cppreference for std::vector from JupyterLab by typing `?std::vector`.</p>{% endcapture %} |
| 189 | +{{ center_style | markdownify }} |
| 190 | +
|
| 191 | +The Xeus and Xeus-cling kernels were recently incorporated as subprojects of |
| 192 | +Jupyter, and are governed by its code of conduct and general governance. |
| 193 | +
|
| 194 | +Planned future developments for the xeus-cling kernel include: adding support |
| 195 | +for the Jupyter console interface, through an implementation of the Jupyter |
| 196 | +`is_complete` message, currently lacking; adding support for cling |
| 197 | +"dot commands" as Jupyter magics; and supporting the new debugger protocol that |
| 198 | +was recently added to the Jupyter kernel protocol, which will enable the use of |
| 199 | +the JupyterLab visual debugger with the C++ kernel. |
| 200 | +
|
| 201 | +Another tool that brings interactive plotting features to xeus-cling is xvega. |
| 202 | +It is at an early stage of development and produces Vega charts that can be |
| 203 | +displayed in the notebook. |
| 204 | +
|
| 205 | +{: style="{{ image_style }}"} |
| 206 | +
|
| 207 | +{% capture center_style %}<p style="text-align: center;">Figure 7. The xvega plotting library in the xeus-cling kernel.</p>{% endcapture %} |
| 208 | +{{ center_style | markdownify }} |
| 209 | +
|
| 210 | +
|
| 211 | +## CUDA C++ |
| 212 | +*Section Author:* **Simeon Ehrig, HZDR** |
| 213 | +
|
| 214 | +The Cling CUDA extension brings the workflows of interactive C++ to GPUs without |
| 215 | +losing performance or compatibility with existing software. To execute CUDA C++ |
| 216 | +code, Cling activates an extension in the compiler frontend to understand the |
| 217 | +CUDA C++ dialect and creates a second compiler instance that compiles the code |
| 218 | +for the GPU. |
| 219 | +
|
| 220 | +{: style="{{ image_style }}"} |
| 221 | +
|
| 222 | +{% capture center_style %}<p style="text-align: center;">Figure 8. CUDA/C++ information flow in Cling.</p>{% endcapture %} |
| 223 | +{{ center_style | markdownify }} |
| 224 | +
|
| 225 | +
|
| 226 | +Like the normal C++ mode, the CUDA C++ mode uses AST transformations to enable |
| 227 | +interactive CUDA C++ or special features such as the Cling print system. In |
| 228 | +contrast to the normal Cling compiler pipeline used for the host code, the device |
| 229 | +compiler pipeline does not use all the transformations of the host pipeline. |
| 230 | +Instead, the device pipeline has some special transformations of its own. |
| 231 | +
|
| 232 | +```cpp |
| 233 | +[cling] #include <iostream> |
| 234 | +[cling] #include <cublas_v2.h> |
| 235 | +[cling] #pragma cling(load "libcublas.so") // link a shared library |
| 236 | +// set parameters |
| 237 | +// allocate memory |
| 238 | +// ... |
| 239 | +[cling] __global__ void init(float *matrix, int size){ |
| 240 | +[cling] ? int x = blockIdx.x * blockDim.x + threadIdx.x; |
| 241 | +[cling] ? if (x < size) |
| 242 | +[cling] ? matrix[x] = x; |
| 243 | +[cling] ? } |
| 244 | +[cling] |
| 245 | +[cling] // launching a function directly in the global space |
| 246 | +[cling] init<<<blocks, threads>>>(d_A, dim*dim); |
| 247 | +[cling] init<<<blocks, threads>>>(d_B, dim*dim); |
| 248 | +[cling] |
| 249 | +[cling] cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, dim, dim, dim, &alpha, d_A, dim, d_B, dim, &beta, d_C, dim); |
| 250 | +[cling] cublasGetVector(dim*dim, sizeof(h_C[0]), d_C, 1, h_C, 1); |
| 251 | +[cling] cudaGetLastError() |
| 252 | +(cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0 |
| 253 | +``` |
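The index arithmetic of the `init` kernel can be checked on the host without a GPU. The following is a CPU-only emulation written in plain C++, not CUDA. The helper names `blocksFor` and `initOnHost` are hypothetical, and the ceiling-division grid size is a common convention assumed here, not the parameter setup elided in the session above.

```cpp
#include <cassert>
#include <vector>

// Number of blocks needed so that blocks * threads covers `size` elements
// (ceiling division). The last block may be only partially used.
inline int blocksFor(int size, int threads) {
  assert(size >= 0 && threads > 0);
  return (size + threads - 1) / threads;
}

// Host-side emulation of the `init` kernel: every (block, thread) pair
// computes one global index, and the bounds check discards the surplus
// threads of the last block.
inline void initOnHost(std::vector<float>& matrix, int blocks, int threads) {
  const int size = static_cast<int>(matrix.size());
  for (int blockIdx = 0; blockIdx < blocks; ++blockIdx) {
    for (int threadIdx = 0; threadIdx < threads; ++threadIdx) {
      const int x = blockIdx * threads + threadIdx;  // global index
      if (x < size) matrix[x] = static_cast<float>(x);
    }
  }
}
```

For a 5x5 matrix and 8 threads per block, `blocksFor(25, 8)` yields 4 blocks, that is 32 emulated threads, of which the last 7 fail the `x < size` check and write nothing.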
| 254 | + |
| 255 | +Like the normal C++ mode, the CUDA mode can be used in a Jupyter Notebook. |
| 256 | + |
| 257 | +{: style="{{ image_style }}"} |
| 258 | + |
| 259 | +{% capture center_style %}<p style="text-align: center;">Figure 9. Interactive CUDA/C++ with Cling.</p>{% endcapture %} |
| 260 | +{{ center_style | markdownify }} |
| 261 | + |
| 262 | + |
| 263 | +A special property of Cling in CUDA mode is that the Cling application becomes a |
| 264 | +normal CUDA application at the time of the first CUDA API call. This makes the |
| 265 | +tools of the CUDA SDK usable with Cling. For example, you can use the CUDA profiler |
| 266 | +`nvprof ./cling -xcuda` to profile your interactive application. |
| 267 | +[This Docker container](https://hub.docker.com/r/sehrig/cling) can be used to |
| 268 | +experiment with Cling's CUDA mode. |
| 269 | + |
| 270 | +Planned future developments for the CUDA mode include: supporting the |
| 271 | +complete current CUDA API; redefining CUDA kernels; and supporting other GPU SDKs |
| 272 | +such as HIP (AMD) and SYCL (Intel). |
| 273 | + |
| 274 | +## Conclusion |
| 275 | + |
| 276 | +We see interactive C++ as an important tool for researchers in the data science |
| 277 | +community. Cling has enabled ROOT to be the |
| 278 | +"go to" data analysis tool in the field of High Energy Physics for everything |
| 279 | +from efficient I/O to plotting and fitting. The interactive CUDA backend allows |
| 280 | +easy integration of research workflows and simpler communication between C++ and |
| 281 | +CUDA. As Jupyter Notebooks have become a standard way for data analysts to |
| 282 | +explore ideas, Xeus-cling ensures that great interactive C++ ingredients are |
| 283 | +available in every C++ notebook. |
| 284 | + |
| 285 | +In the next blog post we will focus on Cling enabling features beyond |
| 286 | +interactive C++, and in particular language interoperability. |
| 287 | + |
| 288 | + |
| 289 | +## Acknowledgements |
| 290 | + |
| 291 | +The author would like to thank Sylvain Corlay, Simeon Ehrig, David Lange, |
| 292 | +Chris Lattner, Javier Lopez Gomez, Wim Lavrijsen, Axel Naumann, Alexander Penev, |
| 293 | +Xavier Valls Pla, Richard Smith, Martin Vassilev, who contributed to this post. |
| 294 | + |
| 295 | +You can find out more about our activities at |
| 296 | + [https://root.cern/cling/](https://root.cern/cling/) and |
| 297 | + [https://compiler-research.org](https://compiler-research.org). |
| 298 | + |
| 299 | + |
| 300 | +[LLVM Blog]: https://blog.llvm.org/posts/2020-12-21-interactive-cpp-for-data-science/ |