Commit 43c1794

added 5 more blog posts
1 parent 7747fb6 commit 43c1794

18 files changed

+1260
-0
lines changed

_posts/2020-11-30-interactive-cpp-with-cling.md

+395
Large diffs are not rendered by default.
@@ -0,0 +1,300 @@
1+
---
2+
title: "Interactive C++ for Data Science"
3+
layout: post
4+
excerpt: "This post will discuss some applications of Cling
5+
developed to support data science researchers. In particular, interactively
6+
probing data and interfaces makes complex libraries and complex data more
7+
accessible to users. We aim to demonstrate some of Cling’s features at scale;
8+
Cling’s eval-style programming support; projects related to Cling; and show
9+
interactive C++/CUDA."
10+
sitemap: false
11+
permalink: blogs/interactive-cpp-for-data-science/
12+
author: Vassil Vassilev, David Lange, Simeon Ehrig, Sylvain Corlay
13+
---
14+
15+
{% capture image_style %}
16+
max-width: 70%;
17+
display: block;
18+
margin: 0 auto;
19+
{% endcapture %}
20+
21+
> Note: This article was first published on the [LLVM Blog].
22+
23+
# Interactive C++ for Data Science
24+
25+
In our previous blog post ["Interactive C++ with Cling"](/_posts/2020-11-30-interactive-cpp-with-cling.md)
26+
we mentioned that exploratory programming is an effective way to reduce the
27+
complexity of the problem. This post will discuss some applications of Cling
28+
developed to support data science researchers. In particular, interactively
29+
probing data and interfaces makes complex libraries and complex data more
30+
accessible to users. We aim to demonstrate some of Cling’s features at scale,
31+
Cling’s eval-style programming support, and projects related to Cling, and to
32+
show interactive C++/CUDA.
33+
34+
## Eval-style programming
35+
36+
A Cling instance can access itself through its runtime. The example below creates a
37+
`cling::Value` to store the result of evaluating the expression `++i`.
38+
The same mechanism can be used to support dynamic scopes, which extend the name
39+
lookup at runtime.
40+
41+
```cpp
42+
[cling]$ #include <cling/Interpreter/Value.h>
43+
[cling]$ #include <cling/Interpreter/Interpreter.h>
44+
[cling]$ int i = 1;
45+
[cling]$ cling::Value V;
46+
[cling]$ gCling->evaluate("++i", V);
47+
[cling]$ i
48+
(int) 2
49+
[cling]$ V
50+
(cling::Value &) boxes [(int) 2]
51+
```
52+
53+
`V` "boxes" the expression result, extending its lifetime if necessary.
54+
A `cling::Value` can be used to communicate expression values from the
55+
interpreter to compiled code. The boxed result is a snapshot, so incrementing `i` again leaves `V` unchanged:
56+
57+
```cpp
58+
[cling]$ ++i
59+
(int) 3
60+
[cling]$ V
61+
(cling::Value &) boxes [(int) 2]
62+
```
63+
64+
This mechanism delays evaluation until runtime, which enables features that
65+
make C++ look and feel more dynamic.
66+
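The boxing behavior shown above can be modeled in standard C++ without Cling. The sketch below is only an illustration: `BoxedValue` and `ToyInterpreter` are hypothetical names standing in for `cling::Value` and the interpreter, and a registered lambda stands in for source text that Cling would compile on the fly.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for cling::Value: it owns a copy of an expression result.
struct BoxedValue {
  std::any payload;

  bool hasValue() const { return payload.has_value(); }

  // Unboxes the stored result; T must match the stored type exactly.
  template <typename T>
  T as() const {
    assert(hasValue() && "nothing has been evaluated into this box yet");
    return std::any_cast<T>(payload);
  }
};

// Hypothetical stand-in for the interpreter: source text maps to a callable
// that only runs when evaluate() is called, that is, at runtime.
class ToyInterpreter {
public:
  void define(const std::string& code, std::function<std::any()> action) {
    actions_[code] = std::move(action);
  }

  // Runs the code and boxes its result. Returns false for unknown code,
  // leaving `result` untouched.
  bool evaluate(const std::string& code, BoxedValue& result) const {
    const auto it = actions_.find(code);
    if (it == actions_.end()) return false;
    result.payload = it->second();
    return true;
  }

private:
  std::map<std::string, std::function<std::any()>> actions_;
};
```

Registering `"++i"` as a lambda that captures `i` by reference and then calling `evaluate("++i", V)` increments `i` and leaves a copy of the new value in `V`. A later `++i` changes `i` but not the box, which mirrors the session above.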
67+
## The ROOT data analysis package
68+
69+
The main tool for storage, research and visualization of scientific data in the
70+
field of high energy physics (HEP) is the specialized software package [ROOT](https://root.cern).
71+
ROOT is a set of interconnected components that assist scientists at every
72+
step, from data storage and research to the visualization of results published
73+
in scientific papers. ROOT has played a significant role in discoveries such as
74+
gravitational waves, the great cavity in the Pyramid of Cheops, and the
75+
Higgs boson at the Large Hadron Collider. Over the last 5 years, Cling has
76+
helped to analyze 1 EB of physics data, has served as a basis for over 1000
77+
scientific publications, and supports software that runs across a distributed
78+
computing facility with a million CPU cores.
79+
80+
ROOT uses Cling as a reflection information service for data serialization. The
81+
C++ objects are stored in a binary, column-wise ("vertical") format. The content
82+
of a loaded data file is made available to users, and C++ objects become
83+
first-class citizens.
84+
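To make the "vertical" storage idea concrete, here is a minimal sketch contrasting a row-wise layout with a column-wise one. The `Particle` type and the helper names are invented for illustration, and the code says nothing about ROOT's actual on-disk format; it only shows why reading a single field is cheap when each field lives in its own contiguous array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-wise layout: one object per entry, with the fields interleaved in memory.
struct Particle {
  float px;
  float py;
  int charge;
};

// Column-wise layout: one contiguous array per field.
struct ParticleColumns {
  std::vector<float> px;
  std::vector<float> py;
  std::vector<int> charge;

  std::size_t size() const { return px.size(); }
};

// Splits a collection of objects into per-field columns.
ParticleColumns toColumns(const std::vector<Particle>& rows) {
  ParticleColumns cols;
  for (const Particle& p : rows) {
    cols.px.push_back(p.px);
    cols.py.push_back(p.py);
    cols.charge.push_back(p.charge);
  }
  return cols;
}

// Reassembles entry i as an object from the columns.
Particle rowAt(const ParticleColumns& cols, std::size_t i) {
  assert(i < cols.size());
  return Particle{cols.px[i], cols.py[i], cols.charge[i]};
}

// Reads only the px column; py and charge are never touched.
double meanPx(const ParticleColumns& cols) {
  if (cols.size() == 0) return 0.0;
  double sum = 0.0;
  for (float v : cols.px) sum += v;
  return sum / static_cast<double>(cols.size());
}
```

An analysis that histograms one quantity, such as `px`, only needs to scan one column, while `rowAt` shows that whole objects can still be rebuilt on demand.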
85+
A central component of ROOT enabled by Cling is eval-style programming. We use
86+
this in HEP to make it easy to inspect and use C++ objects stored by ROOT.
87+
Cling enables ROOT to inject available object names into the name lookup when
88+
a file is opened:
89+
90+
```cpp
91+
[root] ntuple->GetTitle()
92+
error: use of undeclared identifier 'ntuple'
93+
[root] TFile::Open("tutorials/hsimple.root"); ntuple->GetTitle() // #1
94+
(const char *) "Demo ntuple"
95+
[root] gFile->ls();
96+
TFile** tutorials/hsimple.root Demo ROOT file with histograms
97+
TFile* tutorials/hsimple.root Demo ROOT file with histograms
98+
OBJ: TH1F hpx This is the px distribution : 0 at: 0x7fadbb84e390
99+
OBJ: TNtuple ntuple Demo ntuple : 0 at: 0x7fadbb93a890
100+
KEY: TH1F hpx;1 This is the px distribution
101+
[...]
102+
KEY: TNtuple ntuple;1 Demo ntuple
103+
[root] hpx->Draw()
104+
105+
```
106+
107+
The ROOT framework injects additional names into the name lookup in two stages.
108+
First, it builds an invalid AST, marking the occurrence of `ntuple` (#1), which
109+
is then transformed into
110+
`gCling->EvaluateT</*return type*/void>("ntuple->GetTitle()", /*context*/);`.
111+
In the second stage, at runtime, ROOT opens the file, reads its preamble, and
112+
injects the names via the external name lookup facility in Clang. The
113+
transformation becomes more complex if `ntuple->GetTitle()` takes arguments.
114+
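The idea of consulting an external source when regular name lookup fails can be sketched in plain C++. Everything below is a simplified model under stated assumptions: `ToyScope`, `ToyFile`, and `openFile` are hypothetical names, objects are reduced to their title strings, and none of this is Clang's or ROOT's real interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of a symbol table with an external fallback source.
class ToyScope {
public:
  using ExternalSource =
      std::function<std::optional<std::string>(const std::string&)>;

  void declare(const std::string& name, std::string title) {
    assert(!name.empty());
    known_[name] = std::move(title);
  }

  void setExternalSource(ExternalSource source) { external_ = std::move(source); }

  // Regular lookup first; on a miss, ask the external source and cache the hit.
  std::optional<std::string> lookup(const std::string& name) {
    if (const auto it = known_.find(name); it != known_.end()) return it->second;
    if (external_) {
      if (auto found = external_(name)) {
        known_[name] = *found;
        return found;
      }
    }
    return std::nullopt;
  }

private:
  std::map<std::string, std::string> known_;
  ExternalSource external_;
};

// Hypothetical "file": maps the names of stored objects to their titles.
using ToyFile = std::map<std::string, std::string>;

// Models opening a file: its objects become visible to later lookups.
void openFile(ToyScope& scope, ToyFile file) {
  scope.setExternalSource(
      [file = std::move(file)](const std::string& name) -> std::optional<std::string> {
        const auto it = file.find(name);
        if (it == file.end()) return std::nullopt;
        return it->second;
      });
}
```

Before `openFile` is called, `lookup("ntuple")` fails, which corresponds to the `undeclared identifier` error in the session above. Afterwards the same lookup succeeds because the fallback source now knows the name.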
115+
![Figure 1](/images/blog/cling-2020-12-21-figure1.png){: style="{{ image_style }}"}
116+
117+
{% capture center_style %}<p style="text-align: center;">Figure 1. Interactive plot of the <i>px</i> distribution read from a ROOT file.</p>{% endcapture %}
118+
{{ center_style | markdownify }}
119+
120+
121+
## C++ in Notebooks
122+
*Section Author:* **Sylvain Corlay, QuantStack**
123+
124+
The [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html)
125+
technology allows users to create and share documents that contain live code,
126+
equations, visualizations and narrative text. It enables data scientists to
127+
easily exchange ideas or collaborate by sharing their analyses in a
128+
straightforward and reproducible way. Language agnosticism is a key design
129+
principle for the Jupyter project, and the Jupyter frontend communicates with
130+
the kernel (the part of the infrastructure that runs the code) through a
131+
well-specified protocol. Kernels have been developed for dozens of programming
132+
languages, such as R, Julia, Python, and Fortran (through the LLVM-based LFortran
133+
project).
134+
135+
Jupyter's official C++ kernel relies on [Xeus](https://github.com/jupyter-xeus/xeus),
136+
a C++ implementation of the kernel protocol, and Cling. An advantage of using a
137+
reference implementation for the kernel protocol is that a lot of features come
138+
for free, such as rich mime type display, interactive widgets, auto-complete,
139+
and much more.
140+
141+
Rich mime-type rendering for user-defined types can be specified by providing
142+
an overload of `mime_bundle_repr` for that type, which is picked up by
143+
argument-dependent lookup (ADL).
144+
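The ADL mechanism can be illustrated with a minimal, standalone sketch. It makes several assumptions: a `std::map` of strings stands in for the JSON object a real kernel would return, and `shapes::Circle` and `toy_kernel::display` are made-up names rather than part of xeus-cling.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the JSON object a real kernel would use: MIME type -> payload.
using MimeBundle = std::map<std::string, std::string>;

namespace shapes {

struct Circle {
  double radius;
};

// Lives in the same namespace as Circle, so ADL finds it for Circle arguments.
MimeBundle mime_bundle_repr(const Circle& c) {
  MimeBundle bundle;
  bundle["text/plain"] = "Circle(r=" + std::to_string(c.radius) + ")";
  bundle["text/html"] = "<b>Circle</b> with radius " + std::to_string(c.radius);
  return bundle;
}

}  // namespace shapes

namespace toy_kernel {

// Hypothetical display hook. The call is unqualified on purpose: the overload
// is selected by argument-dependent lookup from the namespace of T.
template <typename T>
MimeBundle display(const T& value) {
  MimeBundle bundle = mime_bundle_repr(value);
  assert(!bundle.empty() && "a representation needs at least one MIME type");
  return bundle;
}

}  // namespace toy_kernel
```

Calling `toy_kernel::display(shapes::Circle{2.0})` works even though `toy_kernel` never mentions `shapes`. The type's author only has to place the overload next to the type.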
145+
![Figure 2](/images/blog/cling-2020-12-21-figure2.png){: style="{{ image_style }}"}
146+
147+
{% capture center_style %}<p style="text-align: center;">Figure 2. Inline rendering of images in JupyterLab for a user-defined image type.</p>{% endcapture %}
148+
{{ center_style | markdownify }}
149+
150+
Possibilities with rich mime type rendering are endless, such as rich display of
151+
dataframes with HTML tables, or even mime types that are rendered in the
152+
front-end with JavaScript extensions.
153+
154+
An advanced example making use of rich rendering with MathJax is the SymEngine
155+
symbolic computing library.
156+
157+
![Figure 3](/images/blog/cling-2020-12-21-figure3.png){: style="{{ image_style }}"}
158+
159+
{% capture center_style %}<p style="text-align: center;">Figure 3. Using rich mime type rendering in Jupyter with the SymEngine package.</p>{% endcapture %}
160+
{{ center_style | markdownify }}
161+
162+
163+
164+
Xeus-cling comes along with an implementation of the Jupyter widgets protocol
165+
which enables bidirectional communication with the backend.
166+
167+
![Figure 4](/images/blog/cling-2020-12-21-figure4.gif){: style="{{ image_style }}"}
168+
169+
170+
{% capture center_style %}<p style="text-align: center;">Figure 4. Interactive widgets in JupyterLab with the C++ kernel.</p>{% endcapture %}
171+
{{ center_style | markdownify }}
172+
173+
174+
175+
More complex widget libraries, such as
176+
[xleaflet](https://github.com/jupyter-xeus/xleaflet), have been enabled through this framework.
177+
178+
![Figure 5](/images/blog/cling-2020-12-21-figure5.gif){: style="{{ image_style }}"}
179+
180+
{% capture center_style %}<p style="text-align: center;">Figure 5. Interactive GIS in C++ in JupyterLab with xleaflet.</p>{% endcapture %}
181+
{{ center_style | markdownify }}
182+
183+
Other features include rich HTML help for the standard library and third-party
184+
packages:
185+
186+
![Figure 6](/images/blog/cling-2020-12-21-figure6.png){: style="{{ image_style }}"}
187+
188+
{% capture center_style %}<p style="text-align: center;">Figure 6. Accessing cppreference for std::vector from JupyterLab by typing `?std::vector`.</p>{% endcapture %}
189+
{{ center_style | markdownify }}
190+
191+
The Xeus and Xeus-cling kernels were recently incorporated as subprojects of
192+
Jupyter, and are governed by its code of conduct and general governance.
193+
194+
Planned future developments for the xeus-cling kernel include: adding support
195+
for the Jupyter console interface, through an implementation of the Jupyter
196+
`is_complete` message, currently lacking; adding support for cling
197+
"dot commands" as Jupyter magics; and supporting the new debugger protocol that
198+
was recently added to the Jupyter kernel protocol, which will enable the use of
199+
the JupyterLab visual debugger with the C++ kernel.
200+
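In the Jupyter messaging protocol, a kernel answers an `is_complete` request with a status such as `complete`, `incomplete`, or `invalid`, which lets a console decide whether to execute the input or prompt for another line. The sketch below is a deliberately naive illustration of such a check based only on bracket balancing. The names are hypothetical, string literals and comments are ignored, and it does not describe how xeus-cling will implement the message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mirrors three of the status values of the Jupyter is_complete reply.
enum class Completeness { Complete, Incomplete, Invalid };

// Naive check: input is complete when every bracket is closed, incomplete
// when some are still open, and invalid when a closer has no matching opener.
// String literals and comments are not handled in this sketch.
Completeness checkComplete(const std::string& code) {
  std::vector<char> open;
  for (const char c : code) {
    switch (c) {
      case '(':
      case '[':
      case '{':
        open.push_back(c);
        break;
      case ')':
      case ']':
      case '}': {
        const char expected = (c == ')') ? '(' : (c == ']') ? '[' : '{';
        if (open.empty() || open.back() != expected) return Completeness::Invalid;
        assert(!open.empty());
        open.pop_back();
        break;
      }
      default:
        break;
    }
  }
  return open.empty() ? Completeness::Complete : Completeness::Incomplete;
}
```

With this check, a first line such as `__global__ void init(float *matrix, int size){` is reported as incomplete, which corresponds to the `?` continuation prompt in a terminal session.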
201+
Another tool that brings interactive plotting features to xeus-cling is xvega.
202+
It is at an early stage of development and produces Vega charts that can be
203+
displayed in the notebook.
204+
205+
![Figure 7](/images/blog/cling-2020-12-21-figure7.png){: style="{{ image_style }}"}
206+
207+
{% capture center_style %}<p style="text-align: center;">Figure 7. The xvega plotting library in the xeus-cling kernel.</p>{% endcapture %}
208+
{{ center_style | markdownify }}
209+
210+
211+
## CUDA C++
212+
*Section Author:* **Simeon Ehrig, HZDR**
213+
214+
The Cling CUDA extension brings the workflows of interactive C++ to GPUs without
215+
losing performance or compatibility with existing software. To execute CUDA C++
216+
code, Cling activates an extension in the compiler frontend to understand the
217+
CUDA C++ dialect and creates a second compiler instance that compiles the code
218+
for the GPU.
219+
220+
![Figure 8](/images/blog/cling-2020-12-21-figure8.png){: style="{{ image_style }}"}
221+
222+
{% capture center_style %}<p style="text-align: center;">Figure 8. CUDA/C++ information flow in Cling.</p>{% endcapture %}
223+
{{ center_style | markdownify }}
224+
225+
226+
Like the normal C++ mode, the CUDA C++ mode uses AST transformations to enable
227+
interactive CUDA C++ and special features such as the Cling print system. In contrast
228+
to the normal Cling compiler pipeline used for the host code, the device
229+
compiler pipeline does not use all the transformations of the host pipeline.
230+
Instead, the device pipeline has some special transformations of its own.
231+
232+
```cpp
233+
[cling] #include <iostream>
234+
[cling] #include <cublas_v2.h>
235+
[cling] #pragma cling(load "libcublas.so") // link a shared library
236+
// set parameters
237+
// allocate memory
238+
// ...
239+
[cling] __global__ void init(float *matrix, int size){
240+
[cling] ? int x = blockIdx.x * blockDim.x + threadIdx.x;
241+
[cling] ? if (x < size)
242+
[cling] ? matrix[x] = x;
243+
[cling] ? }
244+
[cling]
245+
[cling] // launching a kernel directly in the global scope
246+
[cling] init<<<blocks, threads>>>(d_A, dim*dim);
247+
[cling] init<<<blocks, threads>>>(d_B, dim*dim);
248+
[cling]
249+
[cling] cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, dim, dim, dim, &alpha, d_A, dim, d_B, dim, &beta, d_C, dim);
250+
[cling] cublasGetVector(dim*dim, sizeof(h_C[0]), d_C, 1, h_C, 1);
251+
[cling] cudaGetLastError()
252+
(cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0
253+
```
254+
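The index arithmetic inside the `init` kernel can be checked on the CPU. The following sketch emulates a one-dimensional launch in plain C++, assuming the usual ceiling division to choose the number of blocks. `blocksFor` and `emulateInit` are illustrative names, and the loop order says nothing about how a GPU actually schedules threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest number of blocks such that blocks * threadsPerBlock >= size.
std::size_t blocksFor(std::size_t size, std::size_t threadsPerBlock) {
  assert(threadsPerBlock > 0);
  return (size + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU emulation of init<<<blocks, threadsPerBlock>>>(matrix, size): every
// (block, thread) pair computes one global index, like the kernel above.
void emulateInit(std::vector<float>& matrix, std::size_t blocks,
                 std::size_t threadsPerBlock) {
  const std::size_t size = matrix.size();
  for (std::size_t blockIdx = 0; blockIdx < blocks; ++blockIdx) {
    for (std::size_t threadIdx = 0; threadIdx < threadsPerBlock; ++threadIdx) {
      const std::size_t x = blockIdx * threadsPerBlock + threadIdx;
      if (x < size) {  // guard for the partially filled last block
        matrix[x] = static_cast<float>(x);
      }
    }
  }
}
```

For 10 elements and 4 threads per block, `blocksFor` yields 3 blocks, that is, 12 emulated threads. The `x < size` guard is what keeps the two surplus threads from writing out of bounds.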
255+
Like the normal C++ mode, the CUDA mode can be used in a Jupyter Notebook.
256+
257+
![Figure 9](/images/blog/cling-2020-12-21-figure9.gif){: style="{{ image_style }}"}
258+
259+
{% capture center_style %}<p style="text-align: center;">Figure 9. Cling's CUDA mode in a Jupyter Notebook.</p>{% endcapture %}
260+
{{ center_style | markdownify }}
261+
262+
263+
A special property of Cling in CUDA mode is that the Cling application becomes a
264+
normal CUDA application at the time of the first CUDA API call. This makes the
265+
tools of the CUDA SDK usable with Cling. For example, you can use the CUDA profiler
266+
`nvprof ./cling -xcuda` to profile your interactive application.
267+
[This Docker container](https://hub.docker.com/r/sehrig/cling) can be used to
268+
experiment with Cling's CUDA mode.
269+
270+
Planned future developments for the CUDA mode include: supporting the
271+
complete current CUDA API; redefining CUDA kernels; and supporting other GPU SDKs
272+
such as HIP (AMD) and SYCL (Intel).
273+
274+
## Conclusion
275+
276+
We see interactive C++ as an important tool to develop for
277+
researchers in the data science community. Cling has enabled ROOT to be the
278+
"go to" data analysis tool in the field of High Energy Physics for everything
279+
from efficient I/O to plotting and fitting. The interactive CUDA backend allows
280+
easy integration of research workflows and simpler communication between C++ and
281+
CUDA. As Jupyter Notebooks have become a standard way for data analysts to
282+
explore ideas, Xeus-cling ensures that great interactive C++ ingredients are
283+
available in every C++ notebook.
284+
285+
In the next blog post we will focus on Cling enabling features beyond
286+
interactive C++, and in particular language interoperability.
287+
288+
289+
## Acknowledgements
290+
291+
The author would like to thank Sylvain Corlay, Simeon Ehrig, David Lange,
292+
Chris Lattner, Javier Lopez Gomez, Wim Lavrijsen, Axel Naumann, Alexander Penev,
293+
Xavier Valls Pla, Richard Smith, Martin Vassilev, who contributed to this post.
294+
295+
You can find out more about our activities at
296+
[https://root.cern/cling/](https://root.cern/cling/) and
297+
[https://compiler-research.org](https://compiler-research.org).
298+
299+
300+
[LLVM Blog]: https://blog.llvm.org/posts/2020-12-21-interactive-cpp-for-data-science/
