The generalized least squares (GLS) method is a useful and well-established tool in the field of nuclear data evaluation. In order to use GLS, all relationships between variables, e.g., cross sections and parameters of nuclear models, must be modeled as linear relationships and the joint distribution must be multivariate normal. These assumptions are sometimes too crude. For instance, some experiments measure the ratio of two cross sections. If the value of the ratio follows a normal distribution, the individual cross sections will not. Also, nuclear models are usually non-linear and the use of a linear approximation in the GLS method may lead to misleading results. For this reason, other evaluation techniques, such as Bayesian Monte Carlo, UMC-B, and BFMC, have been developed to overcome some limitations of the GLS method. Like GLS, they are Bayesian methods. However, unlike GLS they don't linearize the nuclear model but employ importance sampling to draw samples from the posterior distribution.
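To illustrate the point about ratios, consider the following small Python sketch (all numbers are arbitrary choices for the example): two cross sections are drawn from normal distributions, and the empirical skewness indicates that their ratio does not follow a normal distribution.

```python
import numpy as np

rng = np.random.default_rng(42)

# two hypothetical cross sections (in barn), both normally distributed
sigma_a = rng.normal(loc=2.0, scale=0.4, size=100000)
sigma_b = rng.normal(loc=5.0, scale=1.0, size=100000)

ratio = sigma_a / sigma_b

def skewness(x):
    return np.mean((x - x.mean())**3) / x.std()**3

# a normal distribution has zero skewness; the ratio clearly deviates from that
print("skewness of sigma_a:", skewness(sigma_a))  # close to 0
print("skewness of ratio:  ", skewness(ratio))    # noticeably different from 0
```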
Even though the mentioned developments are important steps towards more robust and reliable evaluations, there are still many unsolved problems left. For instance, scaling up Monte Carlo methods in terms of the number of isotopes and the number of included experimental data points is difficult because it is not straightforward to efficiently target the posterior distribution. Sometimes the nuclear models cannot describe the experimental data well, even if the model parameters are fitted to the very same data. Therefore, evaluation methods must account for the possibility that models are imperfect. Also, experimental data can be wrong for various reasons, and an evaluation procedure should ideally detect such cases and handle them reasonably. It can be foreseen that new research in the fields of uncertainty quantification, Bayesian statistics, and machine learning will help to address these issues and eventually change the way we deal with nuclear data.
This portal for nuclear data is still in its initial stage and under development, but its aim is clear: It should provide an overview of existing evaluation methods, inform about recent developments concerning evaluation methodology, and provide interactive demonstrations of algorithms. The term evaluation methodology is understood here in a broad sense and also encompasses mathematical methods that help experimenters analyze their experiments. The portal should also give pointers to useful software and other resources relevant for nuclear data. Ideally, it will grow and become more useful in the future thanks to a collaborative effort of the nuclear physics community.
If you are aware of a useful web resource or insightful paper related to nuclear data evaluation which you want to see linked here, write an email to georg.schnabel@nucleardata.com.
Especially when it comes to the study of novel nuclear data evaluation methods, the best way to develop an intuition and understanding of how they work is to play with them using different assumptions and to apply them in various scenarios. Better yet if everyone is empowered to explore the behavior of evaluation algorithms interactively without the need to install additional software. Removing technical obstacles between the user and the application of algorithms facilitates the involvement of a broader expert audience. This in turn makes it more likely that shortcomings and unfavorable features of algorithms are discovered before they are employed in "production" evaluations. Therefore, interactive exploration potentially helps in the conception of better algorithms. In this section you find interactive demonstrations of algorithms, which run in your browser. There is no need to install any additional software.
During data analysis in experimental nuclear physics, one faces the task of splitting the measured signal of a particle detector into several contributing components in order to extract the relevant one. The relevant contribution is given by the particles produced in a controlled way within the experiment, whereas the other contributions are associated with particles entering the detector from the outside (e.g., from space) or particles that are produced in the experiment but whose impact on the detector output is undesired. This interactive demonstration showcases the separation of two bivariate normal peaks from uniform background noise either by the expectation-maximization algorithm or by Gibbs sampling. A nice feature of the Gibbs sampling approach is that uncertainties of all quantities involved, i.e., component proportions, centers, and covariance matrices, are also available. Gibbs sampling is a specific realization of a Markov chain Monte Carlo (MCMC) algorithm.
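The demonstration itself runs in the browser; purely as an illustration of the expectation-maximization part, the following self-contained Python sketch fits a mixture of two bivariate normal components and a uniform background to synthetic data. All numbers, starting values, and the size of the region are arbitrary assumptions for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# synthetic data: two bivariate normal "peaks" on a uniform background
n_bg, n1, n2 = 300, 200, 200
box = 10.0  # data live in the square [0, box] x [0, box]
data = np.vstack([
    rng.uniform(0.0, box, size=(n_bg, 2)),
    rng.multivariate_normal([3.0, 3.0], [[0.3, 0.1], [0.1, 0.2]], size=n1),
    rng.multivariate_normal([7.0, 6.0], [[0.4, -0.1], [-0.1, 0.3]], size=n2),
])

# EM for a mixture of two bivariate normals and a uniform component
K = 2
weights = np.array([0.4, 0.3, 0.3])           # [uniform, peak 1, peak 2]
means = np.array([[2.0, 2.0], [8.0, 8.0]])    # rough starting guesses
covs = np.array([np.eye(2), np.eye(2)])
uniform_density = 1.0 / box**2

for _ in range(100):
    # E-step: responsibilities of each component for each data point
    dens = np.empty((len(data), K + 1))
    dens[:, 0] = uniform_density
    for k in range(K):
        dens[:, k + 1] = multivariate_normal.pdf(data, means[k], covs[k])
    resp = weights * dens
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: update proportions, centers, and covariance matrices
    weights = resp.mean(axis=0)
    for k in range(K):
        r = resp[:, k + 1]
        means[k] = r @ data / r.sum()
        diff = data - means[k]
        covs[k] = (r[:, None] * diff).T @ diff / r.sum()

print("mixture proportions:", weights)
print("estimated centers:\n", means)
```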
One common approach to perform nuclear data evaluation is to use a special case of the Kalman filter, which coincides with the Bayesian version of the Generalized Least Squares (GLS) method. The Kalman filter (in its basic form) assumes a linear mapping between the state and the observation, and therefore nuclear models have to be replaced by a first-order Taylor approximation for the procedure. The linear approximation of non-linear nuclear models can lead to strongly distorted results. This demonstration enables the interactive exploration of the magnitude of such distortions using the example of the nuclear model code TALYS and the neutron-induced total cross section of 181Ta.
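To make the linearization step concrete, here is a minimal, hedged numpy sketch of the update underlying the GLS/Kalman approach: a toy non-linear model (not TALYS) is replaced by its first-order Taylor expansion around the prior estimate and the standard update formulas are applied. All numbers are invented for the example.

```python
import numpy as np

# toy non-linear "model": maps a single parameter p to two observables
def model(p):
    return np.array([np.exp(0.5 * p), p**2 + 1.0])

# prior knowledge about the parameter
p0, P0 = 1.0, np.array([[0.25]])              # prior mean and covariance

# (synthetic) measurement of the two observables with covariance R
y = np.array([1.9, 2.3])
R = np.diag([0.05**2, 0.1**2])

# first-order Taylor approximation of the model around the prior mean
eps = 1e-6
S = ((model(p0 + eps) - model(p0 - eps)) / (2 * eps)).reshape(-1, 1)  # sensitivity matrix

# GLS / Kalman update based on the linearized model
K = P0 @ S.T @ np.linalg.inv(S @ P0 @ S.T + R)
p_post = np.array([p0]) + K @ (y - model(p0))
P_post = P0 - K @ S @ P0

print("posterior parameter estimate:", p_post)
print("posterior variance:", P_post)
```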
The comparison of experimental data collected over decades often reveals discrepancies between experiments. One possible approach to deal with this issue in an automated manner is to employ Bayesian hierarchical modelling, which enables treating the experimental uncertainties and correlations as unknowns (or adjustable parameters). This route has been explored in this paper. The underlying idea is that all experimental estimates are valid and only the associated uncertainties have occasionally been misjudged. However, it is also possible to take the view that some experimental datasets are correct and others are not. This perspective can be modelled using mixture models. As nuclear data evaluation is routinely done using the Generalized Least Squares method and hence based on the assumption of a multivariate normal distribution, the most straightforward extension is to employ a multivariate normal mixture model. The linked interactive demonstration allows exploring this approach with consistent and inconsistent datasets under the assumption of either a linear model, a Gaussian process, or a custom model as prior.
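The following Python sketch conveys the mixture idea in a deliberately simplified, univariate form (the demonstration itself works with multivariate normal distributions): each dataset is assumed to come either from a "consistent" component with its stated uncertainty or from a broad "inconsistent" component, and the posterior probability of belonging to the consistent component is computed. All values, the assumed truth, and the mixture settings are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

# hypothetical measurements of the same cross section (value, stated uncertainty)
values = np.array([1.02, 0.98, 1.01, 1.35])
uncs   = np.array([0.03, 0.04, 0.02, 0.03])

true_value = 1.0      # assumed truth (in a real evaluation this is inferred as well)
w_ok = 0.9            # prior probability that a dataset is consistent
broad_scale = 0.5     # width of the "inconsistent" component

# likelihood of each dataset under the two mixture components
dens_ok  = norm.pdf(values, loc=true_value, scale=uncs)
dens_bad = norm.pdf(values, loc=true_value, scale=broad_scale)

# posterior probability that each dataset belongs to the consistent component
post_ok = w_ok * dens_ok / (w_ok * dens_ok + (1 - w_ok) * dens_bad)
for v, p in zip(values, post_ok):
    print(f"dataset value {v:.2f}: probability consistent = {p:.3f}")
```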
This Python package enables reading and writing ENDF files, see the ENDF-6 formats manual for details. An ENDF file can be read in as a nested dictionary, where the field names correspond to the quantity names defined in the ENDF manual. Nested dictionaries of appropriate structure can also be written out as an ENDF file. Therefore, by using the json package, ENDF files can also be converted to JSON and back from JSON to ENDF files. This package leverages a formalized version of the ENDF format specification language used in the ENDF manual, see this presentation for some examples of how this formalization was achieved and a more extensive presentation with some application examples.
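A possible workflow for the ENDF-to-JSON conversion could look as follows. The package, class, and method names in this sketch (endf_parserpy, EndfParser, parsefile) are assumptions about the package interface, and the file names are placeholders; consult the package documentation for the authoritative API.

```python
import json

# assumed interface of the ENDF parsing package; check its documentation
from endf_parserpy import EndfParser

parser = EndfParser()
endf_dict = parser.parsefile('input.endf')   # 'input.endf' is a placeholder

# the nested dictionary can be serialized to JSON with the standard library
with open('input.json', 'w') as f:
    json.dump(endf_dict, f, indent=2)

# reading the JSON file back yields the nested dictionary again; note that
# JSON stores all dictionary keys as strings, so numeric keys (e.g., MF/MT
# numbers) may need to be converted back to integers before the dictionary
# is written out as an ENDF file with the package's write routine
with open('input.json') as f:
    endf_dict_from_json = json.load(f)
```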
This parser written in Python allows extracting the information from EXFOR master files, see e.g. below or, for the up-to-date files, the IAEA-NDS GitHub repository. The extracted information is then available as a nested dictionary. With additional packages, such as the json package, EXFOR files can also be converted to other container formats. It is a contribution to WPEC subgroup 50, which has the goal of developing an automatically readable, comprehensive, and curated experimental reaction database. A presentation outlining the design philosophy of this parser was given at a meeting of SG 50. As it covers all corner cases of so-called pointers, a feature of the EXFOR format, it is superior to the exforParser R package listed below.
The JSON format is a lightweight data-interchange format which is easy for humans to read and write and easy for machines to parse and generate. Most programming languages provide facilities to conveniently deal with it. The availability of EXFOR data in the JSON format is therefore helpful for nuclear data evaluators and, in general, everyone using EXFOR data in their work. For this reason, the JSON format has also been added to the IAEA web interface to EXFOR as an output option. Here an EXFOR to JSON converter written in the programming language R is provided. This EXFOR parser follows the philosophy of minimal structural change: fields in the original EXFOR data become fields in the output JSON object, keeping modifications of their content to a minimum. The idea is that changes to the structure of the JSON object can easily be effected using a high-level language with rich string manipulation facilities, such as Python.
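As a rough sketch of such a structural change in Python, the following snippet loads one converted EXFOR entry and collects reaction strings from its subentries. The file name and the key names used here ("SUBENT", "BIB", "REACTION") are illustrative assumptions; the actual keys follow the EXFOR field names preserved by the converter.

```python
import json

# load one EXFOR entry that was previously converted to JSON (placeholder file name)
with open('exfor_entry.json') as f:
    entry = json.load(f)

# example of a structural change: collect the REACTION strings of all subentries
reactions = []
for key, subent in entry.items():
    if not key.startswith('SUBENT'):   # assumed naming of subentry keys
        continue
    reaction = subent.get('BIB', {}).get('REACTION')
    if reaction is not None:
        reactions.append((key, reaction))

print(reactions)
```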
The ENDF format is a commonly used format to store and distribute nuclear data in files. It is made up of MF sections which are subdivided into MT sections. The information of these sections, with the exception of the free text part in MF1/MT451, is organized in a matrix with six fields per row, each field being eleven characters long. Again with the exception of MF1/MT451, all fields contain numbers. Each section begins with a header which is typically followed by a sequence of numbers. As the sequences are spread over many lines due to a line break after every six elements, conventional line-based tools to detect and visualize differences are not well suited. Any insertion or deletion of a number causes a shift of all numbers in subsequent rows of the same section. ENDF-dtwdiff is a comparison tool designed to take into account the specific structure of ENDF files. It uses dynamic time warping to detect insertions and deletions and displays the differences in a meaningful way side by side.
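To illustrate the underlying idea (independently of the ENDF-dtwdiff implementation), the following self-contained Python sketch aligns two short numeric sequences with a basic dynamic time warping scheme; the sequences and the absolute-difference cost are arbitrary choices for the example.

```python
import numpy as np

def dtw_alignment(a, b):
    """Align two numeric sequences with dynamic time warping.

    Returns a list of (i, j) index pairs; an index paired with several
    indices of the other sequence indicates an insertion or deletion.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # deletion
                                 cost[i, j - 1],      # insertion
                                 cost[i - 1, j - 1])  # match
    # backtrack to recover the warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# the second sequence lacks the value 3.0 and contains an extra 7.5 at the end
seq1 = [1.0, 2.0, 3.0, 4.0, 5.0]
seq2 = [1.0, 2.0, 4.0, 5.0, 7.5]
print(dtw_alignment(seq1, seq2))
```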
Bayesian networks are graphical models to represent the probabilistic relationships between random variables in the Bayesian framework. Interpreting and phrasing a probabilistic model in terms of nodes and links between nodes permits the rapid development of sophisticated probabilistic models. As nuclear data evaluation is at its core about modeling the interactions between components of experiments and how experiments relate to nuclear model predictions and to a conceived truth, Bayesian networks are well suited to model these interactions and make them accessible to Bayesian inference. This paper on arXiv explains the mathematical details in the context of nuclear data evaluation and elaborates on three Bayesian network examples to perform evaluations. An accompanying R package has been developed that permits the creation of Bayesian networks and inference in them, and contains the scripts to reproduce the examples of the paper.
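For networks with normally distributed nodes and linear relationships, inference can be done in closed form. The following hedged Python sketch conditions a tiny network, consisting of a true cross section, a normalization error shared by the two points of one experiment, and three observations, on the observed values; it is not the R package's interface, and all numbers are invented.

```python
import numpy as np

# nodes with prior distributions: [truth, normalization error of experiment A]
prior_mean = np.array([1.0, 0.0])
prior_cov  = np.diag([0.2**2, 0.05**2])

# observation model: y = S @ [truth, norm_err] + statistical error
S = np.array([[1.0, 1.0],    # experiment A, point 1 (shares the normalization node)
              [1.0, 1.0],    # experiment A, point 2 (shares the normalization node)
              [1.0, 0.0]])   # experiment B
stat_cov = np.diag([0.03**2, 0.03**2, 0.02**2])

y = np.array([1.08, 1.06, 1.01])   # observed values

# conditioning the joint multivariate normal on the observed nodes
obs_cov = S @ prior_cov @ S.T + stat_cov
gain = prior_cov @ S.T @ np.linalg.inv(obs_cov)
post_mean = prior_mean + gain @ (y - S @ prior_mean)
post_cov  = prior_cov - gain @ S @ prior_cov

print("posterior truth and normalization error:", post_mean)
print("posterior standard deviations:", np.sqrt(np.diag(post_cov)))
```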
Automation, reproducibility, and transparency are important topics in the ongoing discussions about how to improve the process of nuclear data evaluation. Over the last few years, the Docker technology has been rapidly gaining momentum in the IT industry, and it can also help nuclear data evaluation to become more transparent, automated, and reproducible. The Docker application helps to manage and operate with so-called Docker images and Docker containers. Loosely speaking, a Docker container can be regarded as a light-weight virtual machine which can run applications isolated from the rest of the computer system. Docker images are templates to create Docker containers. The advantage of using Docker images is that an application can be bundled with all its dependencies. The installation of such a bundle becomes very easy thanks to the Docker application. The setup of complex evaluation pipelines that depend on many components, such as databases, libraries, interpreters for specific languages, nuclear physics codes, etc., becomes trivial. This facilitates the sharing of data and code between researchers; collaboration is made easier and consequently the whole research and evaluation process is accelerated. This section provides Dockerfiles, which are scripts to create Docker images relevant and helpful for nuclear data evaluation.
Installation instructions for Docker Community Edition on Windows, Linux, and Mac can be found in the official Docker documentation. Here are direct links for Windows, Mac, Ubuntu, Debian, Fedora, and CentOS. At the time of writing the Docker Community Edition is provided under the Apache License 2.0, which is a permissive license. The up-to-date license information can be found in the official Docker GitHub repository.
During evaluation work many choices have to be made concerning the selection of experimental data, their uncertainties, model parameters, and the statistical algorithms to adjust model parameters based on the information from experiments. It is very difficult to convey in a technical report or paper all the information that would be required to reproduce an evaluation, yet reproducibility is important in nuclear data evaluation. Therefore, all choices made in an evaluation should ideally be implemented as a sequence of scripts, also referred to as a pipeline. In this way, other people can comparatively easily reproduce the evaluation by rerunning the pipeline, scrutinize and test the impact of assumptions, or do an improved evaluation using an available evaluation pipeline as a starting point. A prototype of an evaluation pipeline is provided here, which contains several innovations in evaluation methodology, such as the automatic correction of experimental uncertainties using marginal likelihood optimization (MLO), Gaussian process priors on energy-dependent model parameters, and the optimization of model parameters using a customized Levenberg-Marquardt algorithm, which takes into account prior knowledge and the non-linearity of the physics model. The pipeline can be used in combination with a cluster to perform a full-scale evaluation. In its current form, it implements the evaluation of neutron-induced cross sections of Fe56 and has been successfully employed to adjust about 150 parameters of the nuclear model code TALYS after a sensitivity analysis of about a thousand model parameters.
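To give a flavor of the marginal likelihood optimization idea, the following Python sketch estimates a single scaling factor for stated experimental uncertainties by maximizing a Gaussian marginal likelihood. It is a strongly simplified, univariate illustration and not the pipeline's implementation; all data values and settings are made up.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# hypothetical measurements of the same quantity with (too) small stated uncertainties
y   = np.array([1.00, 1.04, 0.93, 1.10, 0.98])
unc = np.array([0.01, 0.01, 0.02, 0.01, 0.02])

prior_mean, prior_var = 1.0, 0.5**2   # broad prior on the true value

def neg_log_marginal_likelihood(log_scale):
    """Negative log marginal likelihood of the data when all stated
    uncertainties are scaled by exp(log_scale)."""
    scale = np.exp(log_scale)
    # marginal covariance of the data: prior variance of the truth (fully
    # correlated between points) plus scaled statistical variances
    cov = prior_var * np.ones((len(y), len(y))) + np.diag((scale * unc)**2)
    diff = y - prior_mean
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (logdet + diff @ np.linalg.solve(cov, diff))

res = minimize_scalar(neg_log_marginal_likelihood, bounds=(-3, 3), method='bounded')
print("estimated uncertainty scaling factor:", np.exp(res.x))
```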
The EXFOR library [16] is an essential resource for nuclear data evaluation. Convenient programmatic access from popular high-level languages is therefore one key element for automated and reproducible evaluation pipelines. The Nuclear Data Section of the IAEA and some other NRDCs provide computational formats, such as the C4 format, which are convenient for automated processing. Complementary to these formats, the complete EXFOR library has been converted to a MongoDB database. This database belongs to the class of document-oriented databases, a subclass of so-called NoSQL databases. The MongoDB database software features an expressive query language, and MongoDB databases can be accessed from a variety of high-level programming languages including C, C++, Python, R, Java, and Perl. The following resources give guidance on the installation and use of the MongoDB EXFOR database on your local computer or cluster:
Important disclaimer: This database is a prototype enabling access to the EXFOR library via the MongoDB API. It is not fully up-to-date with the EXFOR library maintained by the Nuclear Reaction Data Centres (NRDC) and may contain other shortcomings. Therefore, neither the creator of this Docker image nor the NRDCs take any responsibility for the completeness or correctness of the information in this database.
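As a rough indication of what programmatic access to such a database looks like from Python, the following sketch uses the pymongo driver. The database and collection names as well as the field names in the query are assumptions for illustration; they depend on how the EXFOR entries were mapped to documents.

```python
from pymongo import MongoClient

# connect to a locally running MongoDB instance holding the converted EXFOR library
client = MongoClient('localhost', 27017)
collection = client['exfor']['entries']   # assumed database and collection names

# example query: find subentries whose REACTION field mentions 26-FE-56
cursor = collection.find({'BIB.REACTION': {'$regex': '26-FE-56'}}).limit(5)
for doc in cursor:
    print(doc.get('ID'), doc.get('BIB', {}).get('TITLE'))
```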
The EXFOR library has also been made available as a CouchDB database. The reason is that the CouchDB database software comes with a more permissive license than the MongoDB database software, which was originally used to make the EXFOR library accessible as a NoSQL database. Technical details aside, CouchDB provides about the same functionality as MongoDB. It also enables the storage of the EXFOR subentries in the JSON format.
Important disclaimer: This database is a prototype enabling access to the EXFOR library via the CouchDB API. It is not fully up-to-date with the EXFOR library maintained by the Nuclear Reaction Data Centres (NRDC) and may contain other shortcomings. Therefore, neither the creator of this Docker image nor the NRDCs take any responsibility for the completeness or correctness of the information in this database.