Technical

Welcome to our Technical section!

 

Efpalinos Publishing promotes the use of open-source technologies and tools and applies digital humanities methods and techniques, both to help scholars answer research questions and so create and transform knowledge, and to facilitate the publication, dissemination and communication of scholarly works.

Tools & Systems

Website

 

WORDPRESS

WordPress is a web content management system, available as free and open-source software.

Find out more about WordPress here


WOOCOMMERCE

The eBookstore of Efpalinos Publishing is built with WooCommerce, the open-source ecommerce platform for WordPress.

Find out more about WooCommerce here


UNIVERSAL VIEWER

Open and public-domain digitised collection items from the Efpalinos Publishing corpus are displayed in the Image Viewer. The viewer, built with the open-source ‘Universal Viewer’ software and its community and using the IIIF standard for image interoperability, makes several new features available to readers.

 

New features include full-text search within collection items (where automatically transcribed text is available), navigation by thumbnail images, and the ability to copy a direct link to share or bookmark an item. Usage terms are included in an expanded ‘About this item’ panel, and you can download selected items as images or PDFs. Some items can also be embedded in web pages, such as blog posts, teaching resources or news articles. Better zoom and rotate functions improve the experience of large-format items, and the viewer will soon provide a better ‘responsive’ experience for mobile and tablet users.
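
As an illustration of how the IIIF standard exposes digitised items, the sketch below reads a IIIF Presentation (2.x style) manifest and lists its canvases. It assumes the Python requests library, and the manifest URL is a placeholder, not an actual Efpalinos Publishing endpoint.

```python
# Minimal sketch: read a IIIF Presentation (2.x) manifest and list its canvases.
# The manifest URL is a placeholder, not a real Efpalinos endpoint.
import requests

MANIFEST_URL = "https://example.org/iiif/item-123/manifest.json"  # hypothetical

resp = requests.get(MANIFEST_URL, timeout=10)
resp.raise_for_status()
manifest = resp.json()

print(manifest.get("label"))  # title of the digitised item
for canvas in manifest["sequences"][0]["canvases"]:
    # each canvas corresponds to one page or image of the item
    print(canvas.get("label"), canvas["images"][0]["resource"]["@id"])
```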

Find out more about Universal Viewer here


Repository

 

REASONABLE GRAPH

The Reasonable Graph platform offers a generic web solution for managing digital and physical collections directly as a semantic, linked open data set. It natively manages ontologies, whether specific to a scientific field or based on reference models (e.g. FRBR-FRAD-FRSAD/FRBR LRM, BIBFRAME, FRBRoo, etc.).

Find out more about Reasonable Graph here

 

Metadata, Vocabularies & Data Models

Metadata

Metadata is often described as “data about data”. It provides additional information about an object or resource, whether physical or digital, at different levels of granularity: from a single property of an item, to collection and dataset level, even to all the datasets of an organisation. Metadata plays a crucial role in making the meaning and structure of data understandable by both humans and machines, and it also helps to clarify legal issues and licence terms, data provenance, data quality, data formats, data access and location, among many other aspects. Well-described, rich and high-quality metadata records show their power in making digital resources discoverable, citable, reusable and accessible for the long term. The provision of metadata is a fundamental requirement for web publishing, as data will not be discoverable or reusable by others if sufficient metadata is not provided. In addition, multilingual information readable by both humans and machines is highly encouraged, especially in languages that the target audience will understand.
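
As a minimal illustration of a descriptive metadata record, the sketch below uses Dublin Core terms with the Python rdflib library (an assumption of this example, not a tool named above); the item URI and all field values are invented.

```python
# A small descriptive metadata record expressed with Dublin Core terms (rdflib).
# The item URI and the field values are invented for illustration only.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
item = URIRef("https://example.org/items/42")  # hypothetical identifier

g.add((item, DCTERMS.title, Literal("Letters from Smyrna", lang="en")))
g.add((item, DCTERMS.creator, Literal("Unknown")))
g.add((item, DCTERMS.date, Literal("1821")))
g.add((item, DCTERMS.rights, Literal("Public domain")))

print(g.serialize(format="turtle"))
```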

 

Metadata interoperability plays an essential role in effective content sharing and interchange. Beyond syntactic interoperability, the ability of machines to share and exchange information based on syntax alone, semantic interoperability means that machines also understand and process the semantics, the meaning of the data, ultimately achieving a meaningful and accurate interpretation of exchanged data across different computer systems and programs. This is crucial in the arts and humanities, because digital resources need to be processed, interchanged and represented in ways that reveal explicitly and precisely their complex layers of meaning, key aspects of structure, usefulness and distinct values, despite the interdisciplinary and multilingual nature of the material and the rich variation of primary sources, data types and formats, which are usually scattered across different organisations and information systems. Semantic interoperability, however, depends on knowing the data formats and their representations, as well as the vocabularies used for data items, so that machines can interpret, process and combine data automatically, efficiently and meaningfully with other data. Metadata therefore provides the additional information machines need in order to understand and process the meaning of the data and to link each data element to a controlled, shared vocabulary based on an ontology. By making machines capable of interpreting, processing and integrating the content of data with other data across systems, semantic annotation, also known as semantic enrichment, fundamentally enables machine-computable logic, inferencing, classification, search, filtering, linking and knowledge discovery, among many other operations.
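
A minimal sketch of such semantic annotation follows, again assuming rdflib: instead of a free-text subject, the record points to a shared concept URI in a published vocabulary (a Getty AAT identifier is shown purely for illustration).

```python
# Semantic enrichment sketch: link a local record to a shared vocabulary concept,
# so the subject resolves to a controlled, language-independent term.
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
item = URIRef("https://example.org/items/42")             # hypothetical local item
concept = URIRef("http://vocab.getty.edu/aat/300028569")  # a Getty AAT concept URI (ID shown for illustration)

# Point to the shared concept URI rather than storing a free-text subject string.
g.add((item, DCTERMS.subject, concept))

print(g.serialize(format="turtle"))
```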

 

Besides interoperability and the machine understanding of exchanged content, metadata significantly facilitates the organisation, indexing, discovery, access, analysis and reuse of online resources, and it is key to delivering reliable and fast results. Metadata is especially important for the discovery of non-textual digital resources, such as images, datasets and multimedia content, because these are not machine readable as text. In particular, discovery of digital material based on metadata prevails within the cultural heritage domain, because many resources are not available as full text (Freire et al., 2017). Metadata can be of different types, classified into different taxonomies with different grouping criteria. It may provide descriptive information, such as name, location, subject, date and time; technical information, for instance about the digitisation tool and data format; legal information regarding access, rights and reuse permissions; and information about data provenance or for preservation purposes, so that the resource can be accessed and used in the long term.

 

Vocabularies

Controlled vocabularies are used in (meta)data to improve information retrieval and play a crucial role in digital humanities for semantic interoperability, data integration and homogenisation, knowledge management and discovery. Controlled vocabularies and thesauri are standardised and organised lists of predefined terms, including words, phrases and hierarchies, which provide a consistent way to describe data and organise knowledge, and may therefore be used to search an index, abstract or information database.

 

Common types of controlled vocabularies include subject heading lists, authority files, thesauri and other types of knowledge organisation systems. Controlled vocabularies can be alphabetical lists of terms or taxonomies, classifications of things and concepts that usually have a hierarchical structure of broader and narrower terms. With folksonomy, also known as social or collaborative tagging, social indexing or social classification, online tags are assigned by the users of a system rather than by the owner of the content, as with taxonomic classification. Thesauri are lists of words grouped by similarity of meaning, including synonyms, related terms, scope and editorial notes, term history, alternate languages or numerical codes. A gazetteer is a list of places; a digital gazetteer, in particular, includes names, coordinates and often feature types, and most spatial search and visualisation on the Web is based on digital gazetteers. Many standardised, community-agreed controlled vocabularies are published on the web and provide links to abstract concepts and other terms with unique identifiers.
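
A minimal sketch of such a broader/narrower hierarchy expressed in SKOS, again assuming rdflib; the concept URIs and labels are invented.

```python
# A tiny controlled vocabulary with broader/narrower relations, expressed in SKOS.
# Concept URIs and labels are invented for illustration.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import SKOS

g = Graph()
places = URIRef("https://example.org/vocab/places")
city = URIRef("https://example.org/vocab/places/city")

g.add((places, SKOS.prefLabel, Literal("Places", lang="en")))
g.add((city, SKOS.prefLabel, Literal("City", lang="en")))
g.add((city, SKOS.altLabel, Literal("Town", lang="en")))  # synonym, as in a thesaurus
g.add((city, SKOS.broader, places))                       # hierarchical relation
g.add((places, SKOS.narrower, city))

print(g.serialize(format="turtle"))
```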

 

Controlled vocabularies are widely used in the library and information science domain for organising knowledge and subsequent retrieval, and they play an increasingly important role in digital humanities by significantly enhancing semantic interoperability. They are even more important for the history domain, as scholars very often deal with ambiguity, and generally with terms and concepts whose meaning is not clear, so it is very common for the same person, event, place or other phenomenon to be described by different words, even within the same language. This creates great difficulties for machines if, for instance, one wants to find a person whose name is written in different or obsolete languages and in various ways, or is only implicitly mentioned. Humanities data is full of complexities, inconsistencies, irregularities, errors and messy information, missing and unknown values and misspellings, but also abbreviations, synonyms, polysemy, homophony and various other elements whose meaning, as used inherently in human languages, machines cannot understand and interpret.
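
A minimal sketch of what authority control means in practice: variant and multilingual name forms are mapped to one canonical identifier so that a machine can treat them as the same person. The names and the identifier below are invented.

```python
# Authority-control sketch: map variant (and multilingual) name forms to a single
# canonical identifier. The names and the identifier are invented for illustration.
AUTHORITY = {
    "Maria Papadopoulou": "https://example.org/person/1",
    "M. Papadopoulou": "https://example.org/person/1",
    "Μαρία Παπαδοπούλου": "https://example.org/person/1",
}

def resolve(name):
    """Return the canonical identifier for a name variant, if known."""
    return AUTHORITY.get(name)

print(resolve("M. Papadopoulou"))  # -> https://example.org/person/1
```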

 

Data Models & Ontologies

Models and data modeling are umbrella terms; however, in the text encoding domain and, more generally, in digital humanities they play a central methodological and foundational role (Ciotti & Tomasi, 2016). Database design and text encoding can both be categorised as data modeling, an activity that focuses on the intellectual aspects of data and concerns the conceptual representation of complex entities and relationships, thereby enabling interpretation. Although data modeling deals with describing structures and representing the data observed in analysis, it usually also provides the hermeneutic perspective in analysing entities and relationships. How data is modeled depends on the research questions and objects; on the conceptual and logical models, according to which the identified entities and their relationships are expressed and strictly defined with constraints upon which the data operate; on how the structures of data are described; and on the tool and its capabilities for implementing the conceptualised data, so that it can be searched and interrelated.

 

Whatever the structural properties and capabilities of the tools, if data needs to be shared across different systems, personal and informal data models cannot be interpreted by different machines, even when the same object is represented. As systems of consistency constraints that maintain the meaning of data while allowing data heterogeneity and data sharing across different computer systems, formal data models and ontologies are key to semantic interoperability. Applying formal data models in order to document data and map concepts to a common semantic reference plays a crucial role in the arts, humanities and cultural heritage, because data in these domains poses many challenges to machines. Humanities data is scattered across different systems and formats, is multilingual and interdisciplinary in nature, and comes with vague and ambiguous meanings, inconsistencies, irregularities, contradictions and complex relationships and structures that machines cannot represent and interrelate as they are perceived by the human mind and expressed in natural language. In addition, as Flanders & Jannidis (2018) indicate, data in the humanities is strongly layered, that is to say, it carries a history that is an integral part of its identity, and therefore ‘models in humanities need to represent not only the history of the artifact itself but also the histories of the ways in which [it] has been described and contextualised’. By using a common semantic framework for clarifying abstract concepts and strictly determining relationships and constraints, machines become capable of interpreting and representing unambiguously, precisely and accurately exchanged data with its complex layers of meaning, key aspects of structure, distinct values and histories, among other unique intellectual features of the research objects.

 

The term ontology can refer to controlled vocabularies and thesauri, but also to formal ontologies, which are strictly defined formalisations of a conceptualisation that specify classes and properties and so give enhanced clarity of expression to the vocabularies and metadata used to describe a domain. Ontologies are semantic frameworks used to map the concepts of a subject domain by determining the possible relationships between entities within a dataset, and they are essential for reasoning and inference. Ontologies could be considered a subclass of data modeling but, as Flanders & Jannidis (2015) explain, the key difference is that ontologies are restricted to the conceptual level, with entities organised in taxonomic hierarchies, and are explicitly intended to enable knowledge sharing and reuse rather than to serve as an internal part of a system.
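
A minimal sketch of such a formal ontology fragment, defining two classes and one property with RDFS/OWL terms via rdflib; all names are invented.

```python
# A tiny ontology fragment: two classes, one property, and its domain and range,
# expressed with RDFS/OWL terms. All names are invented for illustration.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import OWL, RDF, RDFS

g = Graph()
EX = "https://example.org/onto/"
Manuscript = URIRef(EX + "Manuscript")
Person = URIRef(EX + "Person")
hasAuthor = URIRef(EX + "hasAuthor")

g.add((Manuscript, RDF.type, OWL.Class))
g.add((Person, RDF.type, OWL.Class))
g.add((hasAuthor, RDF.type, OWL.ObjectProperty))
g.add((hasAuthor, RDFS.domain, Manuscript))  # the property applies to manuscripts...
g.add((hasAuthor, RDFS.range, Person))       # ...and points to persons
g.add((Manuscript, RDFS.label, Literal("Manuscript", lang="en")))

print(g.serialize(format="turtle"))
```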

 

Since ontologies are conceptual models, they do not require any specific technical implementation or programming skills, but they do need to be projected onto machines; they are usually applied with RDF, the language of the Semantic Web, following the Linked Data approach. Semantic Web and Linked Data technologies are based on the use of shared, aligned ontologies, metadata and vocabularies, significantly enhancing the capability of machines to exchange and represent interrelated data meaningfully and efficiently. Items that are described and enriched with mutually linked concepts and other terms as a stable reference help resolve the disambiguation issues caused by natural language, in documentation and information retrieval, and they also crucially enhance semantic interoperability, knowledge discovery and sharing.

 

Text & Data Mining

Text and data mining technologies play a decisive role in analysing vast amounts of information, or ‘big data’, automatically, and therefore in recognising and interpreting patterns and, consequently, transforming knowledge in a way that would be impossible with traditional scholarship.
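
As a deliberately tiny sketch of the kind of pattern recognition involved, the snippet below counts word frequencies over an invented two-sentence corpus; real text and data mining pipelines add tokenisation, normalisation and statistical or machine-learning models on top of this idea.

```python
# A deliberately tiny text-mining sketch: tokenise an invented corpus and count
# word frequencies, the simplest kind of pattern a machine can surface.
import re
from collections import Counter

corpus = [
    "The manuscript was copied in Smyrna in 1821.",
    "A second manuscript, also from Smyrna, is undated.",
]

tokens = []
for document in corpus:
    tokens.extend(re.findall(r"[a-z]+", document.lower()))

for word, count in Counter(tokens).most_common(5):
    print(word, count)
```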

 

XML, Semantic Web & Linked Data

XML & Web Technologies

The Extensible Markup Language (XML) (https://www.w3.org/TR/xml/) is one of the most widely used encoding standards; it defines a set of rules for describing, representing and sharing structured information, such as texts, documents, data, configuration, books and transactions, among many others. Markup or encoding languages are used to annotate documents in order to syntactically distinguish the annotations from the texts and make the interpretations of texts explicit. XML is an internationally adopted data representation format, simple and text-based, readable by both humans and machines. XML standards are part of the foundation of the Web and are used in a variety of applications because they are highly interoperable. Maintained by W3C, XML was derived from an older standard, the SGML (ISO 8879) format, but it was designed for ease of implementation and for interoperability with both SGML and HTML.
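
A minimal sketch of what such markup looks like and how a machine navigates it, using the Python standard library; the element names loosely echo TEI-style encoding but are only illustrative.

```python
# Minimal XML sketch: a small encoded document and how a machine navigates it.
# Element names loosely echo TEI-style markup but are illustrative only.
import xml.etree.ElementTree as ET

encoded = """
<letter>
  <sender>Maria Papadopoulou</sender>
  <date when="1821-03-25">25 March 1821</date>
  <body>The ships arrived at <placeName>Smyrna</placeName> this morning.</body>
</letter>
"""

root = ET.fromstring(encoded)
print(root.find("sender").text)        # -> Maria Papadopoulou
print(root.find("date").get("when"))   # -> 1821-03-25
print(root.find(".//placeName").text)  # -> Smyrna
```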

 

Compared to the HyperText Markup Language (HTML) (http://www.w3.org/TR/html4/), the publishing markup language of the Web, XML has significant advantages: it is extensible rather than limited to predefined tags, and it focuses on the meaning and structure of data rather than on how data will be displayed. Very importantly, XML documents follow a defined syntax and so must be well-formed, which prevents errors and ensures reliability. In addition, validating the formal structure of XML documents against a schema is very important for quality assurance. XML has many advantages over other formats and is therefore very widely used, and it forms the basis for other standards, such as TEI for text representation. Key features that make it a very powerful and highly interoperable technology include its focus on descriptive markup for making text interpretations explicit, the distinction between syntactic correctness and validity with respect to a document type definition, and system independence. However, as described in Ciotti and Tomasi (2016-2017), XML is poor for semantic data modeling, as it defines the syntactic aspects of a markup language and does not provide a computational semantics for markup or data.
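
The difference between well-formedness and validity can be seen in a few lines: the parser below rejects markup whose tags do not nest correctly, while validation against a schema such as TEI would additionally check that the right elements appear in the right places (schema validation needs an external validator and is not shown here).

```python
# Well-formedness check: the XML parser itself rejects markup whose tags do not nest.
# Validity against a schema (e.g. TEI) is a separate, stricter check.
import xml.etree.ElementTree as ET

broken = "<letter><sender>Maria</letter></sender>"  # tags closed in the wrong order

try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print("Not well-formed:", err)
```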

 

Linked Open Data & Semantic Web

The term ‘Semantic Web’ refers to the W3C vision of a Web of linked data. The Semantic Web is a Web of Data: of dates and titles and part numbers and any other type of data, all connected. The W3C Semantic Web standards and best practices (https://www.w3.org/standards/semanticweb/) help build a technology stack that supports a ‘Web of data’, the sort of data found in databases, with the ultimate goal of enabling computers to do more useful work by developing interoperable Web systems that are built in a well-defined manner and support trusted interactions over the network. Semantic Web and Linked Data are revolutionary approaches to publishing and connecting distributed content across the Web and are empowered by technologies such as RDF, SPARQL, OWL and SKOS. The Semantic Geospatial Web refers to the exploitation of the Semantic Web as a platform for geospatial knowledge integration and retrieval, which requires the use of geospatial ontologies, semantic annotation and interlinking with gazetteers.

 

Large amounts of digitised content and data have been created across different research centres and institutions, libraries, archives, museums and other cultural heritage organisations. The benefits are undoubtedly great, but this has also produced a chaotic landscape, often resulting in isolated digital ‘islands’ with hidden treasures, and in difficulties for scholars in finding, accessing and reusing content precious for their research. Scattered content also complicates a holistic view of a subject and so prevents scholars from reaching insightful interpretations and recognising patterns that would lead to the creation and transformation of knowledge. Metadata aggregators and other information portals and repositories that collect, unify, enrich and publish, through a single point of access, digital resources distributed across different organisations and systems offer great solutions for knowledge discovery, access and reuse, as well as for overcoming difficulties in data integration and interoperability.

However, the nature and unique features of humanities data pose many challenges for machines in processing, representing and sharing harmonised data automatically, efficiently and meaningfully, and the World Wide Web has proved insufficient for machines to understand and disseminate knowledge without losing the meaning and intellectual value of humanities data. Data in the arts and humanities is multilingual and interdisciplinary in nature, with vague and ambiguous meanings, inconsistencies, and complex relationships and structures that machines cannot represent and interrelate as perceived by the human mind and expressed in natural language. The major syntactic and semantic interoperability difficulties that prevent linking and presenting harmonised data arise because data is scattered across different organisations and systems and is therefore provided in a variety of formats, representing various types of content in different forms: each community has its own established standards, follows different best practices, and uses different metadata schemas, vocabularies and data models in order to describe precisely and meaningfully the structure and intellectual values of the content, which is in fact interpreted differently depending on the context and culture it relates to.

 

Semantic Web and Linked Open Data technologies offer revolutionary changes in publishing and connecting semantically heterogeneous, distributed, machine-readable content across the Web, thereby facilitating intelligent and meaningful cross-domain interoperability, knowledge representation and discovery. The Semantic Web has thus become the standard way to achieve a universal and interoperable data representation, and it can be seen as a new layer of metadata being built inside the Web, with metadata used in the broader sense of processable, interpretable data that is readable by machines and not only by humans, as with the WWW. As Hyvonen (2012) indicates, the key idea of the Semantic Web is that machines can interpret and process the meanings of web content through syntactic metadata structures based on shared semantic specifications founded on formal logic, enabling the creation of additional interoperable and intelligent Web services. According to Hyvonen (2012), the Semantic Web is based on a layer-cake model that includes several necessary levels of description: the domain of discourse, that is, the real world; the data and metadata levels, with the RDF data model that is the basis of the Semantic Web and Linked Data and is used for representing metadata and other forms of content on the Web of Data; the ontology level, with RDF Schema and the Web Ontology Language OWL, which are used for representing ontologies that describe vocabularies and concepts concerning the real world; and finally the metaontology level, the logic rules that can be used for deriving new facts and knowledge from the metadata and ontologies, concerning general, domain-independent principles for modeling ontologies across domains.

 

The Resource Description Framework (RDF) (https://www.w3.org/RDF/) is a standard model for data interchange on the Web. RDF Schema (RDFS) provides a data-modelling vocabulary for RDF data and is an extension of the basic RDF vocabulary. RDF is the markup language of the Semantic Web and provides the foundation for publishing and linking data. RDF has features that facilitate data merging even when the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all data consumers to be changed. RDF extends the linking structure of the Web by using Uniform Resource Identifiers (URIs) to describe resources, naming the relationship between two things as well as the two ends of the link; such a statement is known as a ‘triple’. By using this simple model based on triples, RDF allows structured and semi-structured data to be mixed, exposed and shared across different applications. This linking structure forms a directed, labeled graph, where the edges represent the named links between two resources, represented by the graph nodes. This graph view is the simplest mental model for RDF and is often used in easy-to-understand visual explanations.
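
A minimal sketch of the triple model described above, using rdflib: resources named by URIs, a named relationship between them, and a literal value; all URIs are invented.

```python
# The RDF triple model in miniature: subject, predicate and object form one
# labelled edge of the graph. All URIs below are invented for illustration.
from rdflib import Graph, Literal, URIRef

g = Graph()
letter = URIRef("https://example.org/items/letter-7")
author = URIRef("https://example.org/person/1")
wrote = URIRef("https://example.org/onto/wrote")

g.add((author, wrote, letter))  # one triple: a named link between two resources
g.add((letter, URIRef("https://example.org/onto/year"), Literal(1821)))

print(g.serialize(format="turtle"))
```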

 

SPARQL (https://www.w3.org/TR/sparql11-overview/) is the query language of the Semantic Web, allowing information encoded in RDF to be discovered and manipulated. SPARQL can be used to express queries across diverse data sources; it provides capabilities for querying required and optional graph patterns, along with their conjunctions and disjunctions, and it also supports extensible value testing and constraining queries by source RDF graph. The results of SPARQL queries can be result sets or RDF graphs, and query results come in JSON, XML, CSV and TSV. There is also GeoSPARQL (http://www.geosparql.org/) for querying linked geospatial data. The Web Ontology Language (OWL) (https://www.w3.org/OWL/) is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals and data values, and are stored as Semantic Web documents; they can be used along with information written in RDF and are primarily exchanged as RDF documents. The Simple Knowledge Organization System (SKOS) (https://www.w3.org/TR/skos-primer/) is a common data model for sharing and linking knowledge organization systems via the Semantic Web. As an application of RDF, SKOS allows concepts to be composed and published on the World Wide Web, linked with data on the Web, and integrated into other concept schemes.
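
A minimal SPARQL sketch, run over a small in-memory graph with rdflib; the data and the query are invented and only illustrate the graph-pattern-matching idea.

```python
# SPARQL sketch: query an in-memory RDF graph for every item and its year.
# The data and the query are invented for illustration.
from rdflib import Graph, Literal, URIRef

g = Graph()
EX = "https://example.org/onto/"
g.add((URIRef("https://example.org/items/letter-7"), URIRef(EX + "year"), Literal(1821)))
g.add((URIRef("https://example.org/items/letter-9"), URIRef(EX + "year"), Literal(1824)))

query = """
    PREFIX ex: <https://example.org/onto/>
    SELECT ?item ?year
    WHERE { ?item ex:year ?year }
    ORDER BY ?year
"""

for item, year in g.query(query):
    print(item, year)
```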


Source Code & Documentation