The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

  1. Data governance
  2. Data privacy and engineering for privacy, including engineering for the Personal Data Protection (PDP) bill.
  3. Data cleaning, annotation, instrumentation and productionizing data science.
  4. Identifying and handling fraud + data security at scale
  5. Feature engineering and ML platforms.
  6. What it takes to create data-driven cultures in organizations of different scales.

Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

  1. Network with peers and practitioners from the data ecosystem.
  2. Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
  3. Demo your ideas in the demo sessions.
  4. Join Birds of Feather (BOF) sessions to have productive discussions on focussed topics. Or, start your own Birds of Feather (BOF) session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com


Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Ashok Hariharan

@bungeni_ashok

The Akoma Ntoso document standard for legislative and judicial documents

Submitted Mar 4, 2020

This presentation covers the use of open standards, specifically the Akoma Ntoso document format, to represent parliamentary and judicial documents. What is the purpose of this standard, and why are legislatures and legal bodies adopting it?

Outline

Presentation Summary - The Akoma Ntoso Open Document Standard for legal and legislative documents

Objective of the talk

  • how open standards give citizens access to parliamentary and judicial records and documents

My background

  • one slide about who I am and my background

The technology challenge with legislation

  • Legislation once passed typically has no expiry date
    • The oldest active law is the Statute of Marlborough (1267)
    • Sharia, 710 to present
    • Magna Carta, 1215 to 1863
    • French Civil Code, 1804 to present
  • Tools
    • evolve rapidly as technologies change
    • are the first to become obsolete
  • Architectures
    • cover processes, actors and roles (who is doing what) defined by policy, and the tools that support them
    • are generally stable, but affected by changes in scope, users, requirements and technology
  • Data formats
    • Code of Hammurabi, c. 1754 BC, on stone
    • Magna Carta, 1215, on parchment
    • Indian Constitution, 1949, handwritten on parchment
    • various digital formats

A law, once passed, will easily outlive any software system built to manage and publish it.

Data Formats

  • describe the kind of information that is exchanged by tools; typically, the richer the data, the wider the reach
  • Popular formats:
    • PDF: not semantically aware, primarily for printing
    • HTML: primarily for web presentation, limited structure and semantics for other domains
    • RDFa: provides semantics within other documents, but no specific connection between markup and semantics
    • Linked Data: good for data, not documents
    • JSON: for data, not documents
    • XML: for semantic documents <== Akoma Ntoso XML standard
      (brief slide on why XML is a suitable format for legislation as opposed to others)
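
What "semantic documents" buys in practice can be sketched in a few lines of Python. This is a minimal illustration, not material from the talk: the XML fragment is a simplified, Akoma Ntoso-style sample written for this example (it is not validated against the schema and the act text is invented), and `find_section` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace used by Akoma Ntoso 3.0 (OASIS LegalDocML). The fragment below is
# a simplified illustration, not a schema-valid document.
AKN = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"
NS = {"akn": AKN}

SAMPLE = f"""
<akomaNtoso xmlns="{AKN}">
  <act>
    <body>
      <section eId="sec_1">
        <num>1.</num>
        <heading>Short title</heading>
        <content><p>This Act may be cited as the Example Act.</p></content>
      </section>
      <section eId="sec_2">
        <num>2.</num>
        <heading>Commencement</heading>
        <content><p>This Act comes into force on publication.</p></content>
      </section>
    </body>
  </act>
</akomaNtoso>
"""


def find_section(root, eid):
    """Return (num, heading, text) for the section with the given eId, or None."""
    for section in root.iterfind(".//akn:section", NS):
        if section.get("eId") == eid:
            num = section.findtext("akn:num", namespaces=NS)
            heading = section.findtext("akn:heading", namespaces=NS)
            text = section.findtext("akn:content/akn:p", namespaces=NS)
            return num, heading, text
    return None


root = ET.fromstring(SAMPLE.strip())
print(find_section(root, "sec_2"))
```

Because the markup names the parts of the law (section, number, heading, content), a program can address "section 2" directly instead of guessing from page layout, which is the kind of access a print-oriented PDF does not offer.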

Data Formats Entropy

  • Entropy is the information lost when moving between formats
  • e.g. physical paper is less expressive than a scanned, OCRed PDF of the paper
  • HTML is more expressive than PDF
  • semantic XML is more expressive than HTML
  • it is easier to go from more expressive => less expressive
  • it is harder to go from less expressive => more expressive

(diagram of entropy triangle)
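
The asymmetry can be shown with a small Python sketch. It is an illustration under stated assumptions: the XML fragment is a simplified, Akoma Ntoso-style sample (not schema-validated), the `href` value and act name are invented, and `to_plain_text` and `markup_facts` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

AKN = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"

# Simplified illustrative fragment; the reference target is made up.
SEMANTIC = f"""<section xmlns="{AKN}" eId="sec_3">
  <num>3.</num>
  <heading>Repeal</heading>
  <content><p>The <ref href="/akn/xx/act/1990/5">Example Act 1990</ref> is repealed.</p></content>
</section>"""


def to_plain_text(xml_source):
    """Down-convert: keep only the words and drop all markup."""
    root = ET.fromstring(xml_source)
    return " ".join(" ".join(root.itertext()).split())


def markup_facts(xml_source):
    """List what the markup states explicitly: element names and attributes."""
    root = ET.fromstring(xml_source)
    facts = []
    for element in root.iter():
        name = element.tag.split("}", 1)[-1]  # strip the "{namespace}" part
        facts.append((name, dict(element.attrib)))
    return facts


plain = to_plain_text(SEMANTIC)
print(plain)
print(markup_facts(SEMANTIC))
```

Going down is a one-line transformation. Going back up from `plain` would mean re-deriving that "3." is a section number, that "Repeal" is a heading, and which act the phrase "Example Act 1990" points to, none of which is stated in the text itself.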

Akoma Ntoso as a low entropy format

  • XML vocabulary and naming convention
  • conceptual approach to management of legislative information:
  • Who is the author
  • separation of content, semantics and presentation
  • distinction between documents and files
  • long term preservation characteristics
  • Establishes use independent of tools

Design issue: short and long term concerns

  • Short term
    • convert to different formats for web, print, mobile use etc.
    • support document workflow
    • point-in-time consolidation
    • semantic classification etc.
  • Long term
    • docs need to be readable 5, 50 or more years from now
    • self-contained and self-descriptive
    • not dependent on software architecture

Design Issues: Authorship

  • Data: the actual text, exactly as written by the author
  • Metadata: objective (dates, author, original or amendment) and subjective (keywords, categorizing parts of content)
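
One way to picture the split is a document where the author's text and the editorial metadata live in separate blocks, so that annotating never touches the words of the law. The Python sketch below assumes a heavily simplified, Akoma Ntoso-style layout (real metadata blocks are much richer, and this sample is not schema-validated); `author_text`, `keywords` and `add_keyword` are hypothetical helpers, and the `#editor` and `#example` references are placeholders.

```python
import xml.etree.ElementTree as ET

AKN = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"
NS = {"akn": AKN}

# Simplified illustration: metadata under <meta>, author's text under <body>.
DOC = f"""<akomaNtoso xmlns="{AKN}">
  <act>
    <meta>
      <classification source="#editor">
        <keyword value="repeal" showAs="Repeal" dictionary="#example"/>
      </classification>
    </meta>
    <body>
      <section eId="sec_1">
        <content><p>The Example Act 1990 is repealed.</p></content>
      </section>
    </body>
  </act>
</akomaNtoso>"""


def author_text(root):
    """Data: the words of the document, taken only from the body."""
    body = root.find(".//akn:body", NS)
    return " ".join(" ".join(body.itertext()).split())


def keywords(root):
    """Subjective metadata: keywords assigned by an editor."""
    return [k.get("value") for k in root.iterfind(".//akn:keyword", NS)]


def add_keyword(root, value, show_as):
    """Add an editorial keyword without touching the body."""
    classification = root.find(".//akn:meta/akn:classification", NS)
    ET.SubElement(
        classification,
        f"{{{AKN}}}keyword",
        {"value": value, "showAs": show_as, "dictionary": "#example"},
    )


root = ET.fromstring(DOC)
before = author_text(root)
add_keyword(root, "amendment", "Amendment")
print(keywords(root))
print(author_text(root) == before)
```

Keeping the two apart means keywords and classifications can be added or corrected later, by someone other than the author, while the authored text stays byte-for-byte what was enacted.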

Design Issues: Different aspects of a Document

  • Content
  • Structure
  • Semantics
  • Presentation

(slides about each of these aspects with real examples of legislation / judgments)

Technical Architectures
Discuss different technical architectures - what works, what doesn't

Adoption

  • information about adoption
  • who is adopting it, and why are they adopting it

Requirements

Some basic technical knowledge. Basic legal knowledge, in the sense of knowing the different kinds of legal documents, is useful but not mandatory.

Speaker bio

I have worked in the legal and legislative informatics space for 15 years, and with the Akoma Ntoso document standard since its creation. I have implemented several projects that make use of it.
