Free Open Data from

On this website we offer the data used on for download. It contains the following data sets:


We offer our own work under an Open Data Commons Attribution License (ODC-By), which means you are free to use, share, and adapt this data as long as you attribute the source by linking to this website. Our data sources ( harvesting, download) impose further restrictions, including that you link back to for downloads.


The format of the data presented below is likely to change as we revise our data structures. All available data has been created using sources from the We do not guarantee the validity or completeness of the data.

arXiv citation graph

We downloaded the source files of all arXiv articles published until 2012-09-31, extracted the references, and matched them against the metadata using these Python scripts. The result is a 2.0 GB *.txt file with more than 16 million lines representing the citation graph in the following format:
{source-id}|{reference string as found in tex sources}|{target-id if found}


1008.4729|M. Johnson, K. Zumbrun, and P. Noble, Nonlinear stability of viscous roll waves, preprint (2010).|1002.0788
1002.2065|K. Binder. J. Non-crystalline Solids , 307:1--8, 2002.|
astro-ph/0006446|D. Boyanovsky and H. J. de Vega, Phys. Rev. D61 , 105014 (2000).|
0711.3015|Coldea, R., Tennant, D. A. Tylczynski, Z. Extended scattering continua [...]. Phys. Rev. B / 68 , 134424 (2003).|cond-mat/0307025
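A minimal Python sketch for reading this format, assuming UTF-8 text with one record per line; parse_line and citation_counts are illustrative names, not part of the published scripts. It splits on the first and the last "|", so a pipe inside a reference string (if one ever occurs) stays in the middle field, and it treats an empty target field as an unmatched reference.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Citation(NamedTuple):
    source: str            # arXiv id of the citing paper
    reference: str         # raw reference string as found in the TeX sources
    target: Optional[str]  # arXiv id of the cited paper, or None if unmatched


def parse_line(line: str) -> Citation:
    # Split on the first and the last "|" so that a stray "|" inside the
    # reference string (an assumption; the format does not rule it out)
    # cannot shift the source or target fields.
    source, rest = line.rstrip("\r\n").split("|", 1)
    reference, target = rest.rsplit("|", 1)
    return Citation(source, reference, target or None)


def citation_counts(lines: Iterable[str]) -> Counter:
    """Count how often each arXiv id appears as a matched citation target."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cit = parse_line(line)
        if cit.target is not None:
            counts[cit.target] += 1
    return counts


sample = [
    "1008.4729|M. Johnson, K. Zumbrun, and P. Noble, Nonlinear stability of viscous roll waves, preprint (2010).|1002.0788",
    "1002.2065|K. Binder. J. Non-crystalline Solids , 307:1--8, 2002.|",
]
print(citation_counts(sample))  # Counter({'1002.0788': 1})
```

For the full file, pass an open file object, e.g. with open(path, encoding="utf-8", errors="replace") as f: counts = citation_counts(f). Because the function iterates line by line, the 2.0 GB file is streamed rather than loaded into memory at once.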


arXiv metadata

arXiv offers the metadata for all articles for download through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). We downloaded all available data until 2012-09-31 and stored it in JSON format using these scripts.


[  "0704.0204",
   {"publisher": [],
    "description": ["  We present a theory of transport through interacting [.....]
                       A $\\pi$-transition of the supercurrent can\nbe driven by
                       tuning gate or bias voltages.\n",
                    "Comment: 11 pages, 4 figures"],
    "language": [],
    "rights": [],
    "format": [],
    "contributor": [],
    "source": [],
    "creator": ["Pala, Marco G.", "Governale, Michele", "K\u00f6nig, J\u00fcrgen"],
    "relation": [],
    "coverage": [],
    "date": ["2007-04-02", "2007-08-29"],
    "title": ["Non-Equilibrium Josephson and Andreev Current through Interacting\n  Quantum Dots"],
    "identifier": ["", "New J. Phys. 9 (2007) 278", "doi:10.1088/1367-2630/9/8/278"],
    "type": ["text"],
    "subject": ["Condensed Matter - Superconductivity", "Condensed Matter - Mesoscale and Nanoscale Physics"]}]

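The keys of the sample record are exactly the fifteen Dublin Core elements, which suggests the records were harvested in arXiv's oai_dc metadata format. Assuming that, the following standard-library Python sketch shows how one OAI-PMH record could be turned into the [id, {element: [values]}] shape above. record_from_oai, doi_of, and the trimmed SAMPLE record are hypothetical illustrations, not the scripts used to build this data set.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"
# The fifteen Dublin Core elements; every key in the sample record is one of them.
DC_ELEMENTS = ["title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"]


def record_from_oai(record_xml: str) -> list:
    """Turn one OAI-PMH <record> (oai_dc) into [arxiv_id, {element: [values]}]."""
    root = ET.fromstring(record_xml)
    header_id = root.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier", default="")
    arxiv_id = header_id.rsplit(":", 1)[-1]  # "oai:arXiv.org:0704.0204" -> "0704.0204"
    # Every element becomes a list, empty when the record does not carry it.
    fields = {name: [el.text or "" for el in root.iter(f"{{{DC_NS}}}{name}")]
              for name in DC_ELEMENTS}
    return [arxiv_id, fields]


def doi_of(record: list) -> Optional[str]:
    """Return the first identifier of the form 'doi:...', without the prefix."""
    for ident in record[1].get("identifier", []):
        if ident.startswith("doi:"):
            return ident[len("doi:"):]
    return None


# Hypothetical, heavily trimmed record modelled on the sample above.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:arXiv.org:0704.0204</identifier></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>Pala, Marco G.</dc:creator>
      <dc:creator>Governale, Michele</dc:creator>
      <dc:date>2007-04-02</dc:date>
      <dc:identifier>doi:10.1088/1367-2630/9/8/278</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""

rec = record_from_oai(SAMPLE)
print(json.dumps(rec))
print(doi_of(rec))
```

json.dumps with its default ensure_ascii=True writes non-ASCII characters as \uXXXX escapes, which is how a name such as König appears as K\u00f6nig in the sample.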

arXiv citation and author graph as Neo4j database

The above information is loaded into a Neo4j (v1.7.2) graph database using these Python scripts. The basic structure of the graph database is as follows:
  • We have nodes for every paper storing basic metadata.
  • We have reference relations between papers.
  • We have nodes for every mentioned author name and an author relation for each of their papers.
Here is my attempt at visualizing the structure:
    Paper1              Paper2
      |                   |
      |[author]           |[author]
      v                   v
    Author1             Author2
For more information, please refer to the documentation.
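To illustrate the kinds of questions this layout answers, here is a plain-Python stand-in (standard library only, no Neo4j involved) that models the same ingredients: paper nodes with basic metadata, author nodes, author relations from papers to authors, and reference relations between papers. The class PaperGraph, its methods, and the Paper1/Author1 placeholders are hypothetical; in the actual database such queries would go through Neo4j's own query or traversal facilities.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class PaperGraph:
    """Hypothetical in-memory model of the layout described above (not the Neo4j schema)."""

    def __init__(self) -> None:
        self.papers: Dict[str, dict] = {}                         # paper id -> basic metadata
        self.authors_of: Dict[str, Set[str]] = defaultdict(set)  # paper -[author]-> names
        self.papers_of: Dict[str, Set[str]] = defaultdict(set)   # reverse index: name -> papers
        self.references: Dict[str, Set[str]] = defaultdict(set)  # citing paper -> cited papers

    def add_paper(self, paper_id: str, title: str, creators: Iterable[str]) -> None:
        self.papers[paper_id] = {"title": title}
        for name in creators:
            self.authors_of[paper_id].add(name)
            self.papers_of[name].add(paper_id)

    def add_reference(self, source: str, target: Optional[str]) -> None:
        if target:  # unmatched references have no target paper node
            self.references[source].add(target)

    def coauthors(self, name: str) -> Set[str]:
        """Everyone sharing at least one author relation target paper with `name`."""
        result: Set[str] = set()
        for paper in self.papers_of.get(name, set()):
            result |= self.authors_of[paper]
        result.discard(name)
        return result

    def citing_authors(self, name: str) -> Set[str]:
        """Authors of papers that reference at least one paper by `name`."""
        cited = self.papers_of.get(name, set())
        result: Set[str] = set()
        for source, targets in self.references.items():
            if targets & cited:
                result |= self.authors_of.get(source, set())
        return result


# Placeholder data mirroring the diagram: Paper1 references Paper2.
g = PaperGraph()
g.add_paper("Paper1", "First placeholder title", ["Author1"])
g.add_paper("Paper2", "Second placeholder title", ["Author2", "Author3"])
g.add_reference("Paper1", "Paper2")
print(g.citing_authors("Author2"))  # {'Author1'}
print(g.coauthors("Author2"))       # {'Author3'}
```

Keeping a reverse index (papers_of) next to the forward author relation is what lets both queries start from an author name, much as a graph database lets a traversal start from an author node and follow relations in either direction.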

