Last active
April 16, 2016 14:10
= Open Food Facts
:neo4j-version: 2.3.2
:author: Nicolas Rouyer
:toc: right
:twitter: @rrrouyer
:description: Open Food Facts
:tags: domain:open data, use-case:open food facts

This interactive Neo4j graph tutorial shows how to work with Open Food Facts data, for the good of your health!

'''
:toc: left
'''

[[introduction]]
== Open Food Facts

image::http://static.openfoodfacts.org/images/misc/openfoodfacts-logo-en-178x150.png[Open Food Facts]

Open Food Facts is a free food product database.
It gathers information and data on food products from around the world.
The database is built by individual contributors from many countries, who scan product barcodes and upload pictures of the labels.

http://fr.openfoodfacts.org/

[[graph_creation]]
=== Creating the Open Food Facts graph

[source,cypher]
----
// OPEN FOOD FACTS - CREATE INDEX ON PRODUCT CODE
CREATE INDEX ON :Product(code);
// OPEN FOOD FACTS - CREATE INDEX ON INGREDIENT FOOD
CREATE INDEX ON :Ingredient(food);
// OPEN FOOD FACTS - LOAD PRODUCT NODES
LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/nrouyer/fdcea6bbb5ea8e3377fb2ec3139b0c17/raw/f93de61779cccd74e9eb94566a6efc3358b00db1/off_products_163.csv" AS csvLine
FIELDTERMINATOR ";"
CREATE (p:Product { code: csvLine.code,
  name: coalesce(csvLine.name,"NA"),
  sodiumPer100g: coalesce(csvLine.sodiumPer100g,"NA"),
  fatPer100g: coalesce(csvLine.fatPer100g,"NA"),
  proteinsPer100g: coalesce(csvLine.proteinsPer100g,"NA"),
  nutritionScoreFrPer100g: coalesce(csvLine.nutritionScoreFrPer100g,"NA"),
  energyPer100g: coalesce(csvLine.energyPer100g,"NA"),
  fiberPer100g: coalesce(csvLine.fiberPer100g,"NA"),
  sugarsPer100g: coalesce(csvLine.sugarsPer100g,"NA"),
  saltPer100g: coalesce(csvLine.saltPer100g,"NA"),
  nutritionScoreUkPer100g: coalesce(csvLine.nutritionScoreUkPer100g,"NA")
});
// LOAD INGREDIENTS
LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/nrouyer/40f6b8d87f7f239f5a0f62e7756f8879/raw/1cc542d70a1bc1829d2643eb02d046f733545bb8/off_ingredients_163.csv" AS csvLine
FIELDTERMINATOR ';'
MERGE (i:Ingredient { food: csvLine.Ingredient });
// LOAD COMPOSITION RELATIONSHIPS
LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/nrouyer/8cc54359a569d5df445f8fa1066f2daa/raw/ecc608e6b59c971db448a4fd59c62e14c21dd0cc/off_composition_163.csv" AS csvLine
FIELDTERMINATOR ';'
MATCH (p:Product { code: csvLine.code })
MATCH (i:Ingredient { food: csvLine.food })
MERGE (p)-[:CONTAINS { rank: coalesce(csvLine.rank,"NA") }]->(i);
----

Graph data loaded!
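
`LOAD CSV` reads every field as a string, and the load above stores "NA" for missing values, so the nutrition properties are text rather than numbers. A minimal sketch of how they might be queried numerically follows; it assumes the CSV uses a dot as decimal separator (otherwise `toFloat` returns null and the row is skipped).

[source,cypher]
----
// Ten products with the most sugar per 100 g (sketch)
MATCH (p:Product)
WHERE p.sugarsPer100g <> "NA"
// Convert the stored string to a number; unparseable values become null
WITH p, toFloat(p.sugarsPer100g) AS sugars
WHERE sugars IS NOT NULL
RETURN p.name AS Product, sugars AS SugarsPer100g
ORDER BY SugarsPer100g DESC
LIMIT 10
----

The same pattern applies to the other `...Per100g` properties.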

'''

[[graph_consultation]]
=== Soda ingredient war: Pepsi vs 7Up

As a warm-up, let us compare the composition of Pepsi and 7Up (whose tastes are radically different...).

[source,cypher]
----
// OPEN FOOD FACTS - GET 7UP INGREDIENTS SHORT NAME
MATCH (p:Product {name:'7Up'})-[:CONTAINS]->(i:Ingredient)
WITH i, SPLIT(i.food, '/') AS Ingredients
RETURN Ingredients[4] AS Ingredient
----

[source,cypher]
----
// OPEN FOOD FACTS - GET PEPSI INGREDIENTS SHORT NAME
MATCH (p:Product {name:'Pepsi, Nouveau goût !'})-[:CONTAINS]->(i:Ingredient)
WITH i, SPLIT(i.food, '/') AS Ingredients
RETURN Ingredients[4] AS Ingredient
----

[source,cypher]
----
// OPEN FOOD FACTS - GET INGREDIENTS COMMON TO PEPSI & 7UP
MATCH (p1:Product {name:'7Up'})-[:CONTAINS]->(i:Ingredient)
MATCH (p2:Product {name:'Pepsi, Nouveau goût !'})-[:CONTAINS]->(i)
RETURN i.food AS Ingredient
----
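
The common-ingredient pattern can also be turned around to show what sets the two drinks apart. The query below is only a sketch, reusing the product names from the queries above: it lists the ingredients of 7Up that Pepsi does not contain.

[source,cypher]
----
// Ingredients found in 7Up but not in Pepsi (sketch)
MATCH (p1:Product {name:'7Up'})-[:CONTAINS]->(i:Ingredient)
MATCH (p2:Product {name:'Pepsi, Nouveau goût !'})
// Keep only ingredients with no CONTAINS relationship from Pepsi
WHERE NOT (p2)-[:CONTAINS]->(i)
RETURN i.food AS Ingredient
----

Swapping the two product names gives the opposite difference.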

'''

[[graph_food_neighbours]]
=== My neighbourfood

With Cypher we can easily query the food data model and find the closest neighbours of any given product, that is, the products that share the most ingredients with it.

[source,cypher]
----
// OPEN FOOD FACTS - CLOSEST NEIGHBOURS (2)
MATCH (p1:Product {name: 'Chair à saucisse'} )-[c1:CONTAINS]->(i:Ingredient)<-[c2:CONTAINS]-(p2:Product)
RETURN p2.name AS Neighbour, collect(i.food) AS Ingredients_In_Common, count(i.food) AS STRENGTH
ORDER BY STRENGTH DESC
----
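
A raw count of shared ingredients tends to favour products with long ingredient lists. One common way to normalise it is the Jaccard index: the number of shared ingredients divided by the number of ingredients in either product. This is not part of the original tutorial, just a sketch, and it assumes each product has at most one `CONTAINS` relationship per ingredient.

[source,cypher]
----
// Closest neighbours ranked by Jaccard index (sketch)
MATCH (p1:Product {name: 'Chair à saucisse'})-[:CONTAINS]->(i:Ingredient)<-[:CONTAINS]-(p2:Product)
WITH p1, p2, count(DISTINCT i) AS shared
// Number of ingredients of each product, counted via its CONTAINS relationships
WITH p2, shared,
     size((p1)-[:CONTAINS]->()) AS n1,
     size((p2)-[:CONTAINS]->()) AS n2
// |A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|
RETURN p2.name AS Neighbour, shared AS Shared,
       toFloat(shared) / (n1 + n2 - shared) AS Jaccard
ORDER BY Jaccard DESC
LIMIT 10
----

A value of 1.0 would mean that both products list exactly the same ingredients.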

[[graph_refactoring]]
=== Refactoring the OFF graph

Let us make a simple cosmetic change to our Open Food Facts graph by storing a short name on each ingredient:

[source,cypher]
----
MATCH (i:Ingredient)
WITH i, SPLIT(i.food, '/') AS Ingredients
SET i.shortname = Ingredients[4]
----
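
Index 4 gives the intended short name only if every `food` value has the same number of `/`-separated segments. That is an assumption about the data, not something the tutorial states. A sketch for checking it compares the stored short name with the last segment of `food` and lists any ingredient where the two differ:

[source,cypher]
----
// Ingredients whose shortname is missing or is not the last segment of food (sketch)
MATCH (i:Ingredient)
WITH i, last(split(i.food, '/')) AS lastSegment
WHERE i.shortname IS NULL OR i.shortname <> lastSegment
RETURN i.food AS Food, i.shortname AS ShortName, lastSegment AS LastSegment
----

If this returns no rows, index 4 and the last segment agree for every ingredient.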

Then we query the closest neighbours again, this time with a more readable result.

[source,cypher]
----
MATCH (p1:Product {name: 'Chair à saucisse'} )-[c1:CONTAINS]->(i:Ingredient)<-[c2:CONTAINS]-(p2:Product)
RETURN p2.name AS Neighbour, collect(i.shortname) AS Ingredients_In_Common, count(i.food) AS STRENGTH
ORDER BY STRENGTH DESC
----

[[shortest_food_path]]
=== Find the shortest path between products

Let us pick two food products at random. Can we learn anything from the shortest path between them?

[source,cypher]
----
// OPEN FOOD FACTS - SHORTEST PATH
// Enumerate paths of 1 to 6 CONTAINS hops between the two products, then keep the shortest
MATCH (rollmops:Product {name:"Rollmop Herrings"}),
      (macncheese:Product {code:"00036559"}),
      p = (rollmops)-[:CONTAINS*1..6]-(macncheese)
WHERE ANY(x IN NODES(p) WHERE x:Ingredient)
WITH p ORDER BY LENGTH(p) LIMIT 1
RETURN p
----
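
Cypher also provides a built-in `shortestPath()` function, which avoids enumerating every path of up to six hops before sorting them. A sketch of the same search using it, with the same two products and the same hop limit:

[source,cypher]
----
// Shortest path via the built-in function (sketch)
MATCH (rollmops:Product {name:"Rollmop Herrings"}),
      (macncheese:Product {code:"00036559"}),
      p = shortestPath((rollmops)-[:CONTAINS*..6]-(macncheese))
RETURN p
----

Since `CONTAINS` relationships only link products to ingredients, any path between two products necessarily passes through at least one ingredient, so the extra `WHERE` filter is not needed here.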

'''

[[conclusion]]
=== Let's feed the food graph...

This great open database helps us find insights into our everyday essentials. It was created to bring more transparency and to share universal knowledge. +

image::http://static.openfoodfacts.org/images/svg/crowdsourcing-icon.svg[Yes we scan!!!]

Excellent analyses of the whole database have been published on https://www.kaggle.com/[Kaggle]. +
Please enjoy and post your remarks: +
mailto:[email protected]>[Nicolas ROUYER]