nrouyer · April 16, 2016 14:10
diff --git a/open_food_facts.txt b/open_food_facts.txt
 = Open Food Facts
 :neo4j-version: 2.3.2
 :author: Nicolas Rouyer
 :toc: right
 :twitter: @rrrouyer
 :description: Open Food Facts
 :tags: domain:open data, use-case:open food facts

 This interactive Neo4j graph tutorial shows how to explore Open Food Facts data, for the good of your health!

 '''

 :toc: left

 '''

 [[introduction]]
 == Open Food Facts

 image::http://static.openfoodfacts.org/images/misc/openfoodfacts-logo-en-178x150.png[Open Food Facts]

 Open Food Facts is a free, open database of food products.
 It gathers information and data on food products from around the world.

 The database is built by individual contributors around the world, who scan product barcodes and upload pictures of the labels.

 http://fr.openfoodfacts.org/

 [[graph_creation]]
 === Creating the Open Food Facts graph
 [source,cypher]
 ----
 // OPEN FOOD FACTS - CREATE INDEX ON PRODUCT CODE
 CREATE INDEX ON :Product(code);
 // OPEN FOOD FACTS - CREATE INDEX ON INGREDIENT FOOD
 CREATE INDEX ON :Ingredient(food);

 // OPEN FOOD FACTS - LOAD PRODUCT NODES
 LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/nrouyer/fdcea6bbb5ea8e3377fb2ec3139b0c17/raw/f93de61779cccd74e9eb94566a6efc3358b00db1/off_products_163.csv" AS csvLine
 FIELDTERMINATOR ";" 
 CREATE (p:Product {
   code: csvLine.code,
   name: coalesce(csvLine.name,"NA"),
   sodiumPer100g: coalesce(csvLine.sodiumPer100g,"NA"),
   fatPer100g: coalesce(csvLine.fatPer100g,"NA"),
   proteinsPer100g: coalesce(csvLine.proteinsPer100g,"NA"),
   nutritionScoreFrPer100g: coalesce(csvLine.nutritionScoreFrPer100g,"NA"),
   energyPer100g: coalesce(csvLine.energyPer100g,"NA"),
   fiberPer100g: coalesce(csvLine.fiberPer100g,"NA"),
   sugarsPer100g: coalesce(csvLine.sugarsPer100g,"NA"),
   saltPer100g: coalesce(csvLine.saltPer100g,"NA"),
   nutritionScoreUkPer100g: coalesce(csvLine.nutritionScoreUkPer100g,"NA")
 });

 // LOAD INGREDIENTS
 LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/nrouyer/40f6b8d87f7f239f5a0f62e7756f8879/raw/1cc542d70a1bc1829d2643eb02d046f733545bb8/off_ingredients_163.csv" AS csvLine 
 FIELDTERMINATOR ';' 
 MERGE (i:Ingredient { food: csvLine.Ingredient });

 // LOAD COMPOSITION RELATIONSHIPS
 LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/nrouyer/8cc54359a569d5df445f8fa1066f2daa/raw/ecc608e6b59c971db448a4fd59c62e14c21dd0cc/off_composition_163.csv" AS csvLine
 FIELDTERMINATOR ';' 
 MATCH (p:Product { code: csvLine.code })
 MATCH (i:Ingredient { food: csvLine.food })
 MERGE (p)-[:CONTAINS { rank: coalesce(csvLine.rank,"NA") }]->(i);

 ----
 Graph data loaded!
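
 Before querying, it can help to check that the three load steps produced what we expect. The query below is a minimal sketch that counts products, the ingredients linked to them, and the `CONTAINS` relationships; the column names `Products`, `LinkedIngredients` and `Compositions` are chosen here for illustration only.

 [source,cypher]
 ----
 // SANITY CHECK (illustrative) - COUNT PRODUCTS, LINKED INGREDIENTS AND CONTAINS RELATIONSHIPS
 MATCH (p:Product)
 // OPTIONAL MATCH keeps products that have no ingredient at all
 OPTIONAL MATCH (p)-[c:CONTAINS]->(i:Ingredient)
 RETURN count(DISTINCT p) AS Products,
        count(DISTINCT i) AS LinkedIngredients,
        count(c) AS Compositions
 ----

 If `Compositions` is zero, the `code` or `food` values in the composition file probably did not match the nodes created in the two previous steps.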

 '''

 [[graph_consultation]]
 === Soda ingredient war: Pepsi vs 7Up

 As a warm-up, let us compare the composition of Pepsi and 7Up, two sodas whose tastes are radically different.

 [source,cypher]
 ----
 // OPEN FOOD FACTS - GET 7UP INGREDIENTS SHORT NAME
 MATCH (p:Product {name:'7Up'})-[:CONTAINS]->(i:Ingredient)
 WITH i, SPLIT(i.food, '/') AS Ingredients 
 RETURN Ingredients[4] AS Ingredient
 ----

 [source,cypher]
 ----
 // OPEN FOOD FACTS - GET PEPSI INGREDIENTS SHORT NAME
 MATCH (p:Product {name:'Pepsi, Nouveau goût !'})-[:CONTAINS]->(i:Ingredient)
 WITH i, SPLIT(i.food, '/') AS Ingredients 
 RETURN Ingredients[4] AS Ingredient
 ----

 [source,cypher]
 ----
 // OPEN FOOD FACTS - GET INGREDIENTS COMMON TO PEPSI & 7UP
 MATCH (p1:Product {name:'7Up'})-[:CONTAINS]->(i:Ingredient)
 MATCH (p2:Product {name:'Pepsi, Nouveau goût !'})-[:CONTAINS]->(i)
 RETURN i.food AS Ingredient
 ----
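
 The opposite question is just as interesting: which ingredients appear in one soda but not in the other? A possible sketch, reusing the product names from the queries above and a negated pattern in the `WHERE` clause:

 [source,cypher]
 ----
 // OPEN FOOD FACTS - GET INGREDIENTS OF 7UP THAT PEPSI DOES NOT CONTAIN (illustrative)
 MATCH (p1:Product {name:'7Up'})-[:CONTAINS]->(i:Ingredient)
 MATCH (p2:Product {name:'Pepsi, Nouveau goût !'})
 // keep only the ingredients that have no CONTAINS relationship from Pepsi
 WHERE NOT (p2)-[:CONTAINS]->(i)
 WITH i, SPLIT(i.food, '/') AS Ingredients
 RETURN Ingredients[4] AS Ingredient
 ----

 Swapping the two product names gives the ingredients that are specific to Pepsi.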

 ''' 

 [[graph_food_neighbours]]
 === My neighbourfood

 With Cypher we can easily query the food data model and find the closest neighbours of any given product, that is, the products that share the most ingredients with it.

 [source,cypher]
 ----
 // OPEN FOOD FACTS - CLOSEST NEIGHBOURS (2)
 MATCH (p1:Product {name: 'Chair à saucisse'} )-[c1:CONTAINS]->(i:Ingredient)<-[c2:CONTAINS]-(p2:Product) 
 RETURN p2.name AS Neighbour, collect(i.food) AS Ingredients_In_Common, count(i.food) AS STRENGTH
 ORDER BY STRENGTH DESC
 ----
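
 A raw count of shared ingredients favours products with long ingredient lists. One common way to compensate, which is not part of the original query, is to divide the shared count by the size of the union of both ingredient lists (a Jaccard index). The sketch below assumes that each product has at most one `CONTAINS` relationship per ingredient, so that the number of outgoing relationships equals the number of ingredients.

 [source,cypher]
 ----
 // OPEN FOOD FACTS - CLOSEST NEIGHBOURS, NORMALISED (illustrative Jaccard index)
 MATCH (p1:Product {name: 'Chair à saucisse'})-[:CONTAINS]->(i:Ingredient)<-[:CONTAINS]-(p2:Product)
 WITH p1, p2, count(DISTINCT i) AS shared
 // union size = ingredients of p1 + ingredients of p2 - shared ingredients
 RETURN p2.name AS Neighbour, shared AS Shared,
        toFloat(shared) / (size((p1)-[:CONTAINS]->()) + size((p2)-[:CONTAINS]->()) - shared) AS Jaccard
 ORDER BY Jaccard DESC
 ----

 A value of 1 would mean that both products list exactly the same ingredients.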

 [[graph_refactoring]]
 === Refactoring OFF graph

 Let us make a purely cosmetic change to our Open Food Facts graph: store a short display name on each ingredient.

 [source,cypher]
 ----
 // Keep the fifth '/'-separated segment of food (index 4) as a short display name
 MATCH (i:Ingredient)
 WITH i, SPLIT(i.food, '/') AS Ingredients
 SET i.shortname = Ingredients[4]
 ----

 Then we query the closest neighbours again, this time with a better-formatted result.

 [source,cypher]
 ----
 MATCH (p1:Product {name: 'Chair à saucisse'} )-[c1:CONTAINS]->(i:Ingredient)<-[c2:CONTAINS]-(p2:Product) 
 RETURN p2.name AS Neighbour, collect(i.shortname) AS Ingredients_In_Common, count(i.food) AS STRENGTH
 ORDER BY STRENGTH DESC
 ----

 [[shortest_food_path]]
 === Find shortest path between products

 Let us pick two food products at random. Can we learn anything from the shortest path between them?

 [source,cypher]
 ----
 // OPEN FOOD FACTS - SHORTEST PATH
 MATCH (rollmops:Product {name:"Rollmop Herrings"}),
      (macncheese:Product {code:"00036559"}), 
      p =(rollmops)-[:CONTAINS*1..6]-(macncheese)
 WHERE ANY(x IN NODES(p) WHERE x:Ingredient) 
 WITH p ORDER BY LENGTH(p) LIMIT 1
 RETURN p
 ----
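
 The query above enumerates every path of up to six hops and then keeps the shortest one. As an alternative sketch, Cypher's built-in `shortestPath()` function can search for a shortest path directly; the six-hop upper bound is kept from the original query.

 [source,cypher]
 ----
 // OPEN FOOD FACTS - SHORTEST PATH USING shortestPath() (illustrative alternative)
 MATCH (rollmops:Product {name:"Rollmop Herrings"}),
       (macncheese:Product {code:"00036559"}),
       p = shortestPath((rollmops)-[:CONTAINS*..6]-(macncheese))
 RETURN p
 ----

 Because `CONTAINS` only links products to ingredients, any path found between two products necessarily passes through at least one ingredient.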

 ''' 

 [[conclusion]]
 === Let's feed the food graph...
 This great open database helps us find insights into our day-to-day essentials. It was created to bring more transparency and to share knowledge universally.

 image::http://static.openfoodfacts.org/images/svg/crowdsourcing-icon.svg[Yes we scan !!!]

 Excellent analyses of the whole database can be found on https://www.kaggle.com/[Kaggle]. +
 Enjoy, and please send your remarks to: +
 mailto:[email protected]>[Nicolas ROUYER]