import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.sql.SaveMode._
val sc = spark.sparkContext
val ssc = new StreamingContext(sc, Seconds(1))
val inputPath = "/tmp/inputDir/"
@Mageswaran1989
Mageswaran1989 / spark_dataset.ipynb
Created January 30, 2020 03:24
A gentle introduction to Spark Datasets
@davideicardi
davideicardi / README.md
Last active May 4, 2025 21:48
Write and read Avro records from bytes array

Avro serialization

There are 4 possible serialization formats when using Avro:

@CesarCapillas
CesarCapillas / add-by-id.sh
Last active April 23, 2024 21:14
SOLR bash recipes for creating, deleting or truncating collections, monitoring and searching.
#!/bin/bash
COLLECTION=${2:-zylk}
SERVER=${3:-localhost}
PORT=${4:-8983}
if [ -z "$1" ]; then
# Usage
echo 'Usage: add-by-id.sh <id> [<collection> <solr-server=localhost> <port=8983>]'
else
curl -X POST "http://${SERVER}:${PORT}/solr/${COLLECTION}/update?commit=true" -H "Content-Type: text/xml" --data-binary "<add><doc><field name='id'>$1</field><field name='url'>$1</field></doc></add>"
@max-mapper
max-mapper / bibtex.png
Last active November 6, 2024 09:03
How to make a scientific looking PDF from markdown (with bibliography)
@allquest
allquest / leboncoin_rss.user.js
Last active August 2, 2023 11:00
Greasemonkey script for LeBonCoin - a kind of RSS feed for the website Le bon coin, driven by your search query. Each time you reload a page, a GET request for your query is sent to LBC. If a new offer is available, its link is shown at the top of the page.
// ==UserScript==
// @name Leboncoin RSS
// @namespace http://gist.github.com/fb7b790fb6548bdec3ec5259bebd20c0
// @author Tegomass
// @description A kind of RSS for LeBonCoin with your personal search
// @include *
// @require https://cdnjs.cloudflare.com/ajax/libs/jquery/3.1.1/jquery.min.js
// @version 1.1
// @grant GM_addStyle
// @grant GM_setValue
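
The description above says a link is shown only when a new offer appears after a reload. A minimal sketch of that "new since last time" check follows. The function name `findNewOffers`, the `{ id, url }` offer shape, and the example URLs are assumptions for illustration and are not taken from the script itself. In a real userscript the set of seen IDs would probably be persisted between page loads (for example with `GM_setValue` and `GM_getValue`) rather than kept in memory.

```javascript
// Hypothetical helper: the name and the { id, url } offer shape are assumptions,
// not part of the original userscript.
// Returns the offers whose IDs have not been seen before and records them as seen.
function findNewOffers(offers, seenIds) {
  const fresh = offers.filter((offer) => !seenIds.has(offer.id));
  for (const offer of fresh) {
    seenIds.add(offer.id);
  }
  return fresh;
}

// Simulate two page reloads against the same search results.
const seen = new Set(["a1"]); // "a1" was already shown on an earlier load
const results = [
  { id: "a1", url: "https://example.invalid/a1" },
  { id: "b2", url: "https://example.invalid/b2" },
];

const firstLoad = findNewOffers(results, seen);  // only "b2" is new
const secondLoad = findNewOffers(results, seen); // nothing is new any more

console.log(firstLoad.map((offer) => offer.id), secondLoad.length);
```

Keeping the comparison in a pure function makes it easy to test separately from the page scraping and from wherever the seen IDs are stored.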
@yoyama
yoyama / Schema2CaseClass.scala
Created January 20, 2017 07:36
Generate case class from spark DataFrame/Dataset schema.
/**
* Generate Case class from DataFrame.schema
*
* val df:DataFrame = ...
*
* val s2cc = new Schema2CaseClass
* import s2cc.implicit._
*
* println(s2cc.schemaToCaseClass(df.schema, "MyClass"))
*
@longcao
longcao / SparkCopyPostgres.scala
Last active September 11, 2024 18:55
COPY Spark DataFrame rows to PostgreSQL (via JDBC)
import java.io.InputStream
import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.{ DataFrame, Row }
import org.postgresql.copy.CopyManager
import org.postgresql.core.BaseConnection
val jdbcUrl = s"jdbc:postgresql://..." // db credentials elided
val connectionProperties = {
@Alanaktion
Alanaktion / pacman.md
Last active April 21, 2020 14:49
Useful pacman commands and packages

Basic usage

pacman -S <package> # Install a package
pacman -Sy # Update package list
pacman -Su # Update installed packages
pacman -Ss <query> # Search packages
pacman -R <package> # Remove a package
pacman -Rs <package> # Remove a package and its unneeded dependencies
@paulp
paulp / oddity.txt
Created January 11, 2016 22:22
Whitespace Oddity
WHITESPACE ODDITY
by Paul Phillips, in eternal admiration of David Bowie, RIP
Bound Ctrl to Major mode
Bound Ctrl to Major mode
Read inputrc and set extdebug on
Bound Ctrl to Major mode (Ten, Nine, Eight, Seven, Six)
Connecting readline, options on (Five, Four, Three)
Check the syntax, may terminfo be with you (Two, One, Exec)