Skip to content

Instantly share code, notes, and snippets.

View chenhong805's full-sized avatar
๐Ÿ˜ƒ

Xx chenhong805

๐Ÿ˜ƒ
View GitHub Profile
@aseigneurin
aseigneurin / Spark parquet.md
Created November 15, 2016 15:25
Spark - Parquet files

Spark - Parquet files

Basic file formats - such as CSV, JSON or other text formats - can be useful when exchanging data between applications. When it comes to storing intermediate data between steps of an application, Parquet can provide more advanced capabilities:

  • Support for complex types, as opposed to string-based types (CSV) or a limited type system (JSON only supports strings, basic numbers, booleans).
  • Columnar storage - more efficient when not all the columns are used or when filtering the data.
  • Partitioning - files are partitioned out of the box
  • Compression - pages can be compressed with Snappy or Gzip (this preserves the partitioning)

The tests here are performed with Spark 2.0.1 on a cluster with 3 workers (c4.4xlarge, 16 vCPU and 30 GB each).

@pachanka
pachanka / search-with-gpg.sh
Last active May 7, 2023 17:57
A simple bash script to grep within a bunch of GPG encrypted files.
#!/bin/bash
#
# Usage search-with-gpg path/to/encrypted/files/*
#
if [ -z "$1" ]; then
echo "Usage: $0 'search string' [path/to/encrypted/files/*]";
exit 1;
else
SEARCH=$1;
@brennanMKE
brennanMKE / hero.ts
Last active April 6, 2025 07:33
Example of Mongoose with TypeScript and MongoDb
import * as mongoose from 'mongoose';
export let Schema = mongoose.Schema;
export let ObjectId = mongoose.Schema.Types.ObjectId;
export let Mixed = mongoose.Schema.Types.Mixed;
export interface IHeroModel extends mongoose.Document {
name: string;
power: string;
@stephenturner
stephenturner / install-gcc48-linuxbrew-centos6.md
Last active January 8, 2025 06:27
Installing gcc 4.8 and Linuxbrew on CentOS 6

Installing gcc 4.8 and Linuxbrew on CentOS 6

The GCC distributed with CentOS 6 is 4.4.7, which is pretty outdated. I'd like to use gcc 4.8+. Also, when trying to install Linuxbrew you run into a dependency loop where Homebrew's gcc depends on zlib, which depends on gcc. Here's how I solved the problem.

Note: Requires sudo privileges.

Resources:

@oza
oza / SparkOnYARN.md
Last active October 9, 2022 08:53
How to run Spark on YARN with dynamic resource allocation

YARN

  1. General resource management layer on HDFS
  2. A part of Hadoop

Spark

  1. In memory processing framework

Spark on YARN

@digital-wonderland
digital-wonderland / README.md
Last active April 6, 2022 17:22
Unit files to deploy an ElasticSearch cluster on CoreOS via Fleet

What

Unit files to deploy an ElasticSearch cluster on CoreOS via Fleet.

Service discovery & registration is done via etcd.

[email protected] provides a dumb discovery service by registering an elasticsearch host if it should be up. [email protected] registers the service only if it is running.

A service & timer unit for elasticsearch curator is provided which does some housekeeping.

@gigorok
gigorok / gist:5ca39384635113495796
Created July 4, 2014 13:59
php interactive shell with loaded composer dependencies
php -a -d auto_prepend_file=./vendor/autoload.php
@chaitanyagupta
chaitanyagupta / _reader-macros.md
Last active July 4, 2025 22:26
Reader Macros in Common Lisp

Reader Macros in Common Lisp

This post also appears on lisper.in.

Reader macros are perhaps not as famous as ordinary macros. While macros are a great way to create your own DSL, reader macros provide even greater flexibility by allowing you to create entirely new syntax on top of Lisp.

Paul Graham explains them very well in [On Lisp][] (Chapter 17, Read-Macros):

The three big moments in a Lisp expression's life are read-time, compile-time, and runtime. Functions are in control at runtime. Macros give us a chance to perform transformations on programs at compile-time. ...read-macros... do their work at read-time.

@Stanback
Stanback / nginx.conf
Last active March 30, 2025 03:57 — forked from michiel/cors-nginx.conf
Example Nginx configuration for adding cross-origin resource sharing (CORS) support to reverse proxied APIs
#
# CORS header support
#
# One way to use this is by placing it into a file called "cors_support"
# under your Nginx configuration directory and placing the following
# statement inside your **location** block(s):
#
# include cors_support;
#
# As of Nginx 1.7.5, add_header supports an "always" parameter which
@rossant
rossant / raytracing.py
Last active August 15, 2025 21:37
Very simple ray tracing engine in (almost) pure Python. Depends on NumPy and Matplotlib. Diffuse and specular lighting, simple shadows, reflections, no refraction. Purely sequential algorithm, slow execution.
"""
MIT License
Copyright (c) 2017 Cyrille Rossant
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is