@dpordomingo
Last active March 12, 2019 14:13
Find slow repos

usage:

./analyze.sh SIVA_BUCKETS_PATH TIMEOUT_BUCKET TIMEOUT_SIVA

where:

  • SIVA_BUCKETS_PATH is where siva buckets are stored, example: ~/repos/pga/siva/latest
  • TIMEOUT_BUCKET is the bucket timeout, given in tenths of a second: a bucket that takes longer than TIMEOUT_BUCKET/10 seconds to initialize is marked as "needed to be reviewed". Example: if set to 60, it waits up to 6 seconds for each bucket to be initialized.
  • TIMEOUT_SIVA is the siva file timeout, given in tenths of a second: a siva file that takes longer than TIMEOUT_SIVA/10 seconds to initialize is marked as "slow". Example: if set to 20, it waits up to 2 seconds for each siva file to be initialized.
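
Because both timeouts are given in tenths of a second, it can help to check what a value means in seconds before starting a long run. The following is a minimal sketch; `ticks_to_seconds` is a hypothetical helper and not part of the gist.

```bash
#!/bin/bash
# Hypothetical helper (not part of analyze.sh): convert a timeout given in
# tenths of a second into seconds, using integer arithmetic only.
ticks_to_seconds() {
    local ticks="$1"
    # Whole seconds, then the remaining tenth.
    echo "$((ticks / 10)).$((ticks % 10))"
}

ticks_to_seconds 60   # the TIMEOUT_BUCKET value from the example
ticks_to_seconds 20   # the TIMEOUT_SIVA value from the example
```

The real wait is slightly longer than the printed value, since each poll also spends time running grep on top of the 0.1 second sleep.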

In the first iteration, it tries to initialize each bucket under SIVA_BUCKETS_PATH; any bucket that cannot be initialized in less than TIMEOUT_BUCKET/10 seconds is marked as "needed to be reviewed". In the second iteration, it takes every bucket marked as "needed to be reviewed" in the first iteration, tries each siva file inside it, and adds to error.log those that needed more than TIMEOUT_SIVA/10 seconds to be initialized.
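
The core of both iterations is the same step: poll a log file until a "ready" line appears or the timeout passes. The sketch below isolates that step so it can be tried without gitbase. `wait_for_line` is a hypothetical name, and a background job that writes to a temporary file stands in for the server.

```bash
#!/bin/bash
# wait_for_line FILE PATTERN TICKS (hypothetical helper, not part of the gist)
# Polls FILE every 0.1 s. Returns 0 as soon as a line containing PATTERN is
# found, or 1 once TICKS polls have gone by without a match.
wait_for_line() {
    local file="$1"
    local pattern="$2"
    local ticks="$3"
    local elapsed=0
    until grep -qF -- "$pattern" "$file" 2>/dev/null; do
        if [ "$elapsed" -ge "$ticks" ]; then
            return 1
        fi
        elapsed=$((elapsed + 1))
        sleep 0.1
    done
    return 0
}

# Stand-in for the server: writes its ready line after about 0.3 s.
log=$(mktemp)
( sleep 0.3; echo "server started" >> "$log" ) &

# Allow up to 20 ticks (about 2 s) for the ready line to show up.
if wait_for_line "$log" "server started" 20; then
    echo "ready"
else
    echo "slow"
fi
wait
rm -f "$log"
```

A path for which this kind of wait times out is what the script records in error.log.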

dependencies: gitbase, which can be installed with go get -u -v github.com/src-d/gitbase/cmd/gitbase
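
Since the script starts gitbase in the background and only watches its log, a missing binary would make every bucket look slow. A quick check before running can rule that out. This is a sketch; `require_cmd` is a hypothetical helper name.

```bash
#!/bin/bash
# require_cmd NAME (hypothetical helper): succeed only if NAME resolves to a
# command the shell can run.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if require_cmd gitbase; then
    echo "gitbase found at $(command -v gitbase)"
else
    echo "gitbase not found; install it with the go get command above" >&2
fi
```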

example:

./analyze.sh ~/repos/pga/siva/latest 60 20

This performed well on a PGA subset containing 9k repositories, finding ~10 slow repos in 10 minutes. Once they were deleted from the subset, source{d} Engine was able to run queries over it.
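
After the second iteration, error.log holds one slow siva path per line. One way to take those files out of the subset is sketched below. It moves them to a separate directory instead of deleting them, which is an assumption made here so the step can be undone; `quarantine_listed` and the destination directory are hypothetical names.

```bash
#!/bin/bash
# quarantine_listed LIST DEST (hypothetical helper, not part of the gist)
# Moves every existing path listed in LIST (one per line) into DEST.
# Paths that no longer exist are skipped.
quarantine_listed() {
    local list="$1"
    local dest="$2"
    local path
    mkdir -p "$dest" || return 1
    while IFS= read -r path; do
        if [ -e "$path" ]; then
            mv -- "$path" "$dest/"
        fi
    done < "$list"
}

# Usage, assuming error.log was produced by a finished run:
#   quarantine_listed error.log ~/repos/pga/slow-siva
```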

#!/bin/bash
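# Arguments: root of the siva buckets, then both timeouts in tenths of a second.
WHERE=$1
SLEEP_FIRST=$2
SLEEP_SECOND=$3
INPUT=INPUT.log
SUCCESS_LOG=success.log
ERROR_LOG=error.log
ERROR_LOG_VERBOSE=error_verbose.log
# Start clean: drop the logs left over from a previous run.
rm -f "${INPUT}" "${SUCCESS_LOG}" "${ERROR_LOG}" "${ERROR_LOG_VERBOSE}"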
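# analyze PATH TICKS: start gitbase on PATH and wait up to TICKS tenths of a
# second for its ready line; record PATH in the success or the error log.
function analyze {
    LOG=gitbase.log
    READY="server started and listening on localhost:3306"
    COUNT=$(find "$1" -name "*.siva" | wc -l)
    echo "PROCESSING '$1' : ${COUNT} siva files"
    # Run the server in the background, sending stdout and stderr to the log.
    gitbase server --directories "$1" --port=3306 --index=/tmp/gitbase > "${LOG}" 2>&1 &
    elapsed=0
    # Poll the log every 0.1 s until the ready line appears or the timeout passes.
    until grep -q "${READY}" "${LOG}" || [ "${elapsed}" -gt "$2" ]; do
        elapsed=$((elapsed + 1))
        sleep .1
    done
    killall gitbase
    # The ready line in the log means the server came up within the timeout.
    if grep "${READY}" "${LOG}"
    then
        echo "$1" >> "${SUCCESS_LOG}"
    else
        echo "$1" >> "${ERROR_LOG}"
        echo "$1 contained ${COUNT} repositories" >> "${ERROR_LOG_VERBOSE}"
    fi
}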
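# First iteration: try each bucket with the bucket timeout.
for dir in "${WHERE}"/*; do
    analyze "${dir}" "${SLEEP_FIRST}"
done
# Keep the first results; buckets that timed out become the second input.
mv "${SUCCESS_LOG}" "first_${SUCCESS_LOG}"
mv "${ERROR_LOG}" "${INPUT}"
mv "${ERROR_LOG_VERBOSE}" "first_${ERROR_LOG_VERBOSE}"
# Second iteration: inside each of those buckets, try every siva file on its own.
while IFS= read -r line; do
    for siva in "${line}"/*; do
        analyze "${siva}" "${SLEEP_SECOND}"
    done
done < "${INPUT}"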