{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get the code from my [Github repo](https://github.com/rnelsonchem/lcagg). The first code block below installs it into your local Python installation with pip; there are other ways to do this. If the code on GitHub has changed, you may need to uninstall the module first, using the second code block."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"!pip install --upgrade -q git+https://github.com/rnelsonchem/lcagg.git"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip uninstall -y lcagg"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import the module that I wrote."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import lcagg"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This module has only one class, `LcCsv`. When you create an instance, pass the name of an HDF5 file. If the file already exists it is opened; otherwise it is created."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"lc = lcagg.LcCsv('temp.h5')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To add data to this HDF5 file, use the `folder_proc` method. It takes two arguments:\n",
"\n",
"* Folder path (string, mandatory) -- The base folder that contains all the data. This folder is searched recursively for all CSV files that match the expected file naming specs.\n",
"\n",
"* sample_str (string, optional) -- The regex used to pull your sample names out of the file names. It should capture exactly one substring, which is your sample name. For example, `01-02-rcn000_222_Integration310.csv` could use `'(rcn\\d+_\\d+)'` as the search string, which should match `rcn000_222` as the sample name.\n",
"\n",
"Processing can take a while if you have lots of files. However, you don't need to run this step if you have no new files, and data files that are already in the HDF5 file are skipped, which saves some time."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"lc.folder_proc(r'H:\\HPLC')"
]
},
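{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before processing a large folder, it can help to check a candidate `sample_str` pattern against one file name. The cell below is a minimal sketch using Python's `re` module and the example file name from above; that `lcagg` applies the pattern with `re.search` is an assumption, not something confirmed here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Example file name from the description above\n",
"fname = '01-02-rcn000_222_Integration310.csv'\n",
"# Candidate sample_str pattern; the single group captures the sample name\n",
"pattern = r'(rcn\\d+_\\d+)'\n",
"\n",
"match = re.search(pattern, fname)\n",
"print(match.group(1) if match else 'no match')"
]
},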
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here is what this folder looks like. (`%ls` is an IPython magic, not a Python command.)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Volume in drive H is Data\n",
" Volume Serial Number is 18E2-6566\n",
"\n",
" Directory of H:\\HPLC\n",
"\n",
"03/17/2020 11:42 AM <DIR> .\n",
"03/17/2020 11:42 AM <DIR> ..\n",
"03/14/2020 10:24 AM <DIR> rcn_cbr03 2020-02-27 11-24-42\n",
"03/14/2020 10:24 AM <DIR> rcn_cbr03 2020-02-27 13-01-44\n",
"03/14/2020 10:25 AM <DIR> rcn_cbr03 2020-02-27 13-40-59\n",
"03/14/2020 10:25 AM <DIR> rcn_cbr03 2020-02-27 15-22-56\n",
"03/14/2020 10:25 AM <DIR> rcn_cbr03 2020-02-27 16-06-36\n",
"03/14/2020 10:25 AM <DIR> rcn_cbr03 2020-02-27 17-42-35\n",
"03/14/2020 10:25 AM <DIR> rcn_cbr03 2020-02-27 19-22-28\n",
"03/14/2020 10:25 AM <DIR> rcn_cbr03 2020-02-28 08-15-38\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-02-28 13-43-05\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-02-28 15-22-21\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-02-28 16-45-35\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-02-28 18-34-32\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-03-02 08-56-03\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-03-02 09-34-10\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-03-02 10-38-16\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-03-02 11-31-47\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-03-02 12-26-36\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-03-02 14-25-33\n",
"03/14/2020 10:26 AM <DIR> rcn_cbr03 2020-03-02 16-06-15\n",
"03/14/2020 10:27 AM <DIR> rcn_cbr03 2020-03-02 17-56-31\n",
"03/14/2020 10:27 AM <DIR> rcn_cbr03 2020-03-03 09-06-56\n",
"03/14/2020 10:28 AM <DIR> rcn_cbr03 2020-03-03 14-41-23\n",
"03/14/2020 10:28 AM <DIR> rcn_cbr03 2020-03-03 16-43-09\n",
"03/14/2020 10:28 AM <DIR> rcn_cbr03 2020-03-03 19-19-45\n",
"03/14/2020 10:28 AM <DIR> rcn_cbr03 2020-03-04 10-17-38\n",
"03/14/2020 10:28 AM <DIR> rcn_cbr03 2020-03-04 11-29-14\n",
"03/14/2020 10:28 AM <DIR> rcn_cbr03 2020-03-04 14-26-43\n",
"03/14/2020 10:28 AM <DIR> rcn_cbr03 2020-03-06 08-37-31\n",
"03/14/2020 10:29 AM <DIR> rcn_cbr03 2020-03-06 12-54-18\n",
"03/14/2020 10:29 AM <DIR> rcn_cbr03 2020-03-06 15-17-37\n",
"03/14/2020 10:29 AM <DIR> rcn_cbr03 2020-03-06 16-33-35\n",
"03/14/2020 10:29 AM <DIR> rcn_cbr03 2020-03-06 17-43-42\n",
"03/14/2020 10:30 AM <DIR> rcn_cbr03 2020-03-09 09-34-59\n",
"03/14/2020 10:30 AM <DIR> rcn_cbr03 2020-03-09 13-40-08\n",
"03/14/2020 10:30 AM <DIR> rcn_cbr03 2020-03-09 15-34-30\n",
"03/14/2020 10:30 AM <DIR> rcn_cbr03 2020-03-10 10-15-55\n",
"03/14/2020 10:30 AM <DIR> rcn_cbr03 2020-03-10 17-48-54\n",
"03/14/2020 10:30 AM <DIR> rcn_cbr03 2020-03-11 08-48-42\n",
"03/14/2020 10:30 AM <DIR> rcn_cbr03 2020-03-11 14-26-48\n",
"03/14/2020 10:30 AM <DIR> rcn_cbr03 2020-03-11 15-18-38\n",
"03/14/2020 10:31 AM <DIR> rcn_cbr03 2020-03-11 16-27-38\n",
"03/14/2020 10:31 AM <DIR> rcn_cbr03 2020-03-12 09-10-38\n",
"03/14/2020 10:31 AM <DIR> rcn_cbr03 2020-03-12 14-25-10\n",
"03/14/2020 10:31 AM <DIR> rcn_cbr03 2020-03-12 15-46-43\n",
"03/14/2020 10:32 AM <DIR> rcn_cbr03 2020-03-13 09-57-13\n",
"03/14/2020 10:33 AM <DIR> rcn_cbr03 2020-03-13 16-59-37\n",
"03/16/2020 10:55 AM <DIR> rcn_cbr03 2020-03-14 09-42-24\n",
"03/16/2020 01:34 PM <DIR> rcn_cbr03 2020-03-16 10-05-02\n",
"03/16/2020 06:41 PM <DIR> rcn_cbr03 2020-03-16 13-18-52\n",
"03/17/2020 08:50 AM <DIR> rcn_cbr03 2020-03-16 18-28-11\n",
"03/17/2020 11:43 AM <DIR> rcn_cbr03 2020-03-17 08-36-38\n",
" 0 File(s) 0 bytes\n",
" 53 Dir(s) 13,493,698,560 bytes free\n"
]
}
],
"source": [
"%ls H:\\HPLC"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Volume in drive H is Data\n",
" Volume Serial Number is 18E2-6566\n",
"\n",
" Directory of H:\\HPLC\\rcn_cbr03 2020-03-17 08-36-38\n",
"\n",
"03/17/2020 11:43 AM <DIR> .\n",
"03/17/2020 11:43 AM <DIR> ..\n",
"03/17/2020 11:42 AM <DIR> 001-1-blank.D\n",
"03/17/2020 11:42 AM <DIR> 002-71-rcn044_038.D\n",
"03/17/2020 11:42 AM <DIR> 003-72-rcn044_039.D\n",
"03/17/2020 11:42 AM <DIR> 004-73-rcn044_040.D\n",
"03/17/2020 11:43 AM <DIR> 005-74-rcn044_041.D\n",
"03/17/2020 11:43 AM <DIR> 006-75-rcn044_042.D\n",
"03/17/2020 11:43 AM <DIR> 007-76-rcn044_043.D\n",
"03/17/2020 11:43 AM <DIR> 008-1-blank.D\n",
"03/17/2020 11:43 AM <DIR> CBRO3_10min.M\n",
"03/17/2020 08:36 AM 676 Methods.Reg\n",
"03/17/2020 10:33 AM 6,653 rcn_cbr03.B\n",
"03/17/2020 10:33 AM 38,690 rcn_cbr03.LOG\n",
"03/17/2020 10:17 AM 26,831 rcn_cbr03.S\n",
"03/17/2020 08:36 AM 26,843 rcn_cbr03.Start\n",
"03/17/2020 10:33 AM 557,578 sequence.acaml\n",
" 6 File(s) 657,271 bytes\n",
" 11 Dir(s) 13,493,698,560 bytes free\n"
]
}
],
"source": [
"%ls \"H:\\HPLC\\rcn_cbr03 2020-03-17 08-36-38\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `store` attribute is the pandas `HDFStore` object, so it has all the usual methods of that object."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<class 'pandas.io.pytables.HDFStore'>\n",
"File path: temp.h5"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lc.store"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['/rcn044_043/int225',\n",
" '/rcn044_043/int260',\n",
" '/rcn044_043/int315',\n",
" '/rcn044_043/sig225',\n",
" '/rcn044_043/sig260',\n",
" '/rcn044_043/sig315',\n",
" '/rcn044_042/int225',\n",
" '/rcn044_042/int260',\n",
" '/rcn044_042/int315',\n",
" '/rcn044_042/sig225']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lc.store.keys()[:10]"
]
},
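{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a single table can be read straight from the store by its key. This is a minimal sketch that assumes each key holds a pandas DataFrame; the key below is taken from the listing above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# HDFStore.get returns the object stored under the given key\n",
"lc.store.get('/rcn044_043/sig315').head()"
]
},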
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can check whether a sample name is in the file with Python's `in` operator."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"'rcn044_043' in lc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `select` method retrieves data for particular samples. It takes three arguments.\n",
"\n",
"* specs (string or list, mandatory): A single sample name or a list of sample names. By sample name, I mean something like `rcn021_042`.\n",
"\n",
"* data_type (string, optional, default='signal'): One of two strings, 'signal' or 'ints', depending on whether you want the chromatograms or the integrations, respectively.\n",
"\n",
"* wl (int, optional, default=315): The DAD wavelength that you're interested in viewing."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Time (min)</th>\n",
" <th>Absorbance (mAu)</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Spec</th>\n",
" <th>idx</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"11\" valign=\"top\">rcn044_043/sig315</th>\n",
" <th>0</th>\n",
" <td>0.005333</td>\n",
" <td>-0.065327</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.012000</td>\n",
" <td>-0.064850</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.018667</td>\n",
" <td>-0.061989</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.025333</td>\n",
" <td>-0.058174</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.032000</td>\n",
" <td>-0.056744</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1196</th>\n",
" <td>7.978667</td>\n",
" <td>-1.059532</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1197</th>\n",
" <td>7.985333</td>\n",
" <td>-1.059532</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1198</th>\n",
" <td>7.992000</td>\n",
" <td>-1.059532</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1199</th>\n",
" <td>7.998667</td>\n",
" <td>-1.059532</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1200</th>\n",
" <td>8.005333</td>\n",
" <td>-1.059532</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1201 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" Time (min) Absorbance (mAu)\n",
"Spec idx \n",
"rcn044_043/sig315 0 0.005333 -0.065327\n",
" 1 0.012000 -0.064850\n",
" 2 0.018667 -0.061989\n",
" 3 0.025333 -0.058174\n",
" 4 0.032000 -0.056744\n",
"... ... ...\n",
" 1196 7.978667 -1.059532\n",
" 1197 7.985333 -1.059532\n",
" 1198 7.992000 -1.059532\n",
" 1199 7.998667 -1.059532\n",
" 1200 8.005333 -1.059532\n",
"\n",
"[1201 rows x 2 columns]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lc.select('rcn044_043')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Peak</th>\n",
" <th>Retention Time (min)</th>\n",
" <th>Area</th>\n",
" <th>Height (mAu)</th>\n",
" <th>Start</th>\n",
" <th>End</th>\n",
" <th>StartIntensity</th>\n",
" <th>EndIntensity</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Spec</th>\n",
" <th>idx</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">rcn044_042/int315</th>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0.856753</td>\n",
" <td>42.433285</td>\n",
" <td>7.323885</td>\n",
" <td>0.808167</td>\n",
" <td>1.043914</td>\n",
" <td>0.000000e+00</td>\n",
" <td>1.039555e+00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1.087346</td>\n",
" <td>5.839992</td>\n",
" <td>1.897473</td>\n",
" <td>1.043914</td>\n",
" <td>1.128167</td>\n",
" <td>1.039555e+00</td>\n",
" <td>-2.220400e-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1.598170</td>\n",
" <td>2771.099850</td>\n",
" <td>631.653137</td>\n",
" <td>1.421500</td>\n",
" <td>2.034833</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1.700501</td>\n",
" <td>151.509247</td>\n",
" <td>46.786404</td>\n",
" <td>1.661500</td>\n",
" <td>2.034811</td>\n",
" <td>9.309910e-08</td>\n",
" <td>1.019170e-06</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">rcn044_043/int315</th>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0.831944</td>\n",
" <td>996.368896</td>\n",
" <td>369.537415</td>\n",
" <td>0.752000</td>\n",
" <td>1.129670</td>\n",
" <td>0.000000e+00</td>\n",
" <td>-5.551100e-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1.023889</td>\n",
" <td>36.743076</td>\n",
" <td>10.348648</td>\n",
" <td>0.952400</td>\n",
" <td>1.128649</td>\n",
" <td>4.948450e-08</td>\n",
" <td>2.157690e-08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1.591451</td>\n",
" <td>4019.096440</td>\n",
" <td>821.574341</td>\n",
" <td>1.416510</td>\n",
" <td>2.050000</td>\n",
" <td>1.220302e-02</td>\n",
" <td>0.000000e+00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1.700953</td>\n",
" <td>185.714676</td>\n",
" <td>61.631153</td>\n",
" <td>1.665333</td>\n",
" <td>2.047892</td>\n",
" <td>5.964450e-08</td>\n",
" <td>-5.727630e-08</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Peak Retention Time (min) Area Height (mAu) \\\n",
"Spec idx \n",
"rcn044_042/int315 0 1 0.856753 42.433285 7.323885 \n",
" 1 2 1.087346 5.839992 1.897473 \n",
" 2 3 1.598170 2771.099850 631.653137 \n",
" 3 4 1.700501 151.509247 46.786404 \n",
"rcn044_043/int315 0 1 0.831944 996.368896 369.537415 \n",
" 1 2 1.023889 36.743076 10.348648 \n",
" 2 3 1.591451 4019.096440 821.574341 \n",
" 3 4 1.700953 185.714676 61.631153 \n",
"\n",
" Start End StartIntensity EndIntensity \n",
"Spec idx \n",
"rcn044_042/int315 0 0.808167 1.043914 0.000000e+00 1.039555e+00 \n",
" 1 1.043914 1.128167 1.039555e+00 -2.220400e-16 \n",
" 2 1.421500 2.034833 0.000000e+00 0.000000e+00 \n",
" 3 1.661500 2.034811 9.309910e-08 1.019170e-06 \n",
"rcn044_043/int315 0 0.752000 1.129670 0.000000e+00 -5.551100e-17 \n",
" 1 0.952400 1.128649 4.948450e-08 2.157690e-08 \n",
" 2 1.416510 2.050000 1.220302e-02 0.000000e+00 \n",
" 3 1.665333 2.047892 5.964450e-08 -5.727630e-08 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lc.select(['rcn044_042', 'rcn044_043'], data_type='ints', wl=315)"
]
},
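{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of `select` is an ordinary pandas DataFrame indexed by `Spec` and `idx`, so the usual pandas tools apply. As a small sketch (not part of `lcagg` itself), the total integrated peak area for each sample can be computed by grouping on the `Spec` index level:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ints = lc.select(['rcn044_042', 'rcn044_043'], data_type='ints', wl=315)\n",
"# Sum the 'Area' column within each sample (the 'Spec' index level)\n",
"ints.groupby(level='Spec')['Area'].sum()"
]
},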
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you're done, it's a good idea to close the file. (But it won't be the end of the world if you don't.)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"lc.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}