37 38 37 38 38 40 37 37 38 37 37
What's wrong? Why (look at the data)? How do we fix it?
37 37 37 37 37 37 37 37 37 37 37
or...
pandas¶http://pandas.pydata.org/index.html
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.
Basically, if you have a big spreadsheet of data with mixed types.
C:\Users\Akhlore\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3071: DtypeWarning: Columns (10) have mixed types.Specify dtype option on import or set low_memory=False. has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
Note: This is a large dataset so I am preemptively selecting a subset of columns. Try to avoid loading this data more than once as it may take up all your memory and you'll have to restart the python kernel
| Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 37.0 | F | 8 | 197 | Skin and subcutaneous tissue infections | 0 | NO PROC | 90335341.0 | NaN | $9546.85 |
| 1 | 37.0 | F | 3 | 146 | Diverticulosis and diverticulitis | 0 | NO PROC | 90335341.0 | NaN | $11462.75 |
| 2 | 37.0 | M | 1 | 50 | Diabetes mellitus with complications | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | $1609.40 |
| 3 | 37.0 | F | 1 | 154 | Noninfectious gastroenteritis | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | $2638.75 |
| 4 | 37.0 | F | 3 | 124 | Acute and chronic tonsillitis | 0 | NO PROC | 90335341.0 | NaN | $3538.25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2367278 | 943.0 | F | 1 | 245 | Syncope | 0 | NO PROC | 156102.0 | NaN | $10074.00 |
| 2367279 | 943.0 | M | 1 | 149 | Biliary tract disease | 0 | NO PROC | 267443.0 | NaN | $21252.00 |
| 2367280 | 943.0 | F | 1 | 102 | Nonspecific chest pain | 0 | NO PROC | 267443.0 | NaN | $11673.00 |
| 2367281 | 943.0 | F | 1 | 660 | Alcohol-related disorders | 171 | SUTURE SKIN/SUBCUT TISS | 267443.0 | 249597.0 | $16722.00 |
| 2367282 | 943.0 | M | 1 | 2 | Septicemia (except in labor) | 0 | NO PROC | 251080.0 | NaN | $11637.00 |
2367283 rows × 10 columns
array([[37.0, 'F', '8', ..., 90335341.0, nan, '$9546.85'],
[37.0, 'F', '3', ..., 90335341.0, nan, '$11462.75'],
[37.0, 'M', '1', ..., 90335341.0, 167816.0, '$1609.40'],
...,
[943.0, 'F', '1', ..., 267443.0, nan, '$11673.00'],
[943.0, 'F', '1', ..., 267443.0, 249597.0, '$16722.00'],
[943.0, 'M', '1', ..., 251080.0, nan, '$11637.00']], dtype=object)
Columns (and rows) have names that you can use to access them.
Index(['Facility ID', 'Gender', 'Length of Stay', 'CCS Diagnosis Code',
'CCS Diagnosis Description', 'CCS Procedure Code',
'CCS Procedure Description', 'Attending Provider License Number',
'Operating Provider License Number', 'Total Charges'],
dtype='object')
0 F
1 F
2 M
3 F
4 F
..
2367278 F
2367279 M
2367280 F
2367281 F
2367282 M
Name: Gender, Length: 2367283, dtype: object
0 197 1 146 2 50 Name: CCS Diagnosis Code, dtype: int64
[] slices by rows, but indexes by column name - must provide range or it interprets the index as a column label.
| Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 37.0 | F | 8 | 197 | Skin and subcutaneous tissue infections | 0 | NO PROC | 90335341.0 | NaN | $9546.85 |
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance) 2645 try: -> 2646 return self._engine.get_loc(key) 2647 except KeyError: pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc() pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc() pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item() pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item() KeyError: 0 During handling of the above exception, another exception occurred: KeyError Traceback (most recent call last) <ipython-input-59-3512e1a6c539> in <module> ----> 1 data[0] ~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key) 2798 if self.columns.nlevels > 1: 2799 return self._getitem_multilevel(key) -> 2800 indexer = self.columns.get_loc(key) 2801 if is_integer(indexer): 2802 indexer = [indexer] ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance) 2646 return self._engine.get_loc(key) 2647 except KeyError: -> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key)) 2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance) 2650 if indexer.ndim > 1 or indexer.size > 1: pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc() pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc() pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item() pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item() KeyError: 0
iloc: Position indexing¶If you want to reference a pandas data frame with position based indexing, use .iloc - work's just like numpy
'F'
Facility ID 37 Gender F Length of Stay 8 CCS Diagnosis Code 197 CCS Diagnosis Description Skin and subcutaneous tissue infections CCS Procedure Code 0 CCS Procedure Description NO PROC Attending Provider License Number 9.03353e+07 Operating Provider License Number NaN Total Charges $9546.85 Name: 0, dtype: object
Pandas uses NaN to indicate missing data
Facility ID 37 Gender F Length of Stay 8 CCS Diagnosis Code 197 CCS Diagnosis Description Skin and subcutaneous tissue infections CCS Procedure Code 0 CCS Procedure Description NO PROC Attending Provider License Number 9.03353e+07 Operating Provider License Number NaN Total Charges 9546.85 Charge per day 1193.36 Name: 0, dtype: object
Facility ID 37 Gender F Length of Stay 8 CCS Diagnosis Code 197 CCS Diagnosis Description Skin and subcutaneous tissue infections CCS Procedure Code 0 CCS Procedure Description NO PROC Attending Provider License Number 9.03353e+07 Operating Provider License Number NaN Total Charges $9546.85 Name: 0, dtype: object
.loc: Label indexing¶You can also index by the label names. Note the rows are being indexed by their named index.
0 F 1 F 2 M 3 F Name: Gender, dtype: object
| CCS Diagnosis Description | CCS Diagnosis Code | |
|---|---|---|
| 0 | Skin and subcutaneous tissue infections | 197 |
| 3 | Noninfectious gastroenteritis | 154 |
| 5 | Influenza | 123 |
| Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 37.0 | F | 3 | 123 | Influenza | 0 | NO PROC | 90335341.0 | NaN | $4566.15 |
| 11 | 37.0 | M | 7 | 122 | Pneumonia (except that caused by tuberculosis ... | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | $9822.90 |
| 12 | 37.0 | M | 3 | 122 | Pneumonia (except that caused by tuberculosis ... | 0 | NO PROC | 90335341.0 | NaN | $5063.05 |
| 13 | 37.0 | M | 2 | 155 | Other gastrointestinal disorders | 0 | NO PROC | 90335341.0 | NaN | $3125.75 |
| 14 | 37.0 | F | 3 | 122 | Pneumonia (except that caused by tuberculosis ... | 0 | NO PROC | 90335341.0 | NaN | $5055.45 |
| 15 | 37.0 | F | 6 | 127 | Chronic obstructive pulmonary disease and bron... | 0 | NO PROC | 90335341.0 | NaN | $9734.05 |
| 16 | 37.0 | F | 4 | 127 | Chronic obstructive pulmonary disease and bron... | 0 | NO PROC | 90335341.0 | NaN | $7168.05 |
| 17 | 37.0 | F | 2 | 197 | Skin and subcutaneous tissue infections | 0 | NO PROC | 90301264.0 | NaN | $2812.85 |
| 18 | 37.0 | M | 3 | 58 | Other nutritional; endocrine; and metabolic di... | 0 | NO PROC | 90335341.0 | NaN | $3377.50 |
| 19 | 37.0 | F | 2 | 125 | Acute bronchitis | 0 | NO PROC | 90335341.0 | NaN | $3214.25 |
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) <ipython-input-67-4009993f2f6d> in <module> ----> 1 data[10:20].loc[[0,3,5]] ~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key) 1766 1767 maybe_callable = com.apply_if_callable(key, self.obj) -> 1768 return self._getitem_axis(maybe_callable, axis=axis) 1769 1770 def _is_scalar_access(self, key: Tuple): ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis) 1952 raise ValueError("Cannot index with multidimensional key") 1953 -> 1954 return self._getitem_iterable(key, axis=axis) 1955 1956 # nested tuple slicing ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_iterable(self, key, axis) 1593 else: 1594 # A collection of keys -> 1595 keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False) 1596 return self.obj._reindex_with_indexers( 1597 {axis: [keyarr, indexer]}, copy=True, allow_dups=True ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing) 1550 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr) 1551 -> 1552 self._validate_read_indexer( 1553 keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing 1554 ) ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing) 1638 if missing == len(indexer): 1639 axis_name = self.obj._get_axis_name(axis) -> 1640 raise KeyError(f"None of [{key}] are in the [{axis_name}]") 1641 1642 # We (temporarily) allow for some missing keys with .loc, except in KeyError: "None of [Int64Index([0, 3, 5], dtype='int64')] are in the [index]"
| Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 37.0 | F | 3 | 123 | Influenza | 0 | NO PROC | 90335341.0 | NaN | $4566.15 |
| 13 | 37.0 | M | 2 | 155 | Other gastrointestinal disorders | 0 | NO PROC | 90335341.0 | NaN | $3125.75 |
| 15 | 37.0 | F | 6 | 127 | Chronic obstructive pulmonary disease and bron... | 0 | NO PROC | 90335341.0 | NaN | $9734.05 |
Just like numpy we can index by a boolean array or an array of indices.
| Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 37.0 | M | 1 | 50 | Diabetes mellitus with complications | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | $1609.40 |
| 6 | 37.0 | M | 5 | 122 | Pneumonia (except that caused by tuberculosis ... | 0 | NO PROC | 90335341.0 | NaN | $6148.10 |
| 7 | 37.0 | M | 3 | 123 | Influenza | 0 | NO PROC | 90335341.0 | NaN | $4204.15 |
| Facility ID | CCS Diagnosis Code | CCS Procedure Code | |
|---|---|---|---|
| 0 | 37.0 | 197 | 0 |
| 1 | 37.0 | 146 | 0 |
| 2 | 37.0 | 50 | 202 |
| Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1426998 | 1439.0 | M | 1 | 108 | Congestive heart failure; nonhypertensive | 0 | NO PROC | 251948.0 | NaN | $0.50 |
| 780065 | 989.0 | M | 2 | 657 | Mood disorders | 218 | PSYCHO/PSYCHI EVAL/THER | 199003.0 | 154247.0 | $1.00 |
| 781531 | 989.0 | F | 1 | 661 | Substance-related disorders | 0 | NO PROC | 145418.0 | NaN | $1.00 |
| Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
|---|---|---|---|---|---|---|---|---|---|---|
| 77722 | 213.0 | F | 10 | 100 | Acute myocardial infarction | 44 | COR ARTERY BYP GRF-CABG | 224769.0 | 224769.0 | $99999.65 |
| 451015 | 635.0 | F | 1 | 115 | Aortic; peripheral; and visceral artery aneurysms | 52 | AORTIC RESECTION; REPL | 232988.0 | 232988.0 | $99999.46 |
| 1579047 | 1456.0 | M | 12 | 2 | Septicemia (except in labor) | 157 | AMPUTATE LOWER EXTRMITY | 258717.0 | 265448.0 | $99998.45 |
Can apply standard string functions to all cells. This returns the changed value; it does not mutate in place.
The above overwrites the previous Total Charges column to be a floating point number instead of a string with a dollar sign.
| Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
|---|---|---|---|---|---|---|---|---|---|---|
| 967254 | 1169.0 | M | 120 + | 63 | Diseases of white blood cells | 64 | BONE MARROW TRANSPLANT | 198304.0 | 229870.0 | 8593455.88 |
| 1560003 | 1456.0 | F | 120 + | 143 | Abdominal hernia | 86 | OTHER HERNIA REPAIR | 165181.0 | 165181.0 | 6272871.31 |
| 957685 | 1169.0 | F | 120 + | 6 | Hepatitis | 176 | OT ORGAN TRANSPLANTATN | 236414.0 | 183253.0 | 5745201.42 |
| Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | Charge per day | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 37.0 | F | 8.0 | 197 | Skin and subcutaneous tissue infections | 0 | NO PROC | 90335341.0 | NaN | 9546.85 | 1193.356250 |
| 1 | 37.0 | F | 3.0 | 146 | Diverticulosis and diverticulitis | 0 | NO PROC | 90335341.0 | NaN | 11462.75 | 3820.916667 |
| 2 | 37.0 | M | 1.0 | 50 | Diabetes mellitus with complications | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | 1609.40 | 1609.400000 |
| 3 | 37.0 | F | 1.0 | 154 | Noninfectious gastroenteritis | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | 2638.75 | 2638.750000 |
| 4 | 37.0 | F | 3.0 | 124 | Acute and chronic tonsillitis | 0 | NO PROC | 90335341.0 | NaN | 3538.25 | 1179.416667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2367278 | 943.0 | F | 1.0 | 245 | Syncope | 0 | NO PROC | 156102.0 | NaN | 10074.00 | 10074.000000 |
| 2367279 | 943.0 | M | 1.0 | 149 | Biliary tract disease | 0 | NO PROC | 267443.0 | NaN | 21252.00 | 21252.000000 |
| 2367280 | 943.0 | F | 1.0 | 102 | Nonspecific chest pain | 0 | NO PROC | 267443.0 | NaN | 11673.00 | 11673.000000 |
| 2367281 | 943.0 | F | 1.0 | 660 | Alcohol-related disorders | 171 | SUTURE SKIN/SUBCUT TISS | 267443.0 | 249597.0 | 16722.00 | 16722.000000 |
| 2367282 | 943.0 | M | 1.0 | 2 | Septicemia (except in labor) | 0 | NO PROC | 251080.0 | NaN | 11637.00 | 11637.000000 |
2367283 rows × 11 columns
nan
'120 +'
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric() ValueError: Unable to parse string "120 +" During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) <ipython-input-81-260213ec4baa> in <module> ----> 1 pd.to_numeric('120 +') ~\anaconda3\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast) 147 coerce_numeric = errors not in ("ignore", "raise") 148 try: --> 149 values = lib.maybe_convert_numeric( 150 values, set(), coerce_numeric=coerce_numeric 151 ) pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric() ValueError: Unable to parse string "120 +" at position 0
Group records that have the same value for a column
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FFD926BBB0>
We can then apply an aggregating function to these groups.
| Length of Stay | CCS Diagnosis Code | CCS Procedure Code | Attending Provider License Number | Operating Provider License Number | Total Charges | Charge per day | |
|---|---|---|---|---|---|---|---|
| Facility ID | |||||||
| 377.0 | 2.670732 | 121.170732 | 202.439024 | 4.664694e+07 | 4.826434e+07 | 4060.734024 | 1614.759785 |
| 111.0 | 3.611111 | 159.777778 | 0.000000 | 1.929986e+05 | NaN | 4492.587778 | 1473.003327 |
| 37.0 | 3.162791 | 123.906977 | 54.779070 | 8.298531e+07 | 7.666251e+06 | 4939.879651 | 1628.579552 |
| 9250.0 | 2.203456 | 205.840173 | 137.955076 | 1.082429e+07 | 1.264187e+07 | 5618.928860 | 2497.869519 |
| 165.0 | 4.597884 | 180.206349 | 181.920635 | 2.051072e+05 | 4.744077e+07 | 6376.906455 | 1580.321499 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 563.0 | 6.279619 | 129.367643 | 78.085210 | 2.363745e+05 | 6.581608e+05 | 89385.986450 | 20937.033431 |
| 1446.0 | 5.633545 | 214.859847 | 157.190406 | 3.497396e+05 | 3.955800e+05 | 124204.521077 | 36290.348526 |
| 1139.0 | 8.937250 | 205.031446 | 87.044577 | 3.341266e+05 | 3.847957e+05 | 131992.454991 | 17629.688410 |
| 1138.0 | 38.527005 | 215.072013 | 213.108020 | 1.963923e+05 | 1.966056e+05 | 175997.805859 | 4195.405483 |
| 1486.0 | 61.821429 | 134.039683 | 194.801587 | 1.863599e+05 | 1.921974e+05 | 195780.272579 | 2486.885826 |
215 rows × 7 columns
| Total Charges | |
|---|---|
| Gender | |
| U | 9858.198939 |
| F | 36867.585805 |
| M | 44419.344760 |
The group by column has become an indexing column. Need to reset_index to convert index to columns.
| Gender | Total Charges | |
|---|---|---|
| 0 | U | 9858.198939 |
| 1 | F | 36867.585805 |
| 2 | M | 44419.344760 |
CCS Procedure Description Gender
ABDOMINAL PARACENTESIS F 56996.536195
M 58532.559231
ABORTION (TERM OF PREG) F 26512.622947
ALCO/DRUG REHAB/DETOX F 18954.578164
M 18187.073614
...
UPPER GI X-RAY M 31221.460246
URETERAL CATHETERIZATN F 36143.329481
M 34182.121658
VARI VEIN STRIP;LOW LMB F 42519.938571
M 65763.081875
Name: Total Charges, Length: 446, dtype: float64
unstack Pivot a level of the (necessarily hierarchical) index labels.
| Gender | F | M | U |
|---|---|---|---|
| CCS Procedure Description | |||
| ABDOMINAL PARACENTESIS | 56996.536195 | 58532.559231 | NaN |
| ABORTION (TERM OF PREG) | 26512.622947 | NaN | NaN |
| ALCO/DRUG REHAB/DETOX | 18954.578164 | 18187.073614 | NaN |
| AMPUTATE LOWER EXTRMITY | 100926.639344 | 92821.328596 | NaN |
| AORTIC RESECTION; REPL | 144019.629159 | 114210.484050 | NaN |
| ... | ... | ... | ... |
| UNGROUPABLE | 68101.627500 | 105811.388636 | NaN |
| UP GASTRO ENDOSC/BIOPSY | 42625.453678 | 43340.006917 | NaN |
| UPPER GI X-RAY | 28339.654074 | 31221.460246 | NaN |
| URETERAL CATHETERIZATN | 36143.329481 | 34182.121658 | NaN |
| VARI VEIN STRIP;LOW LMB | 42519.938571 | 65763.081875 | NaN |
233 rows × 3 columns
CCS Procedure Description
ABDOMINAL PARACENTESIS -1536.023036
ABORTION (TERM OF PREG) NaN
ALCO/DRUG REHAB/DETOX 767.504550
AMPUTATE LOWER EXTRMITY 8105.310748
AORTIC RESECTION; REPL 29809.145110
...
UNGROUPABLE -37709.761136
UP GASTRO ENDOSC/BIOPSY -714.553239
UPPER GI X-RAY -2881.806172
URETERAL CATHETERIZATN 1961.207823
VARI VEIN STRIP;LOW LMB -23243.143304
Length: 233, dtype: float64
CCS Procedure Description
ABDOMINAL PARACENTESIS -1536.023036
ALCO/DRUG REHAB/DETOX 767.504550
AMPUTATE LOWER EXTRMITY 8105.310748
AORTIC RESECTION; REPL 29809.145110
APPENDECTOMY 209.527573
...
UNGROUPABLE -37709.761136
UP GASTRO ENDOSC/BIOPSY -714.553239
UPPER GI X-RAY -2881.806172
URETERAL CATHETERIZATN 1961.207823
VARI VEIN STRIP;LOW LMB -23243.143304
Length: 202, dtype: float64
CCS Procedure Description EXTRA CIRC AUX OPEN HRT -69453.636115 DX PRCS ON EYE -63462.345439 SWAN-GANZ CATH MONITOR -46738.846837 PRCS ON SPLEEN -46219.301962 CORNEAL TRANSPLANT -45028.501148 dtype: float64
CCS Procedure Description LENS & CATARACT PRCS 35378.240602 OT INTRAOCULAR THER PRC 35907.146236 BONE MARROW TRANSPLANT 37066.859998 DES LES RETINA/CHOROID 52461.621223 CORONARY THROMBOLYSIS 87870.550000 dtype: float64
pd.concat concatenates rows (i.e., default axis=0) while merging columns with the same name.
| A | B | C | D | |
|---|---|---|---|---|
| 0 | A0 | B0 | C0 | D0 |
| 1 | A1 | B1 | C1 | D1 |
| 2 | A2 | B2 | C2 | D2 |
| 3 | A3 | B3 | C3 | D3 |
| B | D | F | |
|---|---|---|---|
| 2 | B2 | D2 | F2 |
| 3 | B3 | D3 | F3 |
| 6 | B6 | D6 | F6 |
| 7 | B7 | D7 | F7 |
| A | B | C | D | F | |
|---|---|---|---|---|---|
| 0 | A0 | B0 | C0 | D0 | NaN |
| 1 | A1 | B1 | C1 | D1 | NaN |
| 2 | A2 | B2 | C2 | D2 | NaN |
| 3 | A3 | B3 | C3 | D3 | NaN |
| 2 | NaN | B2 | NaN | D2 | F2 |
| 3 | NaN | B3 | NaN | D3 | F3 |
| 6 | NaN | B6 | NaN | D6 | F6 |
| 7 | NaN | B7 | NaN | D7 | F7 |
You can join two tables on a specific column (or columns). Rows that has the same value (or key) in that column will be combined.
| key | A | B | |
|---|---|---|---|
| 0 | K0 | A0 | B0 |
| 1 | K1 | A1 | B1 |
| 2 | K2 | A2 | B2 |
| 3 | K3 | A3 | B3 |
| key | C | D | |
|---|---|---|---|
| 0 | K1 | C1 | D1 |
| 1 | K2 | C2 | D2 |
| 2 | K4 | C4 | D4 |
| key | A | B | C | D | |
|---|---|---|---|---|---|
| 0 | K1 | A1 | B1 | C1 | D1 |
| 1 | K2 | A2 | B2 | C2 | D2 |
| key | A | B | C | D | |
|---|---|---|---|---|---|
| 0 | K0 | A0 | B0 | NaN | NaN |
| 1 | K1 | A1 | B1 | C1 | D1 |
| 2 | K2 | A2 | B2 | C2 | D2 |
| 3 | K3 | A3 | B3 | NaN | NaN |
| 4 | K4 | NaN | NaN | C4 | D4 |
| key | A | B | C | D | |
|---|---|---|---|---|---|
| 0 | K0 | A0 | B0 | NaN | NaN |
| 1 | K1 | A1 | B1 | C1 | D1 |
| 2 | K2 | A2 | B2 | C2 | D2 |
| 3 | K3 | A3 | B3 | NaN | NaN |
Download: https://mscbio2025.csb.pitt.edu/files/Hospital_Inpatient_Discharges__SPARCS_De-Identified___2014.csv
Is the percentage of people who go to these facilities with the most common diagnosis and receive a coronary bypass significantly different between these two groups?
What about knee replacements?