Notes on some Python packages.
- numpy
- pandas
- scipy
- networkx
- BeautifulSoup4
- conda
- More packages
numpy
- Create array: np.array([(1.5,2,3), (4,5,6)], dtype=float)
- Shape: arr.shape
- 1-D arrays can be made from a generator: np.fromiter(list_gen, dtype=float) (the dtype argument is required)
- Infinity: np.inf
- Identity matrix: np.identity(n) (where n is the number of rows)
  - For int elements: np.identity(3, dtype=int)
- Add two matrices: np.add(m1, m2) or just m1 + m2
  - Broadcasting: https://numpy.org/devdocs/user/basics.broadcasting.html
- Default dtype is float
- Create empty (ie, with garbage values) matrix: np.empty(<shape>)
- Create array full of constant values: np.full(<shape>, val)
- np.dot and np.matmul (@) aren't the same: they differ for arrays with more than 2 dimensions, and matmul rejects scalars
- np.resize and arr.resize behave slightly differently: np.resize returns a new array filled with repeated copies of the data, arr.resize changes the array in place and pads with zeros
- np.newaxis is an alias for None
- Create (main) diagonal matrix: np.eye(size)
- np.argmax(): index of the largest value (in the flattened array unless axis is given)
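A few of the items in the list above can be checked with a short script. This is only a sketch; the variable names are made up for illustration.

```python
import numpy as np

# np.fromiter needs an explicit dtype.
squares = np.fromiter((i * i for i in range(4)), dtype=int)

# np.newaxis is just another name for None.
newaxis_is_none = np.newaxis is None
column = squares[:, np.newaxis]           # shape (4, 1)

# np.resize repeats the data to fill the new shape;
# ndarray.resize works in place and pads with zeros.
a = np.array([1, 2, 3])
repeated = np.resize(a, 5)                # [1, 2, 3, 1, 2]
b = np.array([1, 2, 3])
b.resize(5, refcheck=False)               # b is now [1, 2, 3, 0, 0]

# np.dot accepts a scalar operand, np.matmul does not.
scaled = np.dot(2, a)                     # [2, 4, 6]
try:
    np.matmul(2, a)
    matmul_accepts_scalar = True
except (ValueError, TypeError):
    matmul_accepts_scalar = False

# For stacks of matrices the two also give different shapes.
x = np.ones((2, 3, 4))
y = np.ones((2, 4, 5))
dot_shape = np.dot(x, y).shape            # (2, 3, 2, 5)
matmul_shape = (x @ y).shape              # (2, 3, 5)
```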
Assert that two arrays are equal
assert np.array_equal(a, b)
Norm of a vector
https://numpy.org/devdocs/reference/generated/numpy.linalg.norm.html
Use np.linalg.norm(v)
- Many different forms of norm are possible (chosen with the ord argument)
- Default is the 2-norm for vectors (aka magnitude) and the Frobenius norm for matrices
  - Square the absolute value of every element
  - Sum them up
  - Take the square root of the sum
>>> a
array([1, 2, 3])
>>> np.linalg.norm(a)
np.float64(3.7416573867739413)
# (1+4+9)^½
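To go with the note that several norms are available, here is a small sketch of the ord argument. The example vector and matrix are made up.

```python
import numpy as np

v = np.array([3, -4])
l2 = np.linalg.norm(v)                    # default 2-norm: sqrt(9 + 16) = 5
l1 = np.linalg.norm(v, ord=1)             # sum of absolute values = 7
linf = np.linalg.norm(v, ord=np.inf)      # largest absolute value = 4

m = np.array([[1, 2],
              [3, 4]])
fro = np.linalg.norm(m)                   # Frobenius norm: sqrt(1 + 4 + 9 + 16)
```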
Kronecker product
- aka tensor product.
- Use np.kron(a, b)
- Every element a[i][j] of a is replaced by the block a[i][j] * b. So when a holds only 0s and 1s, a copy of b shows up wherever a is 1.
>>> i22 = np.identity(2)
>>> mul = np.array([[1, 1, 1, 0],
... [0, 0, 0, 1]])
>>> np.kron(i22, mul)
array([[1., 1., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 1., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.]])
Jargon
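- Rank of shape: Number of elements in shape (ie, the number of dimensions, arr.ndim)
- C-order: row-major memory layout, where the last index varies fastest (the numpy default)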
Axis parameter
https://stackoverflow.com/questions/48200911/very-basic-numpy-array-dimension-visualization
- nth axis => nth index
- 'order of indexing into the array'
- In a 2D matrix,
  - axis=0 => rows (the first index, running down the rows)
  - axis=1 => columns (the second index, running across the columns)
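A quick way to see what the axis parameter does is to sum along each axis of a small matrix. A minimal sketch with made-up values:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# axis=0 collapses the row index: one result per column.
col_sums = m.sum(axis=0)                  # [5, 7, 9]

# axis=1 collapses the column index: one result per row.
row_sums = m.sum(axis=1)                  # [6, 15]
```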
Stacking
- np.vstack: stack two matrices one below the other
- np.hstack: stack two matrices side by side
>>> a = np.array([[1,0],[2,0],[3,0]])
>>> a
array([[1, 0],
[2, 0],
[3, 0]])
>>> np.hstack([a,a])
array([[1, 0, 1, 0],
[2, 0, 2, 0],
[3, 0, 3, 0]])
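For comparison, a sketch of np.vstack on the same matrix, together with the equivalent np.concatenate calls:

```python
import numpy as np

a = np.array([[1, 0],
              [2, 0],
              [3, 0]])

below = np.vstack([a, a])                 # shape (6, 2)
beside = np.hstack([a, a])                # shape (3, 4)

# For 2D input these match np.concatenate along axis 0 and axis 1.
same_v = np.array_equal(below, np.concatenate([a, a], axis=0))
same_h = np.array_equal(beside, np.concatenate([a, a], axis=1))
```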
Matrix creation
Create a vector/matrix of all zeros
>>> np.zeros(3)
array([0., 0., 0.])
>>> np.zeros(3).shape
(3,)
>>> np.zeros(3).reshape(1,3)
array([[0., 0., 0.]])
>>> np.zeros(3).reshape(1,3).shape
(1, 3)
###
>>> np.zeros((1,3))
array([[0., 0., 0.]])
>>> np.zeros((1,3)).shape
(1, 3)
https://numpy.org/doc/stable/reference/generated/numpy.zeros.html
Make a matrix by giving all values explicitly
>>> np.array([[0, 1], [0, 0]])
array([[0, 1],
[0, 0]])
Create a matrix out of a constant
>>> np.full((2,2), 3)
array([[3, 3],
[3, 3]])
>>> np.full((2,2), np.inf)
array([[inf, inf],
[inf, inf]])
>>> np.full((3,2), [1,2])
array([[1, 2],
[1, 2],
[1, 2]])
https://numpy.org/doc/stable/reference/generated/numpy.full.html
Create a matrix from a python list
>>> a = np.array([1,2,3])
>>> a
array([1, 2, 3])
>>> a.shape
(3,)
Matrix multiplication
- Multiply vector with matrix: np.dot
- Concatenate two vectors: np.concatenate([v1, v2])
  - Arguments must be given as a list (or other sequence)
Determinant of a matrix
Remember that determinant is not defined for non-square matrices.
>>> import numpy as np
>>> np.linalg.det(np.array([[1,2],
... [3,4]]))
...
-2.0000000000000004
https://numpy.org/doc/stable/reference/generated/numpy.linalg.det.html
Eigenvalues of a matrix
>>> np.linalg.eigvals(np.array([[1,2],
... [3,4]]))
...
array([-0.37228132, 5.37228132])
# Product of the eigenvalues is the determinant
>>> np.prod(np.linalg.eigvals(np.array([[1,2],[3,4]])))
-1.9999999999999998
https://numpy.org/doc/stable/reference/generated/numpy.linalg.eigvals.html
Reading from csv
- np.loadtxt: usable when there are no missing values
- np.genfromtxt: usable even when there are missing values
- https://numpy.org/doc/stable/user/how-to-io.html
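A small sketch of the difference between the two readers. To stay self-contained it reads from in-memory strings (io.StringIO) instead of files; the data is made up.

```python
import io
import numpy as np

# Complete data: np.loadtxt is enough.
complete = io.StringIO("1,2,3\n4,5,6\n")
full = np.loadtxt(complete, delimiter=",")

# One field is missing: np.genfromtxt fills it with nan.
with_gap = io.StringIO("1,2,3\n4,,6\n")
holes = np.genfromtxt(with_gap, delimiter=",")
gap_is_nan = np.isnan(holes[1, 1])
```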
Doubts
- np.resize vs np.pad
- np.broadcast_to
Misc
- Packages for type hints:
Theory
- Matrix addition is commutative, but multiplication isn't
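The statement can be checked on two small matrices (values picked arbitrarily):

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

addition_commutes = np.array_equal(a + b, b + a)

ab = a @ b                                # [[2, 1], [4, 3]]
ba = b @ a                                # [[3, 4], [1, 2]]
multiplication_commutes = np.array_equal(ab, ba)
```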
Pandas
- Convert a 2D frame to a 1D frame (ie, DataFrame to Series): df.stack()
- Read from csv with first column as row index: pd.read_csv("~/b.csv", index_col=0)
- Write csv: df.to_csv("~/out.csv")
- Index using number instead of names: df.iloc[0]
- Column names: df.columns
- Row names: df.index
- Read csv: pd.read_csv()
- Convert a Series to np.array: to_numpy()
- level??
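The calls listed above can be tried on a tiny frame. This sketch reads the csv from an in-memory string rather than a file, and the column and row names are invented.

```python
import io
import pandas as pd

csv_text = "name,x,y\nr1,1,2\nr2,3,4\n"

# First column becomes the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

columns = list(df.columns)                # ['x', 'y']
rows = list(df.index)                     # ['r1', 'r2']

first_row = df.iloc[0]                    # positional access, a Series
flat = df.stack()                         # Series indexed by (row, column)
x_values = df["x"].to_numpy()             # plain numpy array
```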
axis
- 0: stacking one on top of another
Merge two Series to make a DataFrame
>>> pd.concat([s1, s2], axis=1)
>>> pd.concat([s1, s2], axis=1).columns
RangeIndex(start=0, stop=2, step=1)
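A sketch of how the axis argument changes the result of pd.concat. The Series s1 and s2 are made up here, and keys is used to name the columns instead of the default 0 and 1.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([10, 20, 30])

# axis=1: the Series sit side by side as columns of a DataFrame.
side_by_side = pd.concat([s1, s2], axis=1)
named = pd.concat([s1, s2], axis=1, keys=["a", "b"])

# axis=0: the Series are stacked one on top of another.
stacked = pd.concat([s1, s2], axis=0)
```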
Examples
# Read from a csv file
>>> df = pd.read_csv("name.csv")
# Columns
>>> df.columns
Index(['Site Type', 'Used', 'Fixed', 'Prohibited', 'Available', 'Util%'], dtype='object')
# Row index (its length is the number of rows)
>>> df.index
RangeIndex(start=0, stop=16, step=1)
scipy
Sparse matrix
https://docs.scipy.org/doc/scipy/reference/sparse.html
>>> import scipy.sparse
>>> import numpy as np
>>> X = scipy.sparse.csr_matrix(1./2.*np.array([[0.,1.],[1.,0.]]))
>>> X
<2x2 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
>>> print(X)
(0, 1) 0.5
(1, 0) 0.5
Curve fitting
Find the parameters of a function so that it best fits a set of data points (eg, with scipy.optimize.curve_fit).
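A minimal sketch of curve fitting with scipy.optimize.curve_fit. The model function line and the noise-free synthetic data are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, slope, intercept):
    """Model to fit: a straight line (hypothetical example model)."""
    return slope * x + intercept

# Synthetic data generated from slope=2, intercept=1.
x = np.linspace(0, 10, 20)
y = line(x, 2.0, 1.0)

# curve_fit returns the best-fit parameters and their covariance matrix.
params, covariance = curve_fit(line, x, y)
slope, intercept = params
```

With real measurements the data would carry noise, and the diagonal of the covariance matrix gives a rough idea of how well each parameter is pinned down.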
networkx
Graph attributes
- Adjacency matrix: nx.adjacency_matrix(G)
- Attribute matrix: nx.attr_matrix(G, <attr-name>)
  - Eg: nx.attr_matrix(G, "weight")
Add nodes
- Insert single node: G.add_node()
- Insert nodes from a list: G.add_nodes_from()
Add edges
Adding edges implicitly adds nodes.
- Edges from/to a node in undirected graphs: G.edges(<node>)
- Edges coming into a node in directed graphs: G.in_edges(<node>)
- Edges outgoing from a node in directed graphs: G.out_edges(<node>)
- Check if an edge exists: G.has_edge(n1, n2)
- Add edge with weight: G.add_edge(n1, n2, weight=<weight>)
- Node attributes: G.nodes[<node>] (G[<node>] gives the neighbours of the node with their edge attributes instead)
- Edge attributes: G[n1][n2]['<attr-name>'] = <attr-val>
- Get edge data: G.get_edge_data(n1, n2)
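The node and edge calls above, put together on a small directed graph. The node names and attribute values are invented for the sketch.

```python
import networkx as nx

G = nx.DiGraph()
G.add_node("a")
G.add_nodes_from(["b", "c"])

G.add_edge("a", "b", weight=2.5)
G.add_edge("c", "d", weight=1.0)          # node "d" is added implicitly

edge_exists = G.has_edge("a", "b")
reverse_exists = G.has_edge("b", "a")     # direction matters in a DiGraph

G.nodes["a"]["color"] = "red"             # node attribute
G["a"]["b"]["weight"] = 3.0               # overwrite an edge attribute

edge_data = G.get_edge_data("a", "b")     # {'weight': 3.0}
incoming = list(G.in_edges("b"))          # [('a', 'b')]
outgoing = list(G.out_edges("a"))         # [('a', 'b')]
```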
BeautifulSoup4
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
- Install: pip3 install BeautifulSoup4
import pathlib
from bs4 import BeautifulSoup
html_path = pathlib.Path("/home/user/Downloads/input.html")
htmlstr = html_path.read_text()
# Name the parser explicitly; otherwise bs4 picks one itself and warns
soup = BeautifulSoup(htmlstr, "html.parser")
content = soup.find(id='content-container')
Finding
- Find all elements matching class: soup.find_all("tagname", class_="classname")
- Find by id: soup.find(id='content-container')
Children
- tag.contents: immediate children (tags and strings) as a list
- tag.children: immediate children as an iterator
- tag.descendants: all nested children, recursively, as a generator
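A sketch of finding elements and walking children on an inline HTML string. The markup and the class name note are made up, and the built-in html.parser is used so nothing extra needs installing.

```python
from bs4 import BeautifulSoup

html = (
    '<div id="content-container">'
    '<p class="note">one</p>'
    '<p class="note">two <b>bold</b></p>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

content = soup.find(id="content-container")
notes = soup.find_all("p", class_="note")

direct = content.contents                 # list: the two <p> tags
direct_iter = list(content.children)      # same items, via an iterator
everything = list(content.descendants)    # tags and strings at every depth
```

Here descendants yields six items: the two p tags, the b tag, and the three text strings inside them.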
conda
- Create new conda environment: conda create -n <env-name> [python=3.6.8] [packages needed]
- Activate conda environment: conda activate <env-name>
- Install pip to the environment: conda install pip
  - conda envs don't come with pip by default. It has to be installed.
- Install git: conda install git
- Delete a conda environment: conda env remove -n <env-name>
- Delete a package installed with conda (not with pip): conda remove <package-name>
More packages
- bidict: bijective maps (ie, 2-way dictionary)
Tools:
- pylint, flake8: linters
- mypy, pytype: static type checkers
- pytest: unit testing
  - Show execution time of tests: pytest --durations=0 # report the duration of every test (N > 0 shows only the N slowest)