Commit ba3c0a2

gram schmidt process for orthonormal basis vectors and checking dimensionality (#382)
1 parent 10a7d6b commit ba3c0a2

7 files changed

+662
-1
lines changed

docs/mkdocs.yml

+2
@@ -95,6 +95,8 @@ pages:
95 95
- math:
96 96
- user_guide/math/num_combinations.md
97 97
- user_guide/math/num_permutations.md
98+
- user_guide/math/vectorspace_dimensionality.md
99+
- user_guide/math/vectorspace_orthonormalization.md
98 100
- plotting:
99 101
- user_guide/plotting/category_scatter.md
100 102
- user_guide/plotting/checkerboard_plot.md

docs/sources/CHANGELOG.md

+1
@@ -21,6 +21,7 @@ The CHANGELOG for the current development version is available at
21 21
- The `SequentialFeatureSelector` is now also compatible with Pandas DataFrames and uses DataFrame column-names for more interpretable feature subset reports. ([#379](https://github.com/rasbt/mlxtend/pull/379))
22 22
- `ColumnSelector` now works with Pandas DataFrames columns. ([#378](https://github.com/rasbt/mlxtend/pull/378) by [Manuel Garrido](https://github.com/manugarri))
23 23
- The `ExhaustiveFeatureSelector` estimator in `mlxtend.feature_selection` now is safely stoppable mid-process by control+c. ([#380](https://github.com/rasbt/mlxtend/pull/380))
24+
- Two new functions, `vectorspace_orthonormalization` and `vectorspace_dimensionality`, were added to `mlxtend.math`. They use the Gram-Schmidt process to convert an orthogonal vectorspace into orthonormal basis vectors and to compute the dimensionality of a vectorspace, respectively. ([#382](https://github.com/rasbt/mlxtend/pull/382))
24 25

25 26

26 27
##### Changes
@@ -0,0 +1,259 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Vectorspace Dimensionality"
8+
]
9+
},
10+
{
11+
"cell_type": "markdown",
12+
"metadata": {},
13+
"source": [
14+
"A function to compute the number of dimensions a set of vectors (arranged as columns in a matrix) spans."
15+
]
16+
},
17+
{
18+
"cell_type": "markdown",
19+
"metadata": {},
20+
"source": [
21+
"> from mlxtend.math import vectorspace_dimensionality"
22+
]
23+
},
24+
{
25+
"cell_type": "markdown",
26+
"metadata": {},
27+
"source": [
28+
"## Overview"
29+
]
30+
},
31+
{
32+
"cell_type": "markdown",
33+
"metadata": {},
34+
"source": [
35+
"Given a set of vectors arranged as columns in a matrix, the `vectorspace_dimensionality` function uses the Gram-Schmidt process [1] to compute the number of dimensions that the vectors span. The Gram-Schmidt process yields vectors that are either zero (when an input vector is a linear combination of the preceding ones) or normalized to length 1, so the sum of the resulting vector norms equals the number of dimensions spanned by the vector set."
36+
]
37+
},
38+
{
39+
"cell_type": "markdown",
40+
"metadata": {},
41+
"source": [
42+
"### References\n",
43+
"\n",
44+
"- [1] https://en.wikipedia.org/wiki/Gram–Schmidt_process"
45+
]
46+
},
47+
{
48+
"cell_type": "markdown",
49+
"metadata": {},
50+
"source": [
51+
"## Example 1 - Compute the dimensions of a vectorspace"
52+
]
53+
},
54+
{
55+
"cell_type": "markdown",
56+
"metadata": {},
57+
"source": [
58+
"Let's assume we have the two basis vectors $x=[1 \\;\\;\\; 0]^T$ and $y=[0\\;\\;\\; 1]^T$ as columns in a matrix. Due to the linear independence of the two vectors, the space that they span is naturally a plane (2D space):"
59+
]
60+
},
61+
{
62+
"cell_type": "code",
63+
"execution_count": 1,
64+
"metadata": {},
65+
"outputs": [
66+
{
67+
"data": {
68+
"text/plain": [
69+
"2"
70+
]
71+
},
72+
"execution_count": 1,
73+
"metadata": {},
74+
"output_type": "execute_result"
75+
}
76+
],
77+
"source": [
78+
"import numpy as np\n",
79+
"from mlxtend.math import vectorspace_dimensionality\n",
80+
"\n",
81+
"\n",
82+
"a = np.array([[1, 0],\n",
83+
" [0, 1]])\n",
84+
"\n",
85+
"vectorspace_dimensionality(a)"
86+
]
87+
},
88+
{
89+
"cell_type": "markdown",
90+
"metadata": {},
91+
"source": [
92+
"However, if one vector is a linear combination of the other, it's intuitive to see that the space the vectorset describes is merely a line, aka a 1D space:"
93+
]
94+
},
95+
{
96+
"cell_type": "code",
97+
"execution_count": 2,
98+
"metadata": {},
99+
"outputs": [
100+
{
101+
"data": {
102+
"text/plain": [
103+
"1"
104+
]
105+
},
106+
"execution_count": 2,
107+
"metadata": {},
108+
"output_type": "execute_result"
109+
}
110+
],
111+
"source": [
112+
"b = np.array([[1, 2],\n",
113+
" [0, 0]])\n",
114+
"\n",
115+
"vectorspace_dimensionality(b)"
116+
]
117+
},
118+
{
119+
"cell_type": "markdown",
120+
"metadata": {},
121+
"source": [
122+
"If three vectors are linearly independent, the space they span is a volume (i.e., a 3D space):"
123+
]
124+
},
125+
{
126+
"cell_type": "code",
127+
"execution_count": 3,
128+
"metadata": {},
129+
"outputs": [
130+
{
131+
"data": {
132+
"text/plain": [
133+
"3"
134+
]
135+
},
136+
"execution_count": 3,
137+
"metadata": {},
138+
"output_type": "execute_result"
139+
}
140+
],
141+
"source": [
142+
"d = np.array([[1, 9, 1],\n",
143+
" [3, 2, 2],\n",
144+
" [5, 4, 3]])\n",
145+
"\n",
146+
"vectorspace_dimensionality(d)"
147+
]
148+
},
149+
{
150+
"cell_type": "markdown",
151+
"metadata": {},
152+
"source": [
153+
"Again, if a pair of vectors is linearly dependent (here: the 1st and the 2nd column, since the 2nd is twice the 1st), this reduces the dimensionality by 1:"
154+
]
155+
},
156+
{
157+
"cell_type": "code",
158+
"execution_count": 4,
159+
"metadata": {},
160+
"outputs": [
161+
{
162+
"data": {
163+
"text/plain": [
164+
"2"
165+
]
166+
},
167+
"execution_count": 4,
168+
"metadata": {},
169+
"output_type": "execute_result"
170+
}
171+
],
172+
"source": [
173+
"c = np.array([[1, 2, 1],\n",
174+
" [3, 6, 2],\n",
175+
" [5, 10, 3]])\n",
176+
"\n",
177+
"vectorspace_dimensionality(c)"
178+
]
179+
},
180+
{
181+
"cell_type": "markdown",
182+
"metadata": {},
183+
"source": [
184+
"## API"
185+
]
186+
},
187+
{
188+
"cell_type": "code",
189+
"execution_count": 5,
190+
"metadata": {},
191+
"outputs": [
192+
{
193+
"name": "stdout",
194+
"output_type": "stream",
195+
"text": [
196+
"## vectorspace_dimensionality\n",
197+
"\n",
198+
"*vectorspace_dimensionality(ary)*\n",
199+
"\n",
200+
"Computes the hyper-volume spanned by a vector set\n",
201+
"\n",
202+
"**Parameters**\n",
203+
"\n",
204+
"- `ary` : array-like, shape=[num_vectors, num_vectors]\n",
205+
"\n",
206+
" An orthogonal set of vectors (arranged as columns in a matrix)\n",
207+
"\n",
208+
"**Returns**\n",
209+
"\n",
210+
"- `dimensions` : int\n",
211+
"\n",
212+
" An integer indicating the \"dimensionality\" hyper-volume spanned by\n",
213+
" the vector set\n",
214+
"\n",
215+
"\n"
216+
]
217+
}
218+
],
219+
"source": [
220+
"with open('../../api_modules/mlxtend.math/vectorspace_dimensionality.md', 'r') as f:\n",
221+
" print(f.read())"
222+
]
223+
}
224+
],
225+
"metadata": {
226+
"anaconda-cloud": {},
227+
"kernelspec": {
228+
"display_name": "Python 3",
229+
"language": "python",
230+
"name": "python3"
231+
},
232+
"language_info": {
233+
"codemirror_mode": {
234+
"name": "ipython",
235+
"version": 3
236+
},
237+
"file_extension": ".py",
238+
"mimetype": "text/x-python",
239+
"name": "python",
240+
"nbconvert_exporter": "python",
241+
"pygments_lexer": "ipython3",
242+
"version": "3.6.4"
243+
},
244+
"toc": {
245+
"nav_menu": {},
246+
"number_sections": true,
247+
"sideBar": true,
248+
"skip_h1_title": false,
249+
"title_cell": "Table of Contents",
250+
"title_sidebar": "Contents",
251+
"toc_cell": false,
252+
"toc_position": {},
253+
"toc_section_display": true,
254+
"toc_window_display": false
255+
}
256+
},
257+
"nbformat": 4,
258+
"nbformat_minor": 1
259+
}
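
For readers who want to see the counting idea from the Overview cell in isolation, here is a minimal sketch. It is not the mlxtend implementation: the helper name `gram_schmidt_dimensionality` and the tolerance parameter `eps` are assumptions made for illustration. It orthogonalizes the columns one at a time and counts those that keep a non-negligible residual, which is equivalent to summing the norms of vectors that are each either zero or of unit length.

```python
import numpy as np


def gram_schmidt_dimensionality(ary, eps=1e-10):
    """Count the dimensions spanned by the columns of `ary`.

    Hypothetical helper for illustration only; `eps` is an assumed
    relative tolerance for treating a residual as zero.
    """
    ary = np.asarray(ary, dtype=float)
    basis = []  # orthonormal vectors found so far

    for col in ary.T:  # iterate over the columns of `ary`
        residual = col.copy()
        # Subtract the component along every basis vector found so far
        # (modified Gram-Schmidt: project the running residual).
        for q in basis:
            residual -= (q @ residual) * q
        norm = np.linalg.norm(residual)
        # A column that depends linearly on the earlier ones leaves
        # (numerically) nothing behind and adds no new dimension.
        if norm > eps * max(1.0, np.linalg.norm(col)):
            basis.append(residual / norm)

    return len(basis)


a = np.array([[1, 0], [0, 1]])
b = np.array([[1, 2], [0, 0]])
c = np.array([[1, 2, 1], [3, 6, 2], [5, 10, 3]])
d = np.array([[1, 9, 1], [3, 2, 2], [5, 4, 3]])

for name, mat in [("a", a), ("b", b), ("c", c), ("d", d)]:
    print(name, gram_schmidt_dimensionality(mat), np.linalg.matrix_rank(mat))
```

For the matrices `a`, `b`, `c`, and `d` defined in the notebook, this sketch gives 2, 1, 2, and 3 respectively, in agreement with `np.linalg.matrix_rank`. The relative tolerance is a design choice: an absolute threshold would misclassify vectors with very small or very large entries.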
